This is true both for optimizing PDFs for search and for human visitors. Recently, there was an article on MarketingSherpa (membership required) that set forth 10 tips for optimizing PDFs for search. The tips listed excluded some of the most crucial factors in PDF optimization, including specifying the reading order and how to influence PDF search result descriptions to promote actual click-through. Without these and other matters addressed, your PDF stands little chance of being clicked on, even if it does rank highly in the search engine results.
Although most SEO experts would recommend placing content on the web in html form, PDFs still play a vital role, especially in B2B marketing. Research (and our own experience as a B2B marketing firm) has shown that the content most sought by business purchasers includes case studies, whitepapers, and technical articles, and PDFs are a great vehicle for these content-heavy B2B marketing pieces. Business purchasers specifically looking for this type of information can narrow their search to look exclusively for PDFs using the advanced search options of search engines like Google and Yahoo.
The source documents from which PDFs are created (e.g., brochures, product sheets, etc.) are often very expensive to write and design. Yet, when it comes to making a PDF, B2B marketers invest a couple of minutes to convert such a document into a PDF and post it. What a waste! If you’re going to make a significant investment in a printed piece, why wouldn’t you spend even an hour to help ensure your prospects can find it?
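To see what a PDF-only search looks like from the purchaser’s side: Google, for one, also accepts a filetype:pdf operator typed directly into the query. The short sketch below builds such a search URL. The function name is made up for illustration, and the keywords are only an example.

```python
from urllib.parse import urlencode

def pdf_search_url(keywords: str) -> str:
    """Build a Google search URL that is restricted to PDF results."""
    # The filetype:pdf operator limits results to PDF documents.
    query = f"{keywords} filetype:pdf"
    return "https://www.google.com/search?" + urlencode({"q": query})

print(pdf_search_url("transit seating"))
```

Running a few of your own target keywords through a search like this is a quick way to see which competing PDFs already rank.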
Here’s a comprehensive guide to optimizing PDFs for search engines, searchers, and site visitors.
Make sure you create text-based PDFs
Search engines read text. If you want to optimize PDFs for search, your PDFs need to be text-based. If you’re using text-based software like Microsoft Word, this isn’t an issue. Programs like InDesign and Quark will also create text-based PDFs. However, if you’re creating your document in a program like Photoshop, your output is likely going to be an image-based PDF, which the search engines can’t read.
If you don’t know how the PDF was created and you want to know whether it is text-based, it is not enough to place your cursor over the text to see if you can select the words. Some PDFs created with Photoshop have selectable text (i.e., you can select copy with your mouse), but Acrobat still recognizes it as part of an image. See the image below.
The only way to be sure is to select Advanced>Accessibility>Add Tags to Document and then select Advanced>Accessibility>TouchUp Reading Order. Acrobat will then show you the various parts of the PDF, including what is truly text and what is seen as images.
If you have existing PDFs that are image-based, provided the resolution is high enough (>144 dpi), you can attempt to have Acrobat identify and convert the text in the document. To do this, go to Document>Recognize Text Using OCR. Be sure to specify Formatted Text and Graphics as the desired conversion. I hesitate to even recommend this tip, because if the resolution of the images in the image-based PDF exceeds 144 dpi, you’re probably not going to want to use it for the web anyway. The file would likely be larger than you would want to post on the web.
Specify document properties for the PDF
Most PDFs are created without specifying document properties. Specifiable document properties include Title, Author, Subject, and Keywords. To specify document properties in Acrobat, go to File>Document Properties.
In optimizing PDFs for search, the most important property is the Title. And although the file name of the PDF will show in the title bar of the PDF document window in Acrobat, the file name is not the PDF’s Title. The Title of the PDF can only be specified in the document properties, and it is invisible when viewing the PDF.
Failing to specify and optimize the Title property will hurt not only your chances of ranking well, but also your chances of click-through. The Title property is the equivalent of an html title tag. As such, it represents the words that will usually be displayed as the heading of a search result. Just as with an html title tag, you’ll want to specify an enticing, keyword-rich Title property for each PDF. And remember, the Title property isn’t just for search engines; it needs to promote click-through, too. While the Title property can be essentially as long as you want it to be, keep in mind that a search engine like Google is only going to display about 65 characters.
We’ve all seen some pretty strange looking headings for PDFs in the search results. Not only do they look ridiculous, they probably won’t get clicked. If you don’t specify the Title property, the search engine will use the PDF content to create a search result heading.
While you’re still in the Acrobat document properties section, go ahead and complete the other document properties of Author, Subject, and Keywords. These presently appear to have no bearing on search results, but I would complete them anyway. While the search engines aren’t going to put any weight behind meta keywords, perhaps in the future they may treat the Subject property in a manner similar to how they currently treat meta descriptions. Besides, if you don’t complete them now, it won’t be fun to go back and do this later to all the PDFs on your site.
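The checks in this section can be sketched in a few lines of Python. This is only an illustration: it assumes you have already copied a PDF’s document properties into a plain dictionary (the dictionary, the function names, and the sample values are all made up), and it treats the roughly 65-character display limit mentioned above as a fixed cutoff, which real search results will not follow exactly.

```python
PROPERTIES = ("Title", "Author", "Subject", "Keywords")

def missing_properties(props):
    """Return the names of document properties that are absent or blank."""
    return [name for name in PROPERTIES if not (props.get(name) or "").strip()]

def heading_preview(title, limit=65):
    """Approximate the heading a search engine might show for this Title."""
    title = " ".join(title.split())  # collapse stray whitespace
    if len(title) <= limit:
        return title
    # Keep only the whole words that fit within the limit, then mark the cut.
    return title[:limit].rsplit(" ", 1)[0] + " ..."

# Hypothetical properties for one PDF.
props = {
    "Title": "Transit Seating Buyer's Guide: Comparing Materials, "
             "Safety Ratings, and Lifetime Costs",
    "Author": "",
    "Subject": "Transit seating",
}
print(missing_properties(props))
print(heading_preview(props["Title"]))
```

If the preview cuts off before your most important keywords, rework the Title so the words that drive click-through come first.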
Optimize PDF copy
The same rules used for optimizing web page content apply in optimizing PDF content. Make sure your visible copy elements are optimized.
Keep the content of the PDF focused
Because PDFs are often multi-page documents, especially in the case of whitepapers, technical articles, and case studies (which are great for B2B marketers), they represent the opportunity to offer deep content on a focused subject. Search engines like content. More correctly, search engines like focused, authoritative content. A highly focused, well-optimized PDF has more chance of ranking well for a related search term than a PDF that appears to focus on many different concerns. If you have large PDFs that cover multiple topics in depth, consider splitting them into smaller PDFs.
Specify the reading order
This is one of the most important steps for optimizing PDFs for search. Although search engines will usually display the PDF Title as the search result’s heading, search engines typically look to the content of the PDF to extract the description displayed under the search result. That can lead to some undesirable results. Check out the search result below.
This is not a very enticing search result, even though it is currently the #2 ranking in the Google search results for transit seating. Although American Seating is an industry leader in this category, this search result is actually hurting their image. The actual PDF pertains to one specific seat model, and it’s an old one at that. It would be far better if there were a great PDF on transit seating or if the link went to the company’s transit seating section on its website.
All that’s somewhat of a moot point, though. While the heading of the search result is okay, the description under that heading is terrible, and it’s not likely to get clicked on in the first place. Why did Google display this information? Because it’s the first thing Google read in the PDF.
Every PDF has a reading order. Properly optimized web pages have valuable content read first. The same is true for optimized PDFs. To determine the reading order of your PDF, select Advanced>Accessibility>Add Tags to Document in Acrobat. Then select Advanced>Accessibility>TouchUp Reading Order. The reading order of the PDF will then be displayed.
The reading order of the transit seating PDF above does not start with valuable content. Rather, lots of unimportant areas of the PDF are “read” prior to reading the most important content. Google displayed what it read first. If you want PDFs to be optimized for search, make sure you understand the reading order of the PDF and use the TouchUp Reading Order tool to manage what the search engine will read first.
Influence meta descriptions
Usually (but not always) search engines will display the specified meta description of a given web page when returning that web page in the search results. While this is great for web pages, there is no corresponding easy option for PDFs.
The best way to influence what is displayed as a description under the search result heading is to ensure that the first thing that the search engine reads is that description.
Depending on the layout and copy of your PDF, an appropriate description may be found in the first sentence or two of copy on the first page of the PDF. If so, merely specify the reading order to make sure that paragraph is read first.
But what if that’s not how you want to start copy for the PDF? Or what if that copy is not on the first page?
If that’s not how you want to start copy for the PDF, I’d recommend considering a descriptive “footer.” I wouldn’t actually make it a footer using Acrobat, but I would place normal text as if it were a footer on the page. Then specify the reading order to make sure this footer gets read first. When you write the copy for that footer, keep in mind that it will likely be the words that are displayed under the search result heading. Make sure the description is keyword-rich, promotes click-through from the search results, and makes sense to the PDF reader when they’re looking at the page.
If your copy doesn’t start until page three of the PDF (e.g., a cover graphic with cutlines followed by a table of contents on page two), you’ll want to make sure the search engine isn’t reading any copy on the first two pages. There is no way to make the reading order of a PDF start on page three. Accordingly, you’ll have to make the first two pages into images.
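To make the effect of reading order concrete, here is a small Python sketch. It assumes you have typed the PDF’s text blocks into a list in the order Acrobat reports them. The sample blocks, the function name, and the 150-character cutoff are all assumptions for illustration; search engines do not publish an exact description length.

```python
def snippet_preview(blocks, limit=150):
    """Join text blocks in reading order and cut to a rough description length."""
    # Collapse whitespace inside each block and skip empty blocks.
    text = " ".join(" ".join(block.split()) for block in blocks if block.strip())
    if len(text) <= limit:
        return text
    # Keep only the whole words that fit, then mark the cut.
    return text[:limit].rsplit(" ", 1)[0] + " ..."

# Hypothetical blocks in the order they are currently read.
blocks_as_read = [
    "Rev. 03",
    "Page 1 of 4",
    "A guide to choosing durable seating for city buses.",
]
# The same blocks after moving the descriptive paragraph to the front.
blocks_reordered = [blocks_as_read[2], blocks_as_read[0], blocks_as_read[1]]

print(snippet_preview(blocks_as_read))
print(snippet_preview(blocks_reordered))
```

The first preview leads with revision and page markers; the second leads with the sentence you would actually want a searcher to see.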
Tag your PDFs
Just as you tag parts of an html document for search optimization, you also should tag PDF contents. Tagging of content (e.g., heading, text, image descriptions) created in other programs generally will not transfer to the PDF when it is created. A quick way to check if anything in your PDF is tagged is to check the document properties.
Using Acrobat, select Advanced>Accessibility>Add Tags to Document. Acrobat will give you a document report and recommend things you may want to consider changing. Then select Advanced>Accessibility>TouchUp Reading Order. You’ll have the ability to tag text, headings, and alternate text for images, etc.
To add ALT tags to images, click on the Show Order button on the TouchUp Reading Order panel. Then click on the Content tab in the Order panel. From there you can select an image. Then click on the Options pull-down and select Properties.
Build links into PDFs
Be sure you include links in your PDFs, and pay attention to the anchor text used. Search engines do recognize these links. Not very often, but sometimes you’ll find backlinks in PDFs. Their limited occurrence, however, is likely related to the fact that most people don’t put links into PDFs; most people treat PDFs as static print documents.
There’s a good business reason to include links, too. Often, PDFs are passed along to others via email. Recipients will be reading the PDF in isolation (i.e., they didn’t get it from your website). By placing links into PDFs, you give these readers an easy way to click back into your site, where you can further influence them. At a minimum, include a link to your website in the contact information you provide in the PDF.
Verify your links
If you have links in your original document, those links likely won’t transfer when creating the PDF. They may look like links, but they won’t be active links. After the PDF is created, test all links in the document. Use Acrobat to insert hyperlinks in the PDF. Click on the Link tool or choose Tools>Advanced Editing>Link Tool to insert hyperlinks.
Save the PDF as an accessible version
Search engines’ capabilities tend to lag behind new versions of Acrobat. Although Acrobat 8 is out, save your PDFs as version 1.6 (Acrobat 7) or lower to ensure search engines can index the content.
Not only is saving PDFs at a lower version good for the search engines, it’s also good for users. Not everyone has the latest versions of Acrobat Reader. Accordingly, I’d recommend saving PDFs as version 1.5 or lower. This way it will be good for search engines and most readers.
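If you have a batch of existing PDFs, you can check the declared version without opening each one, because a PDF file starts with a header such as %PDF-1.6. The Python sketch below reads that header. The function names are made up, and one caveat applies: a PDF can override the header version inside the document itself, so treat this as a first-pass check rather than the final word.

```python
import re

def pdf_header_version(data: bytes):
    """Return the version declared in the %PDF- header, e.g. (1, 6), or None."""
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def is_widely_readable(data: bytes, max_version=(1, 5)) -> bool:
    """True if the header version is at or below the recommended ceiling."""
    version = pdf_header_version(data)
    return version is not None and version <= max_version

# Example with in-memory bytes; for a real file, pass open(path, "rb").read(1024).
print(pdf_header_version(b"%PDF-1.6\n"))
print(is_widely_readable(b"%PDF-1.6\n"))
```

The default ceiling of 1.5 follows the recommendation above; pass max_version=(1, 6) if you only care about the search engine guidance.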
Optimize the file size for search
Keep the file size of PDFs manageable—both for search engines and searchers. A huge PDF file is annoying and unnecessary. People wishing to see the PDF may abandon the download if it takes too long. Search engines don’t like huge PDFs either. If the file is too big, the search engines may abandon indexing of its content.
Older versions of Acrobat didn’t have great options for optimizing the file size of an existing PDF. The most apparent way is to choose File>Reduce File Size. However, the only way you can reduce the file size with this tool is to specify the compatibility of the PDF. If you make the PDF less compatible with earlier versions of Acrobat, the file size will be reduced. However, as noted above, not everyone has the latest version of Acrobat. So don’t try to reduce file size by making your PDF compatible with only the latest version of Acrobat.
Newer versions of Acrobat allow you to choose Advanced > PDF Optimizer to reduce the size of the file. This option allows you to specify the compression settings of various components of the PDF. It works quite well, but you’ll have to sometimes try a few different settings with each file to strike the right balance between the quality of rendered images and file size.
Ideally, the best way to manage the file size of PDFs is to use the available tools in the software that created the PDF. For instance, both InDesign and Quark allow the user to specify the compression settings for images and the target resolution of the final document.
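To find out which PDFs need attention first, a short Python sketch like the one below can list the largest files in a folder. The 2 MB threshold is purely an assumption for illustration; this article does not set a specific limit, so pick a number that suits your audience and connection speeds.

```python
from pathlib import Path

def oversized_pdfs(folder, limit_bytes=2_000_000):
    """Return (name, size) pairs for PDFs in folder above limit_bytes, largest first."""
    found = []
    for path in Path(folder).glob("*.pdf"):
        size = path.stat().st_size  # size on disk in bytes
        if size > limit_bytes:
            found.append((path.name, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling oversized_pdfs("downloads") on a hypothetical downloads folder returns the files worth running through PDF Optimizer or re-exporting from the source program.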
Enable your PDFs for fast web view
Fast Web View is an Adobe Acrobat option that allows your web-posted PDF to be rendered more quickly. Instead of waiting until the entire PDF’s data is downloaded before rendering the PDF on screen, Fast Web View allows the PDF to be rendered a page at a time. So as soon as the first page is processed, the first page is rendered. You can check whether your PDF is enabled for Fast Web View by checking the document properties.
This tip isn’t necessarily important for search engines to index PDFs, but it is important for site visitors and web searchers. When searchers get PDFs in the search results, there is generally an option to view as html or as a PDF. Enabling Fast Web View helps ensure that impulsive and impatient web searchers don’t abandon your PDF before it even gets a chance to become visible to the searcher. You can enable Fast Web View by going to the Preferences>General Settings panel in Adobe Acrobat. This allows the PDF to be “loaded” a page at a time, rather than waiting for the whole PDF to download.
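For checking many files at once, here is a rough Python sketch. A PDF saved for Fast Web View is what the PDF specification calls a linearized file, and such a file carries a /Linearized entry near its very beginning. The function name is made up, and the check is only a heuristic: a file edited after it was saved can still carry the marker without really being optimized, so Acrobat’s document properties remain the authority.

```python
def looks_linearized(first_bytes: bytes) -> bool:
    """Heuristic: a linearized PDF declares /Linearized within its first 1024 bytes."""
    return b"/Linearized" in first_bytes[:1024]

# Example with in-memory bytes; for a real file, pass open(path, "rb").read(1024).
sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n"
print(looks_linearized(sample))
```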
Watch where you place PDFs on your site
If you want your PDFs to be found and indexed by the search engines, don’t bury them in your site architecture. Keep links to PDFs closer to the root level of the site’s file structure and within a few clicks of the home page.
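“Within a few clicks” can be measured. The Python sketch below assumes you have a map of which pages link to which (for example, from a site crawl) and counts the fewest clicks from the home page to each PDF. The site map, the function names, and the three-click limit are all hypothetical.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search: fewest clicks from the home page to each reachable URL."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def buried_pdfs(links, home="/", max_clicks=3):
    """Return PDFs that take more than max_clicks to reach from the home page."""
    depths = click_depths(links, home)
    return sorted(
        url for url, depth in depths.items()
        if url.lower().endswith(".pdf") and depth > max_clicks
    )

# Hypothetical site map: each page lists the URLs it links to.
site = {
    "/": ["/products", "/resources"],
    "/resources": ["/resources/whitepaper.pdf"],
    "/products": ["/products/seating"],
    "/products/seating": ["/products/seating/specs"],
    "/products/seating/specs": ["/products/seating/specs/sheet.pdf"],
}
print(buried_pdfs(site))
```

A PDF that no page links to will not appear in the results at all, which is a separate problem worth checking for.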
Use keyword-rich anchor text to link to PDFs
The anchor text of links pointing to the PDF is important in search optimization. This is true for links on your site as well as other sites linking to your PDF. Ideally, anchor text should reflect the keyword strategy of the PDF.
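On your own site, weak anchor text is easy to audit. The Python sketch below scans a page’s html for links that point to PDFs and flags the ones whose anchor text is empty or generic. The list of generic phrases, the class and function names, and the sample markup are illustrative assumptions, not a standard.

```python
from html.parser import HTMLParser

GENERIC = {"click here", "here", "download", "pdf", "read more"}

class PdfLinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for links that point to PDF files."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the PDF link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".pdf"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

def weak_pdf_anchors(html):
    """Return PDF links whose anchor text is empty or a generic phrase."""
    parser = PdfLinkCollector()
    parser.feed(html)
    return [(href, text) for href, text in parser.links
            if not text or text.lower() in GENERIC]

page = ('<p><a href="/docs/transit-seating-guide.pdf">Transit seating guide</a> '
        '<a href="/docs/sheet.pdf">Click here</a></p>')
print(weak_pdf_anchors(page))
```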
While it’s true that search engines don’t like duplicate content, contrary to what was noted in the MarketingSherpa article, search engines won’t punish you in the rankings for duplicate content. When it comes to duplicate web pages, search engines are merely going to select only one of these pages to index.
In web pages, a site may have the same page content in different forms. For instance, the same products may be on the page when they are organized by price as when they are organized by size. If both of these pages can be indexed by the search engine, the search engine is only going to choose one of these pages to index. The site owner should select the page she wants indexed and noindex the other pages.
The issue of duplicate content really doesn’t apply to PDFs in the same manner; the PDF is the PDF. The area where duplicate content does come into play is when a web page’s contents are the same as a PDF’s contents. In that case, alter the contents of one of these vehicles if you want both to be indexed by the search engines.
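To get a rough sense of how much a web page and a PDF overlap, you can compare their text. The Python sketch below uses overlapping five-word sequences (often called shingles) and reports the share the two texts have in common. This is a common textbook measure, not a description of how any search engine actually detects duplicates, and the function names are made up.

```python
def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(text_a, text_b, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

page_text = "Our transit seating is built for long service life in city buses."
pdf_text = "Our transit seating is built for long service life in city buses."
print(overlap(page_text, pdf_text))
```

A score near 1.0 suggests the two versions are effectively the same copy, so one of them would need reworking if you want both indexed.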
Don’t do anything in a PDF that you wouldn’t do in a web page
This includes keyword stuffing, small or invisible text, text behind images, etc. Enough said.
Recheck things before you post the PDF
Lastly, it’s really easy to make small, inadvertent changes that have a big impact, both in terms of optimizing PDFs for search and for human readers, especially if, let’s say, you happen to use two computers at different times to optimize or edit the PDF. The computers may have different preference settings. This could affect Fast Web View, the version at which the PDF is saved, and the reading order. If you’re going to further optimize existing PDFs, make sure you check all of these things before posting a new version of the PDF.
Galen De Young is a recognized expert in the field of B2B search marketing and Managing Director of both the B2B marketing consulting firm Proteus B2B and its B2B search marketing division Proteus SEO. He writes a regular column for Search Engine Land on B2B search marketing. His articles have been published by many other organizations, including MarketingProfs and the Direct Marketing Association. Galen is a frequent speaker at colleges and universities, ad industry events, and search marketing conferences. He also blogs at b2b-seo.com.