What you don’t know about optimizing PDFs can hurt you | 17 Strategies for success

Galen De Young ( @GalenDY ) | April 1, 2008 · 23 comments | B2B Search Marketing

This is true both in terms of optimizing PDFs for search and for human visitors. Recently, an article on MarketingSherpa (membership required) set forth 10 tips for optimizing PDFs for search. The tips omitted some of the most crucial factors in PDF optimization, including specifying the reading order and influencing PDF search result descriptions to promote actual click-through. Unless these and other matters are addressed, your PDF stands little chance of being clicked on, even if it does rank highly in the search engine results.

Although most SEO experts would recommend placing content on the web in html form, PDFs still play a vital role, especially in B2B marketing. Research (and our own experience as a B2B marketing firm) has shown that case studies, whitepapers, and technical articles are among the content most sought by business purchasers, and PDFs are a great vehicle for these content-heavy B2B marketing pieces. Business purchasers specifically looking for this type of information can narrow their search to look exclusively for PDFs using the advanced search options of search engines like Google and Yahoo.

The source documents from which PDFs are created (e.g., brochures, product sheets, etc.) are often very expensive to write and design. Yet, when it comes to making a PDF, B2B marketers invest a couple of minutes to convert such documents into a PDF and post it. What a waste! If you’re going to make a significant investment in a printed piece, why wouldn’t you spend even an hour to help ensure your prospects can find it?

Here’s a comprehensive guide to optimizing PDFs for search engines, searchers, and site visitors.

Make sure you create text-based PDFs

Search engines read text. If you want to optimize PDFs for search, your PDFs need to be text-based. If you’re using text-based software like Microsoft Word, this isn’t an issue. Programs like InDesign and Quark will also create text-based PDFs. However, if you’re creating your document in a program like Photoshop, your output is likely going to be an image-based PDF, which the search engines can’t read.

If you don’t know how the PDF was created and you want to know whether it is text-based, it is not enough to place your cursor over the text to see if you can select the words. Some PDFs created with Photoshop have selectable text (i.e., you can select copy with your mouse), but Acrobat still recognizes it as part of an image. See the image below.

The only way to be sure is to select Advanced>Accessibility>Add Tags to Document and then select Advanced>Accessibility>TouchUp Reading Order. Acrobat will then show you the various parts of the PDF, including what is truly text and what is seen as images.

text that is not text in a pdf

If you have existing PDFs that are image-based, provided the resolution is high enough (>144 dpi), you can attempt to have Acrobat identify and convert the text in the document. To do this, go to Document>Recognize Text Using OCR. Be sure to specify Formatted Text and Graphics as the desired conversion. I hesitate to even recommend this tip, because if the images in the image-based PDF exceed 144 dpi, you’re probably not going to want to use the file for the web anyway. It would likely be larger than you would want to post on the web.

Specify document properties for the PDF

Most PDFs are created without specifying document properties. Specifiable document properties include Title, Author, Subject, and Keywords. To specify document properties in Acrobat, go to File>Document Properties.

PDF document properties box

In optimizing PDFs for search, the most important property is the Title. And although the file name of the PDF will show as the title tag of the PDF document window in Acrobat, the file name is not the PDF’s Title. The Title of the PDF can only be specified in the document properties, and it is invisible when viewing the PDF.

Failing to specify and optimize the Title property will hurt not only your chances of ranking well, but also your chances of click-through. The Title property is the equivalent of an html title tag. As such, it represents the words that will usually be displayed as the heading of a search result. Just as with an html title tag, you’ll want to specify an enticing, keyword-rich Title property for each PDF. And remember, the Title property isn’t just for search engines; it needs to promote click-through, too. While the Title property can be essentially as long as you want it to be, keep in mind that a search engine like Google is only going to display about 65 characters.

We’ve all seen some pretty strange looking headings for PDFs in the search results. Not only do they look ridiculous, they probably won’t get clicked. If you don’t specify the Title property, the search engine will use the PDF content to create a search result heading.
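If you have a lot of PDFs to work through, a quick script can flag Titles that are missing or likely to be cut off. The following is only a rough sketch in Python. It assumes the roughly 65-character display limit mentioned above, and the function names (preview_title, title_problems) are made up for illustration. It checks the wording you plan to enter in the document properties; it does not read or write the PDF itself.

```python
# Assumption: about 65 characters are displayed, per the figure cited above.
# Search engines can change this, so it is kept as a tunable parameter.
DISPLAY_LIMIT = 65


def preview_title(title: str, limit: int = DISPLAY_LIMIT) -> str:
    """Approximate how a Title might appear as a search result heading."""
    title = " ".join(title.split())  # collapse stray whitespace
    if len(title) <= limit:
        return title
    # Cut at the last word boundary that still leaves room for "..."
    cut = title[: limit - 3].rsplit(" ", 1)[0]
    return cut + "..."


def title_problems(title: str, limit: int = DISPLAY_LIMIT) -> list:
    """Return a list of simple, mechanical problems with a proposed Title."""
    problems = []
    cleaned = " ".join(title.split())
    if not cleaned:
        problems.append("Title is empty; the search engine will build its own heading.")
    elif len(cleaned) > limit:
        problems.append(
            f"Title is {len(cleaned)} characters; only about {limit} are displayed."
        )
    return problems
```

A check like this only catches length and emptiness. Whether the Title is enticing and keyword-rich is still a judgment call.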

While you’re still in the Acrobat document properties section, go ahead and complete the other document properties of Author, Subject, and Keywords. These presently appear to have no bearing on search results, but I would complete them anyway. While the search engines aren’t going to put any weight behind meta keywords, perhaps in the future they may treat the Subject property in a manner similar to how they currently treat meta descriptions. Besides, if you don’t complete them now, it won’t be fun to go back and do this later to all the PDFs on your site.

Optimize PDF copy

The same rules used for optimizing web page content apply in optimizing PDF content. Make sure your visible copy elements are optimized.

Keep the content of the PDF focused

Because PDFs are often multi-page documents, especially in the case of whitepapers, technical articles, and case studies (which are great for B2B marketers), they represent the opportunity to offer deep content on a focused subject. Search engines like content. More correctly, search engines like focused, authoritative content. A highly focused, well-optimized PDF has more chance of ranking well for a related search term than a PDF that appears to focus on many different concerns. If you have large PDFs that cover multiple topics in depth, consider splitting them into smaller PDFs.

Specify the reading order

This is one of the most important steps for optimizing PDFs for search. Although search engines will usually display the PDF Title as the search result’s heading, search engines typically look to the content of the PDF to extract the description displayed under the search result. That can lead to some undesirable results. Check out the search result below.

This is not a very enticing search result, even though it is currently the #2 ranking in the Google search results for transit seating. Although American Seating is an industry leader in this category, this search result is actually hurting their image. The actual PDF pertains to one specific seat model, and it’s an old one at that. It would be far better if there were a great PDF on transit seating or if the link went to the company’s transit seating section on its website.

All that’s somewhat of a moot point, though. While the heading of the search result is okay, the description under that heading is terrible, and it’s not likely to get clicked on in the first place. Why did Google display this information? Because it’s the first thing Google read in the PDF.

Every PDF has a reading order. Properly optimized web pages have valuable content read first. The same is true for optimized PDFs. To determine the reading order of your PDF, select Advanced>Accessibility>Add Tags to Document in Acrobat. Then select Advanced>Accessibility>TouchUp Reading Order. The reading order of the PDF will then be displayed.

The reading order of the transit seating PDF above does not start with valuable content. Rather, lots of unimportant areas of the PDF are “read” prior to reading the most important content. Google displayed what it read first. If you want PDFs to be optimized for search, make sure you understand the reading order of the PDF and use the TouchUp Reading Order tool to manage what the search engine will read first.

Influence meta descriptions

Usually (but not always) search engines will display the specified meta description of a given web page when returning that web page in the search results. While this is great for web pages, there is no corresponding easy option for PDFs.

The best way to influence what is displayed as a description under the search result heading is to ensure that the first thing that the search engine reads is that description.

Depending on the layout and copy of your PDF, an appropriate description may be found in the first sentence or two of copy on the first page of the PDF. If so, merely specify the reading order to make sure that paragraph is read first.

But what if that’s not how you want to start copy for the PDF? Or what if that copy is not on the first page?

If that’s not how you want to start copy for the PDF, I’d recommend considering a descriptive “footer.” I wouldn’t actually make it a footer using Acrobat, but I would place normal text as if it were a footer on the page. Then specify the reading order to make sure this footer gets read first. When you write the copy for that footer, keep in mind that it will likely be the words that are displayed under the search result heading. Make sure the description is keyword-rich, promotes click-through from the search results, and makes sense to the PDF reader when they’re looking at the page.

If your copy doesn’t start until page three of the PDF (e.g., a cover graphic with cutlines followed by a table of contents on page two), you’ll want to make sure the search engine isn’t reading any copy on the first two pages. There is no way to make the reading order of a PDF start on page three. Accordingly, you’ll have to make the first two pages into images.
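To see why reading order matters so much for the description, it can help to mock up the effect. The sketch below is purely illustrative Python: it assumes the search engine simply takes the first text it reads, and the 155-character limit and the sample copy are my own placeholders, not figures from any search engine. The function name snippet_preview is hypothetical.

```python
def snippet_preview(blocks_in_reading_order, limit=155):
    """Approximate a description built from the first text that gets read.

    blocks_in_reading_order: text blocks listed in the PDF's reading order.
    limit: assumed display length (a placeholder, not an official figure).
    """
    # Normalize whitespace in each block and skip empty blocks.
    cleaned = [" ".join(b.split()) for b in blocks_in_reading_order if b.strip()]
    text = " ".join(cleaned)
    if len(text) <= limit:
        return text
    # Cut at a word boundary, leaving room for the trailing "..."
    return text[: limit - 3].rsplit(" ", 1)[0] + "..."


# Hypothetical page: a form number is read before the real copy.
before = ["Form 1234 Rev. B", "Acme transit seats cut maintenance costs for bus fleets."]
# Same page after the reading order is changed so the copy is read first.
after = [before[1], before[0]]

print(snippet_preview(before))
print(snippet_preview(after))
```

The first preview leads with the form number; the second leads with the sentence you actually want searchers to see.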

Tag your PDFs

Just as you tag parts of an html document for search optimization, you also should tag PDF contents. Tagging of content (e.g., heading, text, image descriptions) created in other programs generally will not transfer to the PDF when it is created. A quick way to check if anything in your PDF is tagged is to check the document properties.

document properties tagging

Using Acrobat, select Advanced>Accessibility>Add Tags to Document. Acrobat will give you a document report and recommend things you may want to consider changing. Then select Advanced>Accessibility>TouchUp Reading Order. You’ll have the ability to tag text, headings, and alternate text for images, etc.

To add ALT tags to images, click on the Show Order button on the TouchUp Reading Order panel. Then click on the Content tab in the Order Panel. From there you can select an image. Then click on the Options pull-down and select Properties.

adding alt tags to pdfs

Build links into PDFs

Be sure you include links in your PDFs, and pay attention to the anchor text used. Search engines do recognize these links. Not very often, but sometimes you’ll find backlinks in PDFs. Their limited occurrence, however, is likely related to the fact that most people don’t put links into PDFs; most people treat PDFs as static print documents.

There’s a good business reason to include links, too. Often, PDFs are passed along to others via email. Recipients will be reading the PDF in isolation (i.e., they didn’t get it from your website). By placing links into PDFs, you give these readers an easy way to click back into your site, where you can further influence them. At a minimum, include a link to your website in the contact information you provide in the PDF.

Verify your links

If you have links in your original document, those links likely won’t transfer when creating the PDF. They may look like links, but they won’t be active links. After the PDF is created, test all links in the document. Use Acrobat to insert hyperlinks in the PDF. Click on the Link tool or choose Tools>Advanced Editing>Link Tool to insert hyperlinks.
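Clicking every link by hand is still the surest test, but a script can give you a quick list of the link targets that actually made it into the file. The Python sketch below is a heuristic, not a real PDF parser: it scans the raw bytes for URI actions written as plain strings, so it will miss links stored in compressed object streams or as hex strings. The function name find_link_targets is hypothetical.

```python
import re

# A link annotation's URI action usually looks like: /S /URI /URI (http://...)
# This pattern captures the parenthesized string, allowing escaped characters.
URI_RE = re.compile(rb"/URI\s*\(((?:[^()\\]|\\.)*)\)")


def find_link_targets(pdf_bytes: bytes) -> list:
    """Return link targets found as plain-string URI actions in raw PDF bytes."""
    targets = []
    for match in URI_RE.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo backslash escapes for parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        targets.append(raw.decode("latin-1"))
    return targets


def links_in_file(path: str) -> list:
    """Convenience wrapper: read a PDF from disk and list its link targets."""
    with open(path, "rb") as handle:
        return find_link_targets(handle.read())
```

If the list comes back empty for a document that looks full of links, that is a strong hint the links did not survive the conversion and need to be added in Acrobat.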

Save the PDF as an accessible version

Search engines’ capabilities tend to lag behind new versions of Acrobat. Although Acrobat 8 is out, save your PDFs as version 1.6 (Acrobat 7) or lower to ensure search engines can index the content.

Not only is saving PDFs at a lower version good for the search engines, it’s also good for users. Not everyone has the latest versions of Acrobat Reader. Accordingly, I’d recommend saving PDFs as version 1.5 or lower. This way it will be good for search engines and most readers.
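If you want to audit the PDFs already on your site, the version is easy to check because every PDF file starts with a header such as %PDF-1.5. Here is a minimal Python sketch under that assumption. The function names are hypothetical, and the default ceiling of 1.5 simply mirrors the recommendation above. One caveat: since PDF 1.4, a file can declare a later version inside the document itself, which overrides the header, so treat this as a first pass only.

```python
import re
from typing import Optional, Tuple

# The header normally appears at the very start of the file, e.g. "%PDF-1.5".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(first_bytes: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the PDF header, or None if no header is found."""
    match = HEADER_RE.search(first_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_at_or_below(first_bytes: bytes, ceiling: Tuple[int, int] = (1, 5)) -> bool:
    """True if the header version is no higher than the chosen ceiling."""
    version = header_version(first_bytes)
    return version is not None and version <= ceiling


def check_file(path: str, ceiling: Tuple[int, int] = (1, 5)) -> bool:
    """Read only the start of the file and compare its header version."""
    with open(path, "rb") as handle:
        return is_at_or_below(handle.read(1024), ceiling)
```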

Optimize the file size for search

Keep the file size of PDFs manageable—both for search engines and searchers. A huge PDF file is annoying and unnecessary. People wishing to see the PDF may abandon the download if it takes too long. Search engines don’t like huge PDFs either. If the file is too big, the search engines may abandon indexing of its content.

Older versions of Acrobat didn’t have great options for optimizing the file size of an existing PDF. The most apparent way is to choose File>Reduce File Size. However, the only way you can reduce the file size with this tool is to specify the compatibility of the PDF. If you make the PDF less compatible with earlier versions of Acrobat, the file size will be reduced. However, as noted above, not everyone has the latest version of Acrobat. So don’t try to reduce file size by making your PDF compatible with only the latest version of Acrobat.

Newer versions of Acrobat allow you to choose Advanced > PDF Optimizer to reduce the size of the file. This option allows you to specify the compression settings of various components of the PDF. It works quite well, but you’ll have to sometimes try a few different settings with each file to strike the right balance between the quality of rendered images and file size.

Ideally, the best way to manage the file size of PDFs is to use the available tools in the software that created the PDF. For instance, both InDesign and Quark allow the user to specify the compression settings for images and the target resolution of the final document.

Enable your PDFs for fast web view

Fast Web View is an Adobe Acrobat option that allows your web-posted PDF to be rendered more quickly. Instead of waiting until the entire PDF’s data is downloaded before rendering the PDF on screen, Fast Web View allows the PDF to be rendered a page at a time. So as soon as the first page is processed, the first page is rendered. You can check whether your PDF is enabled for Fast Web View by checking the document properties.

fast web view for pdfs

This tip isn’t necessarily important for search engines to index PDFs, but it is important for site visitors and web searchers. When searchers get PDFs in the search results, there is generally an option to view as html or as a PDF. Enabling Fast Web View helps ensure that impulsive and impatient web searchers don’t abandon your PDF before it even gets a chance to become visible to the searcher. You can enable Fast Web View by going to the Preferences>General Settings panel in Adobe Acrobat. This allows the PDF to be “loaded” a page at a time, rather than waiting for the whole PDF to download.
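Checking the document properties one file at a time gets tedious. Fast Web View corresponds to what the PDF format calls a linearized file, and a linearized file carries a /Linearized entry near its very beginning. The Python sketch below looks for that marker in the first 1,024 bytes. It is a heuristic with a hypothetical function name: a file that was edited and saved incrementally after linearization can still contain the marker without really behaving as Fast Web View, so confirm anything important in Acrobat.

```python
import re

# A linearized PDF begins with a dictionary containing "/Linearized 1".
LINEARIZED_RE = re.compile(rb"/Linearized\s*\d")


def looks_linearized(first_bytes: bytes) -> bool:
    """Heuristic: does the start of the file carry the linearization marker?"""
    head = first_bytes[:1024]
    return b"%PDF-" in head and LINEARIZED_RE.search(head) is not None


def file_looks_linearized(path: str) -> bool:
    """Read only the first 1,024 bytes of the file and apply the heuristic."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))
```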

Watch where you place PDFs on your site

If you want your PDFs to be found and indexed by the search engines, don’t bury them in your site architecture. Keep links to PDFs closer to the root level of the site’s file structure and within a few clicks of the home page.

Use keyword-rich anchor text to link to PDFs

The anchor text of links pointing to the PDF is important in search optimization. This is true for links on your site as well as other sites linking to your PDF. Ideally, anchor text should reflect the keyword strategy of the PDF.

Duplicate content

While it’s true that search engines don’t like duplicate content, contrary to what was noted in the MarketingSherpa article, search engines won’t punish you in the rankings for duplicate content. When it comes to duplicate web pages, search engines are merely going to select only one of these pages to index.

On a website, the same page content may appear in different forms. For instance, the same products may be on the page when they are organized by price as when they are organized by size. If both of these pages can be indexed by the search engine, the search engine is only going to choose one of these pages to index. The site owner should select the page she wants indexed and noindex the other pages.

The issue of duplicate content really doesn’t apply to PDFs in the same manner; the PDF is the PDF. The area where duplicate content does come into play is when a web page’s contents are the same as a PDF’s contents. In that case, alter the contents of one of these vehicles if you want both to be indexed by the search engines.

Don’t do anything in a PDF that you wouldn’t do in a web page

This includes keyword stuffing, small or invisible text, text behind images, etc. Enough said.

Recheck things before you post the PDF

Lastly, it’s really easy to make small, inadvertent changes that have a big impact, both in terms of optimizing PDFs for search and for human readers, especially if, let’s say, you happen to use two computers at different times to optimize or edit the PDF. The computers may have different preference settings. This could affect Fast Web View, the version at which the PDF is saved, and the reading order. If you’re going to further optimize existing PDFs, make sure you check all of these things before posting a new version of the PDF.

Galen De Young is a recognized expert in the field of B2B search marketing and Managing Director of both the B2B marketing consulting firm Proteus B2B and its B2B search marketing division Proteus SEO. He writes a regular column for Search Engine Land on B2B search marketing. His articles have been published by many other organizations, including MarketingProfs and the Direct Marketing Association. Galen is a frequent speaker at colleges and universities, ad industry events, and search marketing conferences. He also blogs at b2b-seo.com.

{ 14 comments… read them below or add one }

Graham Strong April 3, 2008 at 10:53 am

Hi Galen,

I followed you over from your Writing White Papers post. What a great post!

Thanks again for the info — yes, I do intend to spend some extra time now to optimize my PDFs. (And you just landed a new subscriber.)

Talk to you soon,

~Graham

Galen De Young April 3, 2008 at 12:09 pm

Thanks, Graham!

Andrew April 4, 2008 at 6:46 pm

Fantastic article! I got a question the other day about optimizing PDF’s, so your timing is great. Thank you so much for this info.

Mark Alves April 5, 2008 at 10:29 am

I’m following Graham’s lead and am subscribing, too. I found you through
http://www.seomoz.org/blog/thursday-roundup-for-the-week-of-33008

This is the first time I’ve seen the touch-up reading order tip after reading many PDF optimization articles.

Another suggestion is picking a meaningful file name. In addition to the optimization benefit, you’ll also give readers a chance of remembering what the document was if they later refer to their list of saved PDFs.

Paul Burani, Clicksharp Marketing April 7, 2008 at 4:37 pm

Wow, amazing where assumptions will get you (nowhere). I always thought that the metadata was the extent of what you could do to optimize a PDF. Nice work Galen.

Chris Moritz April 24, 2008 at 4:54 pm

Thorough and helpful look at the topic. Although I have to castigate you a bit for generating a ton of metadata work for me, as now I’m going to have to go back and reformat a ton of PDFs!

Any advice on creating accessible and search-friendly PDFs correctly the first time in InDesign?

Galen De Young April 24, 2008 at 5:08 pm

Thanks, Chris

It is a lot of work…best to do things right the first time, when you create them. Going back to redo things isn’t fun.

Re InDesign, just watch the compression settings for images and the target resolution of the final document. And make sure you re-read the issues regarding reading order…things like remembering footers to influence the meta description and also perhaps making the first page or two image only if your real text doesn’t start until a subsequent page. Most of the other things you can do in Acrobat later.

Good luck.

Deborah May 12, 2008 at 2:56 pm

Great advice, glad to see all the good tips. I’ve been optimizing PDFs for several years. In the past few months, I’ve changed my workflow to use many of the tips you’ve posted.

Regarding “And although the file name of the PDF will show as the title tag of the PDF document window in Acrobat, the file name is not the PDF’s Title”, you can change the title tag of the PDF document window to display the title tag.

Go to File > Properties > Initial View. In the Window Options section, change the Show drop-down menu to “Document Title”. Click OK.

An option to consider instead of the File > Reduce File Size, is the Advanced > PDF Optimizer which allows you to configure the settings for image downsampling, saving for a specific Acrobat version, and many other settings.

Using File > Reduce File Size doesn’t give you any control over how Acrobat chooses to reduce the file size, and you may find the resulting images and text are not the quality you expect.

Advanced > PDF Optimizer allows you to customize how the file will be reduced in size.

Dinara Seitova July 23, 2008 at 8:34 pm

Thank you, great article. Is there a way to meta tag PDFs that are “gated”, meaning they require a registration? Thank you!

Galen De Young July 24, 2008 at 11:46 am

Hi, Dinara

I don’t think there’s any way to accomplish what you’re trying to do through meta tags. Generally, a PDF requiring registration is blocked by the registration process, which is generally independent of the PDF itself. What you may want to consider doing is providing an optimized excerpt of the PDF either on or before the registration page. Make sure you optimize the other components of that excerpt page as well, not just the copy you’ve extracted.

Galen

John Rasco August 31, 2009 at 12:59 pm

Great post, Galen. It’s amazing to me how many PDFs are posted on the website, but were intended for the printer…odd sizes, printer spreads or oversize paper stock. We recommend creating versions which work with 8.5 x 11 paper, so if someone hits “print,” they have a useful document.

Steve Williams October 8, 2009 at 5:40 pm

Great post! Very rarely do I come across a post that hits every point like this one. Thanks!

John Keating September 24, 2010 at 4:03 am

Thank you for this information. I was just about to create approx 10 PDFs for download and didn’t even consider the depth of detail you have covered here. Excellent post and of course, like all good ideas, it’s almost common sense. I just hope these values still hold in Google today.

Galen De Young ( @GalenDY ) September 24, 2010 at 8:52 am

Hi, John. Glad the information was helpful. It’s been a while since we posted this information, and we probably should revisit it and edit it for any new changes, but the aim of the tips remains the same…helping to ensure search engines can properly get to the information and helping make sure the information is optimized. Those values still hold true for Google and the other search engines. Thanks for leaving a comment.
