In the earlier days of search engine optimization (SEO), it was widely believed that Google could not index PDF documents. Because of this misconception, PDFs were largely overlooked in SEO work for many years. In reality, Google crawls and indexes PDFs, and optimizing these files is a sizable SEO opportunity. Long-form, text-rich documents like PDFs often cover a website’s most important subjects, which makes them valuable to search visitors.
While some would argue that HTML-based content is always the best approach for search engines, many B2B and government websites have large collections of PDF assets, such as whitepapers, technical briefs, and brochures. Brand elements may also live in print materials that are central to a prospect’s overall experience. And even companies that would like to convert all of their PDFs to HTML often lack the time, budget, or expertise to do so.
In order to start sending quality organic traffic to your PDFs, it’s important to make sure they are properly optimized for both search engine crawlability and user readability. Below is a comprehensive list of best practices to help you get started creating SEO-friendly PDF documents.
Ensure Your PDFs Are Text-Based
Because so many PDFs were created without SEO in mind, many of them are entirely image-based (scanned pages, for example). This makes them hard for search engines to understand in full. Search engines have become much better at processing images, but they still understand plain text far better. Keeping your PDFs mostly text-based lets search engines crawl all of your content, especially the headings and subheadings that define each section of the document.
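You may have a large library of existing PDFs. One way to triage it is a rough check for whether each file’s content streams draw any text at all. The sketch below is a heuristic, not a real PDF parser. It assumes content streams are either uncompressed or Flate-compressed, and it looks for a text object (between the BT and ET operators) that contains a Tj or TJ text-showing operator. Streams encoded with other filters are scanned raw and may be misjudged. The function name `looks_text_based` is hypothetical.

```python
import re
import zlib

# A PDF stream body sits between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any content stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # Most content streams use /FlateDecode (zlib data); decompressobj
            # tolerates a truncated checksum better than zlib.decompress.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not Flate data (raw or another filter): scan it as-is
        if TEXT_OBJECT_RE.search(body):
            return True
    return False


if __name__ == "__main__":
    # Build a tiny fragment with one Flate-compressed content stream.
    text_stream = zlib.compress(b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET")
    sample = (b"%PDF-1.4\n4 0 obj\n<< /Length " + str(len(text_stream)).encode()
              + b" /Filter /FlateDecode >>\nstream\n" + text_stream
              + b"\nendstream\nendobj\n")
    print(looks_text_based(sample))  # True
```

A file that returns False is a candidate for OCR or for re-exporting from its original source document.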
Add Alt Text to All Your Images
Your entire PDF should not be a single image, but there’s nothing wrong with including images in the body of the document. Just as on a web page, add alternative text (alt text for short) to your images so that search engines can understand what each image shows. An image’s alt text should be descriptive, include relevant SEO keywords, and be unique to each image in your document. In the most recent version of Adobe Acrobat, you can add alt text by navigating to Tools > Protect & Standardize > Accessibility > Set Alternative Text (Fig. 1). This opens a simple editor window that lets you quickly add alt text to all of your document’s images.
Give Your Document a Descriptive File Name
Include the keywords you want to rank for in the PDF’s file name; this makes the file both user- and search-friendly. Search results usually show the file’s URL (or a breadcrumb version of it) near the title, so a keyword-focused file name can help increase your click-through rate (Fig. 2). Separate the words in a file name with hyphens so search engines can process each keyword individually. Google recommends hyphens rather than underscores in URLs.
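If you name files in bulk, a small helper can enforce the hyphen convention. This is a minimal sketch. The function `seo_pdf_filename` and its fallback name `document` are hypothetical choices. It strips accents via Unicode normalization, which only suits Latin-script titles.

```python
import re
import unicodedata


def seo_pdf_filename(title: str, fallback: str = "document") -> str:
    """Turn a document title into a lowercase, hyphen-separated .pdf file name."""
    # Split accented letters into base letter + accent, then drop the accents.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{slug or fallback}.pdf"


print(seo_pdf_filename("Quick Guide to RSS Content Subscribing"))
# quick-guide-to-rss-content-subscribing.pdf
```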
Define a Title (And Other Metadata)
You can set the title of your PDF in the Document Properties dialog (Fig. 3) of your PDF editor. Google can use this field much as it uses the <title> element of an HTML page, as the linked headline a user sees in search results. That makes it important to title your PDF with the keywords you want to rank for. For example, in Fig. 2, the title of the PDF is “Quick Guide to RSS CONTENT SUBSCRIBING.” If you don’t define a title yourself, search engines may generate one from your content, and it might not be the title you want.
Because PDFs are typically long-form and cover specific content, focus on a single keyword or phrase and title your PDF accordingly; a broad keyword strategy is rarely successful. You can also fill in other metadata fields such as author, subject, and keywords, but these don’t appear to have any effect on search results.
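The title you set in Document Properties is stored in the PDF’s document information dictionary under the /Title key. To show where it lives, the standard-library-only sketch below writes a blank one-page PDF that carries that entry. In practice you would set the title in your editor or with a PDF library. The helper names are hypothetical. The title is encoded as UTF-16BE with a byte-order mark, which the PDF format allows for text strings and which handles non-ASCII titles.

```python
def pdf_text_string(text: str) -> str:
    """Encode text as a PDF hex string in UTF-16BE with a byte-order mark."""
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def build_titled_pdf(title: str) -> bytes:
    """Build a one-page blank PDF whose Info dictionary carries /Title."""
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        f"<< /Title {pdf_text_string(title)} >>",  # document information dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)
```

Saving the bytes from `build_titled_pdf("Quick Guide to RSS Content Subscribing")` to a .pdf file should give a document whose properties dialog shows that text as its Title.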
Define Reading Order Clearly
In an HTML document, marking up headings and subheadings is often as easy as wrapping the copy in simple tags: an <h1> for a heading and an <h2> for a subheading, for example. With PDFs, it’s not always that simple. Fortunately, PDF editors such as Adobe Acrobat let you classify content manually; in Acrobat, this is done with the Reading Order tool. With it, you can select titles, headings, and paragraphs in your PDF and tag them much as you would in an HTML document (Fig. 4). If your document contains many subsections with important keywords in their headings, it may be worth spending some time with this tool to segment your PDF properly.
Compress Images & Overall File Size
Load speed is an important factor in SEO rankings for standard web pages, and PDFs are no different. If your document contains many images or charts, compress them as much as you can without visibly degrading them. Web tools like TinyPNG can shrink image sizes if you don’t have a creative team or other resources. After compressing your images, it’s also worth compressing the PDF file itself. You can do this with online tools like SmallPDF or, in Adobe Acrobat, with the following steps:
- Open the PDF file in Acrobat.
- Navigate to Document > Reduce File Size.
- Choose Acrobat 8.0 and later for file compatibility, and click OK.
- Name the modified file, then click Save to complete the process.
- Check the size of the reduced file to confirm that it is smaller than the original.
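Before compressing anything, it can help to find the heaviest PDFs on your site. The sketch below is a minimal audit. It walks a directory and lists PDFs above a size budget, largest first. The function name is hypothetical, and the 2 MB default is an arbitrary threshold, not a published limit.

```python
from pathlib import Path

# Hypothetical budget: flag anything over 2 MB. Adjust for your audience.
DEFAULT_BUDGET_BYTES = 2 * 1024 * 1024


def oversized_pdfs(root, budget_bytes=DEFAULT_BUDGET_BYTES):
    """Return (path, size) pairs for PDFs under root larger than the budget."""
    found = []
    for path in Path(root).rglob("*"):
        # Match .pdf and .PDF alike.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > budget_bytes:
                found.append((path, size))
    # Largest files first: they offer the biggest savings.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, `oversized_pdfs("public/docs")` (a hypothetical path) returns the files to run through Reduce File Size first.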
Don’t Forget Your Backlinks
Depending on whom you ask, backlinks are among the most important factors in search engine optimization. Add both internal and external links where appropriate, since crawlers and readers alike can follow them. The biggest benefit of linking back to your own site from within a PDF is that other websites may host copies of the file. When they do, your site receives the backlinks embedded in the document, along with any referral traffic they bring.
Track Your PDF’s Performance
You can’t manage what you can’t measure; if you can’t see the results of your PDF optimizations, what was the point of your efforts? If you’re using Google Analytics (and you should be), setting up goals within the platform can help you monitor PDF performance. For starters, consider tracking organic visitors who download your PDF. This shows you not only your organic traffic but also how users engage with your content at a granular level. If you receive a large amount of traffic but almost no visitors download your PDF, it may be time to reevaluate your content.
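One caveat: when a searcher clicks a PDF directly in the search results, no analytics tag runs, because a PDF cannot execute your site’s tracking script. Your web server’s access logs still record those requests. The sketch below counts successful PDF requests whose referrer is a Google domain. It assumes the common Apache/nginx “combined” log format. It also treats any google.* referrer as organic search, which is a simplification: it cannot tell organic clicks from paid ones. The function and pattern names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache/nginx "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"
COMBINED_LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)
# Simplifying assumption: any google.<tld> referrer counts as organic search.
GOOGLE_HOST_RE = re.compile(r"(www\.)?google(\.[a-z]{2,3}){1,2}")


def organic_pdf_hits(log_lines):
    """Count successful GET requests for .pdf files referred by Google."""
    hits = Counter()
    for line in log_lines:
        match = COMBINED_LOG_RE.match(line)
        if not match or match["method"] != "GET" or match["status"] != "200":
            continue
        path = urlparse(match["path"]).path  # drop any query string
        if not path.lower().endswith(".pdf"):
            continue
        referrer_host = urlparse(match["referrer"]).hostname or ""
        if GOOGLE_HOST_RE.fullmatch(referrer_host):
            hits[path] += 1
    return hits
```

Only 200 responses are counted. PDF viewers that fetch files in byte ranges (206 responses) may therefore be under-counted, so treat the numbers as a lower bound alongside your Google Analytics data.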
What If I Don’t Want My PDFs to Appear In Search?
For whatever reason, you might not want your PDFs to appear in search results. Perhaps you host PDF files openly on your site but require a form submission to access the content. Or perhaps a PDF contains sensitive information that you don’t want readily available on the web. In either case, you can prevent search engines from indexing your documents. The easiest way to do this is to send an X-Robots-Tag: noindex header in the HTTP response that serves the file. Note that noindex only keeps a file out of search results; anyone with the URL can still open it. If you need help adding this header, refer to Google’s developer guide on noindex. If hiding a PDF is more urgent, the Removals tool in Google Search Console can temporarily remove its URL from Google’s results.