PDF and Web Site Searching
As mentioned above, PDF files are harder for search engines to index than HTML pages. If you must publish PDF files, please follow the procedures below.
Preparing PDF Files for Searching
Make sure each PDF file has correct document properties, especially the title. An incorrect title makes it difficult for a person viewing search results to tell whether the file is useful to them.
Check the PDF file format version number and make sure your search engine can read that version. Acrobat 5 uses the PDF 1.4 format.
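The version check can be scripted. The following is a minimal sketch, not a full PDF parser: it relies on the fact that a PDF file begins with a header such as "%PDF-1.4". The function names and the max_supported default are assumptions for illustration; take the real limit from your search engine's documentation. Note also that from PDF 1.4 onward a file may declare a later version inside its document catalog, which this sketch does not inspect.

```python
import re

# PDF files begin with a header such as "%PDF-1.4".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the PDF header, or None if no header is found."""
    # Some files carry a few stray bytes before the header,
    # so search the first kilobyte rather than only byte 0.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def indexer_can_read(data: bytes, max_supported=(1, 4)):
    """Compare the header version with the highest version the indexer handles.

    max_supported is a hypothetical default, not a property of any real indexer.
    """
    version = pdf_header_version(data)
    return version is not None and version <= max_supported
```

To check a file on disk, read its first kilobyte in binary mode and pass the bytes to indexer_can_read.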
If possible, break long PDF files into smaller single-subject files, such as book sections, chapters or even chapter sections. That way, no one will accidentally download a very long document just because a word has been matched.
PDF and Metadata
Metadata is often defined as "information about information". For simple search engines, it generally consists of the document title, description, keywords, file size and modification date, but it can be much richer than that, providing many more ways to describe an object and to search for it. For more information, see the SearchTools Report on Metadata.
When search tools index PDF files, they can read the PDF document information fields, such as the document title and keywords, in addition to the body text. If the document creator didn't enter that information, the indexer may attempt to generate a title, or may just use the file name of the document.
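The fallback behavior described above can be sketched as follows. This is an illustration only, not the logic of any particular search engine: display_title and its parameters are hypothetical names, the generated title is assumed to be the first non-blank line of extracted text, and the 80-character limit is arbitrary.

```python
import os


def display_title(info_title, body_text, file_path):
    """Pick a search-result title: Info title, else first text line, else file name."""
    # 1. Prefer the title the document creator entered.
    if info_title and info_title.strip():
        return info_title.strip()
    # 2. Otherwise generate one from the first non-blank line of text
    #    (an assumed heuristic; real indexers vary).
    for line in (body_text or "").splitlines():
        if line.strip():
            return line.strip()[:80]
    # 3. As a last resort, use the file name without its extension.
    name = os.path.basename(file_path)
    return os.path.splitext(name)[0]
```

A result titled "ar2001" tells a searcher far less than "Annual Report 2001", which is why filling in the title field is worth the effort.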
Adobe XMP
With Acrobat 5.0 and new releases of other products, Adobe is supporting a new eXtensible Metadata Platform (XMP, previously called XAP). This allows files to contain substantially more information about themselves, including Dublin Core data such as author, description, actual modification date and so on. XMP has not yet been widely adopted, and we know of no search engines that take advantage of this metadata.
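For readers curious how an indexer could read this metadata, here is a rough sketch. An XMP packet is XML wrapped in "xpacket" processing instructions, and the Dublin Core fields live in the http://purl.org/dc/elements/1.1/ namespace. The sketch assumes the packet is stored uncompressed and unencrypted in the file, which is common but not guaranteed, and it scans raw bytes rather than parsing the PDF structure. The function name xmp_dublin_core is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Capture the XML between the opening and closing xpacket instructions.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def xmp_dublin_core(data: bytes):
    """Return {field: [values]} for Dublin Core elements in the first XMP packet."""
    match = PACKET_RE.search(data)
    if match is None:
        return {}  # no readable packet (absent, compressed, or encrypted)
    try:
        root = ET.fromstring(match.group(1).strip())
    except ET.ParseError:
        return {}
    fields = {}
    for element in root.iter():
        if not element.tag.startswith("{" + DC + "}"):
            continue
        name = element.tag.split("}", 1)[1]
        # Values are usually wrapped in rdf:Alt, rdf:Seq or rdf:Bag lists.
        items = [
            li.text.strip()
            for li in element.iter("{" + RDF + "}li")
            if li.text and li.text.strip()
        ]
        if not items and element.text and element.text.strip():
            items = [element.text.strip()]  # plain, unwrapped value
        fields[name] = items
    return fields
```

An empty result means only that this simple scan found nothing; the file may still carry metadata in a form the sketch cannot see.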