
Thread: Are pdf files web searchable

  1. #1
    Join Date
    Jun 2006
    Location
    Missoula, Montana
    Posts
    10

    Are pdf files web searchable

    Does anyone know whether search engine spiders currently index the text inside PDF files?

    Thanks!

  2. #2
    Join Date
    Jan 2005
    Posts
    2,087
    Not so far as I'm aware, no.

  3. #3
    Join Date
    Jan 2005
    Location
    Idaho USA
    Posts
    1,498
    A PDF can be a good place to provide links, but I think JP is correct: the search engines don't crawl them.
    The Old Sarge

  4. #4
    Join Date
    Jun 2006
    Location
    Missoula, Montana
    Posts
    10
    Thanks for the replies! I know that there is software available for an organization to search PDF documents on a network. I wonder how long it will take for this technology to reach the search engines.

  5. #5
    Join Date
    Jun 2006
    Location
    Missoula, Montana
    Posts
    10

    How do they get information for their listings?

    My boss asked an interesting question: if the search engines can't search PDFs, why do they list them, and how do they get the description for the listing?

  6. #6
    Join Date
    Jan 2005
    Location
    Idaho USA
    Posts
    1,498
    Nora,

    After you posed the original question, I got to thinking the same thing ...

    Here's something I found:

    PDF and Web Site Searching
    As mentioned above, PDF files are hard on search engines, and HTML pages are much easier for them to deal with. However, if you must have PDF, please follow these procedures.

    Preparing PDF Files for Searching
    Make sure each PDF file has correct document properties, especially the title. An incorrect title makes it difficult for a person viewing search results to tell if this file is useful to them.


    Check the PDF file format version number and make sure your search engine can read that version. Acrobat 5 uses the PDF 1.4 format.


    If possible, break long PDF files into smaller single-subject files, such as book sections, chapters or even chapter sections. That way, no one will accidentally download a very long document just because a word has been matched.
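
    For the version check mentioned above, here is a rough sketch in Python. It relies on the fact that a PDF file begins with a header comment such as %PDF-1.4. The function names and the max_supported limit are my own assumptions for illustration, not anything from the quoted article, and a newer PDF can override the header version inside the document itself, so treat this as a first-pass check only.

```python
import re

# Matches the header comment at the start of a PDF file, e.g. b"%PDF-1.4".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the PDF header, or None if no header is found."""
    # Some files carry a few stray bytes before the header, so scan the first 1 KB.
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def indexer_can_read(data: bytes, max_supported=(1, 4)) -> bool:
    """True if the header version is at or below max_supported.

    max_supported is a hypothetical limit; look up the real one in your
    search engine's documentation.
    """
    version = pdf_header_version(data)
    return version is not None and version <= max_supported
```

    To try it on a real file, read the first kilobyte in binary mode and pass those bytes to indexer_can_read.
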
    PDF and Metadata
    Metadata is defined as "information about information". For simple search engines, that generally constitutes the document title, description, keywords, file size and modification date, but it can be much richer than that, providing many more ways to describe an object, and to search for that object. For more information, see the SearchTools Report on Metadata.

    When search tools index PDF files, they can get the text from the PDF information fields, such as a document title and additional keywords. If the document creator didn't enter that information, the indexer may attempt to generate a title, or may just use the file name of the document.
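
    The fallback described in that paragraph could look roughly like the sketch below. It assumes the document information fields have already been pulled out of the PDF into a plain dict keyed by field name (the info argument and the listing_title name are hypothetical), and it only illustrates the "use the title, else the file name" idea, not how any particular search engine behaves.

```python
from pathlib import PurePosixPath


def listing_title(info: dict, url_path: str) -> str:
    """Pick the title shown in search results for a PDF.

    info is a dict of document information fields (assumed to be
    extracted elsewhere); url_path is the path the file was fetched from.
    """
    title = (info.get("Title") or "").strip()
    if title:
        return title
    # No usable title was entered: fall back to the file name,
    # with separators turned into spaces so it reads a little better.
    stem = PurePosixPath(url_path).stem
    return stem.replace("_", " ").replace("-", " ").strip() or url_path
```

    This is why an empty or wrong title matters: the person scanning the results sees whatever this kind of fallback produces.
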

    Adobe XMP
    With Acrobat 5.0 and new releases of other products, Adobe is supporting a new eXtensible Metadata Platform (XMP, previously called XAP). This allows the files to contain substantially more information about themselves, including Dublin Core data such as author, description, actual modification date and so on. This has not been widely used and we know of no search engines that take advantage of this metadata.
    You can read the entire piece at http://www.searchtools.com/info/pdf.html

    Glad you brought it up again. Very interesting reading.
    The Old Sarge

  7. #7
    Join Date
    Aug 2006
    Posts
    61
    If you want to make a PDF searchable, follow the above guidelines and break each section down to a file size no larger than 50k to 85k, as that is the largest file size a search engine spider will index in a single visit. Any larger than that, and the spider will make two or more trips to the file to index it, which means a delay.
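
    Taking the 50k to 85k figure above at face value (it is this post's number, not a documented limit), a quick way to see which files would need splitting is sketched below. The oversized name, the sizes dict, and the limit_kb default are assumptions for illustration.

```python
def oversized(sizes: dict, limit_kb: int = 85) -> list:
    """Return the names of files larger than limit_kb kilobytes, largest first.

    sizes maps a file name to its size in bytes.
    """
    limit = limit_kb * 1024
    big = [name for name, size in sizes.items() if size > limit]
    return sorted(big, key=lambda name: sizes[name], reverse=True)
```

    The sizes dict can be filled from os.path.getsize for each PDF in a folder.
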

  8. #8
    Join Date
    Sep 2006
    Posts
    34
    PDF files are searchable by search engines, and I would agree with auto that you should break them down for faster indexing.
