
Search .pdf/.doc content


crochk


I have a site where users upload files (mainly PDF and DOC) through an HTML form that is then processed with PHP. I need them to be able to search the content of those files through a basic HTML/PHP search form.

As I see it, either I need a way for PHP to search within the files saved on my server, or the PHP that processes the upload form needs to insert the complete contents of each file into the MySQL database.

Is there a way to do either of those, or another way to search the files?

Thank you.


The best way to do this is to dump the contents into a database and search through that. As for actually grabbing the content from the files, I'm not sure how easy that is.

http://answers.yahoo.com/question/index?qid=20080823105354AARm3fc is a start.
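As a rough sketch of the database approach suggested above, assuming the text has already been extracted from each upload: the table name `documents`, its columns, and both function names are made up for illustration, and the search relies on a MySQL FULLTEXT index on the `content` column.

```php
<?php
// Hypothetical schema (table and column names are assumptions, not from the thread):
//   CREATE TABLE documents (
//       id INT AUTO_INCREMENT PRIMARY KEY,
//       filename VARCHAR(255) NOT NULL,
//       content LONGTEXT NOT NULL,
//       FULLTEXT KEY ft_content (content)
//   );
// Note: InnoDB supports FULLTEXT indexes from MySQL 5.6 on; older servers need MyISAM.

// Store the plain text extracted from one uploaded file.
function saveDocumentText(PDO $pdo, string $filename, string $text): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO documents (filename, content) VALUES (:filename, :content)'
    );
    $stmt->execute([':filename' => $filename, ':content' => $text]);
}

// Return the filenames of documents whose stored text matches the search terms.
function searchDocuments(PDO $pdo, string $terms): array
{
    $stmt = $pdo->prepare(
        'SELECT filename FROM documents
          WHERE MATCH(content) AGAINST(:terms IN NATURAL LANGUAGE MODE)'
    );
    $stmt->execute([':terms' => $terms]);

    // FETCH_COLUMN returns a flat array of the first selected column.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

The upload handler would call `saveDocumentText()` once per file, and the search form would pass the user's input to `searchDocuments()`. Prepared statements keep that input out of the SQL string itself.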

discomatt is correct - this will not be easy. You will want to look for third-party tools to read the data in the documents. PDF and Word documents include a lot of "unseen" code that describes how the content is to be displayed, sort of like HTML. But it's more complex, because the codes and the format change between versions. A PDF created in an older format may look very different under the hood from one created in a newer version. And with Word, the document format changed completely with Office 2007.

Trying to create your own utility to read the content from these documents would be a huge undertaking. There should be some utilities available, though probably at a cost.

http://www.snowbound.com/solutions/text_extraction.html
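Some free command-line tools can do the extraction as well. The sketch below assumes `pdftotext` (part of Xpdf/Poppler) and `antiword` are installed on the server and that PHP is allowed to run shell commands; the function names are hypothetical. antiword reads only the older binary .doc format, so Office 2007 .docx files would need a different approach.

```php
<?php
// Build the shell command that prints a file's text to stdout,
// or return null when the extension is not handled.
// Assumes pdftotext and antiword are on the PATH (an assumption about the server).
function buildExtractCommand(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    $arg = escapeshellarg($path); // quote the path so it is safe in a shell command

    switch ($ext) {
        case 'pdf':
            // A trailing "-" tells pdftotext to write to stdout instead of a file.
            return "pdftotext -enc UTF-8 $arg -";
        case 'doc':
            // antiword writes the document text to stdout by default.
            return "antiword $arg";
        default:
            return null;
    }
}

// Run the matching extractor and return the text, or null on failure.
function extractText(string $path): ?string
{
    $command = buildExtractCommand($path);
    if ($command === null || !is_readable($path)) {
        return null;
    }

    $output = shell_exec($command);

    // shell_exec() gives back a non-string value when the command fails or prints nothing.
    return (is_string($output) && $output !== '') ? $output : null;
}
```

The upload script could call `extractText()` right after moving the uploaded file into place and store the result in the database. A `null` result means the file should be kept but flagged as not searchable.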
