
Search .pdf/.doc content


crochk


I have a site where users upload files (mainly PDF and DOC) through an HTML form, which is then processed with PHP. I need them to be able to search the content of those files through a basic HTML/PHP search form.

As I see it, either I need a way for PHP to search within the files saved on my server, or the PHP that processes the upload form needs to insert the complete contents of each file into the MySQL database.

Is there a way to do either of those, or another way to search the files?

 

Thank you.
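The second option in the question (store the text in MySQL at upload time, then query it from the search form) can be sketched roughly as follows. This assumes the text has already been extracted from the file by some other tool. The table `documents` with columns `filename` and `body`, and the function names `store_document` and `search_documents`, are made up for illustration.

```php
<?php
// Sketch only: assumes a hypothetical table documents(id, filename, body)
// and that $text was already extracted from the uploaded file.

// Save the extracted text next to the file name at upload time.
function store_document(PDO $db, string $filename, string $text): void
{
    $stmt = $db->prepare('INSERT INTO documents (filename, body) VALUES (?, ?)');
    $stmt->execute([$filename, $text]);
}

// Return the names of all files whose stored text contains $term.
function search_documents(PDO $db, string $term): array
{
    // Escape LIKE wildcards so the user's input is matched literally.
    $escaped = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $term);
    $stmt = $db->prepare(
        "SELECT filename FROM documents WHERE body LIKE ? ESCAPE '!' ORDER BY filename"
    );
    $stmt->execute(['%' . $escaped . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

The search form handler would pass the submitted term to `search_documents()` and list the returned file names. On MySQL, a FULLTEXT index on `body` queried with `MATCH (body) AGAINST (?)` would probably scale better than `LIKE` once there are many documents; that swap is not shown here.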


discomatt is correct: this will not be easy. You will want to look for third-party tools to read the data in the documents. PDF and Word documents include a lot of "unseen" code that describes how the content is to be displayed, sort of like HTML. But it's more complex, because those codes and the file formats change between versions. A PDF created in an older format may look very different under the hood from one created in a newer version. And with Word, the document format changed completely with Office 2007.

Trying to create your own utility to read the content from these documents would be a huge undertaking. There should be some utilities available, though probably at a cost.

 

http://www.snowbound.com/solutions/text_extraction.html
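For the newer Word format specifically, a rough version is possible without a commercial tool, because a .docx file (Office 2007 and later) is a zip archive whose main text is stored as XML in `word/document.xml`. The sketch below assumes PHP's zip extension is enabled; the function name `docx_to_text` is made up for illustration. It ignores headers, footnotes and table layout, and it does nothing for old binary .doc files or for PDFs, which still need a dedicated extraction tool.

```php
<?php
// Sketch only: pull plain text out of a .docx (Office 2007+) file.
// Assumes the zip extension (ZipArchive) is available.
function docx_to_text(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // missing file or not a zip container
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return null; // zip file without a main document part
    }
    // Turn paragraph ends into newlines before dropping the markup.
    $xml = str_replace('</w:p>', "\n", $xml);
    $text = strip_tags($xml);
    // Decode XML entities such as &amp; back into plain characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

The upload handler could call this on each .docx right after the file is saved and keep the returned string wherever the search will look for it.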
