PDF Parser

rfeio · April 2, 2009

Hi,

My site has several PDF documents that users can download. I would like to index the content of those PDF files, so that users can search for a given term and the site returns the PDF files that are relevant.

I was thinking that the best way of doing this might be to parse the content of the PDF files and save it in a MySQL table. When a user runs a search, the script would look in the table and return the names of the PDF files relevant to the search.
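To make the idea concrete, here is a minimal sketch of that extract, store, and search flow. It rests on several assumptions that are not from this thread: it is written in Python rather than the site's own language, it uses an in-memory SQLite database as a stand-in for the MySQL table, and the table name `pdf_index` and all function names are hypothetical. `extract_text` assumes the `pdftotext` command-line tool (from Poppler or Xpdf) is installed on the server; it is defined but not called in the demo.

```python
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Return the text of a PDF using the pdftotext command-line tool.

    Assumption: pdftotext is installed and on the PATH. Passing "-" as
    the output file sends the extracted text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def create_index(conn):
    # Hypothetical schema: one row per PDF, holding its extracted text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_index ("
        "filename TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )


def index_document(conn, filename, text):
    # Re-indexing the same file replaces its previous row.
    conn.execute(
        "INSERT OR REPLACE INTO pdf_index (filename, content) VALUES (?, ?)",
        (filename, text),
    )


def search(conn, term):
    """Return the filenames whose stored text contains the search term."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT filename FROM pdf_index "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY filename",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]


# Demo with already-extracted text, so it runs without any PDF files.
conn = sqlite3.connect(":memory:")
create_index(conn)
index_document(conn, "manual.pdf", "How to install the printer driver")
index_document(conn, "faq.pdf", "Common questions about billing")
print(search(conn, "printer"))
```

A plain `LIKE` scan such as this is only reasonable for a small collection. With MySQL, a FULLTEXT index queried through `MATCH ... AGAINST` would likely be the better fit once the number of documents grows.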

I would need some guidance on how I could parse a PDF file since I've never done this before. Also, would this be the best way of achieving what I want?

Thanks!

Rfeio

dgoosens · April 2, 2009

hi Rfeio,

I haven't really been able to test it, but what I have read about it is rather promising.

Have a look at the html2pdf library

http://tufat.com/s_html2ps_html2pdf.htm

I am not quite sure it is possible to parse the content of a PDF...

I guess it all depends on the quality of the PDF...

bluejay002 · April 2, 2009

I don't know if this is the best approach, but you could add tags to every PDF file. You could then search on those tags, and the files might even become easier to find in web searches.

rfeio · April 2, 2009

Thanks guys!

dgoosens, from the look of the site you mentioned, it seems to be for converting HTML into PDF. I think what I need is the opposite.

bluejay002, are you referring to HTML tags? That wouldn't suit me since I would like to be able to search the content of the PDF files.

dgoosens · April 3, 2009

hi Rfeio,

my mistake...

I thought one could edit the PDFs as well with html2pdf...

but I can't find any info about it...

you might want to have a look at FPDF then...

http://fpdf.org/
