rfeio Posted April 2, 2009
Hi,
My site has several PDF documents that users can download. I would like to index the content of those PDF files so that users can search for a given term and the site returns the PDF files that are relevant.
I was thinking the best way to do this might be to parse the content of the PDF files and save it in a MySQL table. When a user runs a search, the script would look in the table and return the names of the PDF files that match.
I need some guidance on how to parse a PDF file, since I've never done this before. Also, is this the best way to achieve what I want?
Thanks!
Rfeio
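The approach described in this post (extract the text once, store it in a table, search the table) can be sketched roughly as follows. This is only an illustration under several assumptions: it is written in Python rather than PHP, it uses the standard-library sqlite3 module as a stand-in for MySQL, and it assumes the `pdftotext` command-line tool from poppler-utils is installed. The table name `pdf_index` and all function names are hypothetical.

```python
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Return the text of a PDF via the pdftotext command-line tool.

    Assumption: poppler-utils' pdftotext is installed. Scanned
    (image-only) PDFs will yield little or no text.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def create_index(conn):
    """Create the (hypothetical) table holding one row per PDF file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_index ("
        "filename TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )


def index_document(conn, filename, text):
    """Store or refresh the extracted text for one file."""
    conn.execute(
        "INSERT OR REPLACE INTO pdf_index (filename, content) VALUES (?, ?)",
        (filename, text),
    )


def search(conn, term):
    """Return the names of files whose text contains the term."""
    # Escape LIKE wildcards so the term is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT filename FROM pdf_index "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY filename",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    # In real use the text would come from extract_text("manual.pdf").
    index_document(conn, "manual.pdf", "How to install the widget")
    index_document(conn, "report.pdf", "Annual sales report")
    print(search(conn, "install"))
```

A plain substring match like this is the simplest option. On MySQL, a FULLTEXT index queried with MATCH ... AGAINST would likely scale better and rank results, but that is a design choice beyond what the post describes.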
dgoosens Posted April 2, 2009
Hi Rfeio,
I have not really been able to test it, but what I have read about it is rather promising. Have a look at the html2pdf library: http://tufat.com/s_html2ps_html2pdf.htm
I am not sure whether it is possible to parse the PDF itself. I guess it all depends on the quality of the PDF.
bluejay002 Posted April 2, 2009
I don't know if this is the best approach, but you could add tags to every PDF file. You could then search those tags, and the files might even become easier to find in web searches.
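The tagging idea suggested here could look something like the sketch below. It is an illustration only, again assuming Python with the standard-library sqlite3 module in place of PHP and MySQL. The table name `pdf_tags` and the function names are hypothetical, and the tags are assumed to be entered by hand for each file.

```python
import sqlite3


def create_tag_table(conn):
    """Create a (hypothetical) table mapping each PDF file to its tags."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_tags ("
        "filename TEXT NOT NULL, tag TEXT NOT NULL, "
        "PRIMARY KEY (filename, tag))"
    )


def tag_document(conn, filename, tags):
    """Attach tags to a file; tags are normalised to lowercase."""
    conn.executemany(
        "INSERT OR IGNORE INTO pdf_tags (filename, tag) VALUES (?, ?)",
        [(filename, tag.strip().lower()) for tag in tags],
    )


def find_by_tag(conn, tag):
    """Return the names of files carrying the given tag."""
    rows = conn.execute(
        "SELECT filename FROM pdf_tags WHERE tag = ? ORDER BY filename",
        (tag.strip().lower(),),
    )
    return [row[0] for row in rows]
```

Tags only find what someone thought to label, so this complements a search over the PDF text rather than replacing it.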
rfeio Posted April 2, 2009 (Author)
Thanks guys!
dgoosens, from the look of the site you mentioned, it seems to be for converting HTML into PDF. I think what I need is the opposite.
bluejay002, are you referring to HTML tags? That wouldn't suit me, since I would like to be able to search the content of the PDF files.
dgoosens Posted April 3, 2009
Quoting rfeio: "dgoosens, for the look of the site you've mentioned it looks more like converting the html into PDF. What I would need would be the opposite I think."
Hi Rfeio,
My mistake. I thought one could also edit PDFs with html2pdf, but I can't find any information about that.
You might want to have a look at FPDF instead: http://fpdf.org/
This topic is now archived and is closed to further replies.