Icebergness Posted January 20, 2012

Hi,

I currently have a 'Research' page on my intranet site. It works by displaying a list of files based on the date selected. The files are stored in a folder hierarchy on the same server, for example:

2012 <-Year
--01 <-Month
----01 <-Day
----02 <-Day

The folders contain a multitude of files that are dropped in by other members of staff, and 99% of the time these are .msg, .doc and .pdf files.

What I want to do is create a textbox which will allow the user to search through the files (as in the file contents, not just the file names). So far, the best thing I have found for this is the following code:

```php
<?php
/**
 * powered by @cafewebmaster.com
 * free for private use
 * please support us with donations
 */

define("SLASH", stristr($_SERVER['SERVER_SOFTWARE'], "win") ? "\\" : "/");

$path    = !empty($_POST['path']) ? $_POST['path'] : dirname(__FILE__);
$q       = isset($_POST['q']) ? $_POST['q'] : '';
$results = '';

function php_grep($q, $path){
    $ret = '';
    $fp = opendir($path);
    // compare against false so a file named "0" does not end the loop
    while(false !== ($f = readdir($fp))){
        if( preg_match("#^\.+$#", $f) ) continue; // skip "." and ".."
        $file_full_path = $path.SLASH.$f;
        if(is_dir($file_full_path)) {
            // recurse into sub-folders
            $ret .= php_grep($q, $file_full_path);
        } else if( stristr(file_get_contents($file_full_path), $q) ) {
            // case-insensitive match on the raw file contents
            $ret .= "$file_full_path\n";
        }
    }
    closedir($fp);
    return $ret;
}

if($q){
    $results = php_grep($q, $path);
}

echo <<<HRD
<pre >
<form method=post>
<input name=path size=100 value="$path" /> Path
<input name=q size=100 value="$q" /> Query
<input type=submit>
</form>
$results
</pre >
HRD;
?>
```

This does a grep-style search of the raw file contents, which works well, albeit slowly. However, it doesn't search through PDFs. I have contemplated several solutions, including finding a way to convert all the files into text files or load them into a MySQL database, but I haven't found anything useful.

I think the answer is going to be no, but I'm asking whether anybody knows of, or has successfully implemented, a similar system?

Big big thanks in advance if you can help in any way!
Dave
thehippy Posted January 21, 2012

The PDF file format is not plain-text markup, so opening the file raw and searching it isn't going to give you reliable results. You'll need to interpret the file with something that understands the format.

I needed to do something similar and found that Sphider makes use of xpdf and catdoc for PDF and DOC files respectively. xpdf has a couple of utility programs, pdfinfo and pdftotext, which you can use to extract the metadata and the text, which you can in turn search.
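For illustration, the approach described above (convert each file to text with a tool that understands the format, then search the text) might look roughly like the sketch below. It is not Sphider's code. It assumes the `pdftotext` (xpdf) and `catdoc` binaries are installed and on the PATH, and the function names `extractor_command`, `extract_text` and `search_files` are made up for this example. Files with any other extension, including .msg, are simply read raw here, which is an assumption and may well miss text in binary formats.

```php
<?php
// Sketch only: assumes `pdftotext` and `catdoc` are installed and on the PATH.
// All function names below are hypothetical.

/** Return the shell command that prints a file's text to stdout, or null to read it raw. */
function extractor_command(string $file): ?string
{
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    $arg = escapeshellarg($file);
    switch ($ext) {
        case 'pdf':
            return "pdftotext $arg -"; // "-" makes pdftotext write to stdout
        case 'doc':
            return "catdoc $arg";      // catdoc writes to stdout by default
        default:
            return null;               // no converter assumed; read as-is
    }
}

/** Get searchable text for one file, via a converter where one is configured. */
function extract_text(string $file): string
{
    $cmd = extractor_command($file);
    if ($cmd === null) {
        return (string) @file_get_contents($file);
    }
    // shell_exec gives null/false on failure; the cast turns that into ""
    return (string) shell_exec($cmd);
}

/** Recursively list files under $path whose extracted text contains $q (case-insensitive). */
function search_files(string $q, string $path): array
{
    if ($q === '') {
        return []; // an empty query would otherwise match everything
    }
    $hits = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile() && stripos(extract_text($file->getPathname()), $q) !== false) {
            $hits[] = $file->getPathname();
        }
    }
    sort($hits);
    return $hits;
}
```

Calling `search_files($q, $path)` returns an array of matching paths rather than a newline-separated string. Running the converters on every search is likely to be slow for a large folder tree, so caching the extracted text (for example as .txt files alongside the originals) would probably be worth considering.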
Icebergness (Author) Posted February 1, 2012

Quote:
The PDF file format is not a plain text markup, so opening the file raw and searching it isn't going to yield you reliable results. You'll need to interpret the file with something that understands the format. I needed to do something similar and found how sphider made use of xpdf and catdoc for pdf's and doc's respectively. xpdf has a couple utility programs, pdfinfo and pdftotext which you use to extract the metadata and text which you can in turn search.

Sorry, I forgot to check this post as I've been working on other projects. I had previously looked at xpdf, but hadn't gotten the results I was looking for. I'll give it a try with Sphider and let you know how I get on.

Thanks for the suggestion

Dave
This topic is now archived and is closed to further replies.