mikebyrne Posted July 8, 2009
Are there any code examples for writing the text of a PDF document to a text file? The PDF is text only.
https://forums.phpfreaks.com/topic/165170-solved-code-to-put-contents-of-pdf-into-text-file/
ignace Posted July 8, 2009
http://be2.php.net/manual/en/book.pdf.php
Quote: "Are there any code examples to send the text of a pdf document into a text file?"
Google knows all your answers (and secrets...)
mikebyrne Posted July 8, 2009 Author
Probably a stupid question, but I found the code below to play with. How do I point it to my PDF file?

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){
    $data = getFileData($filename);
    // read the version string from the "%PDF-1.x" header
    $s = strpos($data,"%")+1;
    $version = substr($data,$s,strpos($data,"%",$s)-1);
    if (substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}

// handles version 1.2
function handleV2($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $a_chunks = array();
    $j = 0;
    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (count($a_filter) > 0){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_chunks[$j]["data"] = "";
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (count($a_data) > 0){
                $a_chunks[$j]["data"] = substr($a_data[0], strlen("stream\r\n"), strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    $result_data = "";
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        // (split() is deprecated, so explode() is used instead)
        $a_filter = explode("/",$chunk["filter"]);
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos() is the substring search; the original called substr() here by mistake)
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                $data = @gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }
    return $result_data;
}

// handles versions > 1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data = "";
    foreach($a_obj as $obj){
        // check if it is a string
        if (substr_count($obj,"/GS1")>0){
            // the strings are between ( and )
            preg_match_all('|\((.*?)\)|',$obj,$field,PREG_SET_ORDER);
            if (is_array($field))
                foreach($field as $match)
                    $result_data .= $match[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (count($a_data) > 0){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            foreach ($a_text as $text){
                // strip the surrounding parentheses
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        foreach ($a_text as $text){
            $result .= substr($text,1,strlen($text)-2);
        }
    }
    return $result;
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

// returns every substring that starts with $start_word and ends with $end_word
// (an empty array when nothing matches)
function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    $a_result = array();
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>
shergold Posted July 8, 2009
It actually tells you in the code:
// Arguments : $filename - Filename of the PDF you want to extract
For example:
$filename = "/folder/ebook.pdf";
Shergold.
mikebyrne Posted July 8, 2009 Author
Sorry, I should have rephrased my question. Do I replace every $filename, or can I code $filename="C:\file"?
mikebyrne Posted July 8, 2009 Author
Do I replace $filename with the file location, or is there an easier way?
ignace Posted July 8, 2009
You just call it and pass the location of the file as an argument:
$pdftext = pdf2txt('/path/to/file.pdf');
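On the earlier question about writing the path as "C:\file": the snippet below is a small sketch of how PHP treats backslashes in path strings, since that can quietly break a Windows path before it ever reaches pdf2txt(). The paths are hypothetical and only illustrate the quoting rules.

```php
<?php
// Hypothetical paths, used only to illustrate PHP string quoting.
$broken  = "C:\file.pdf";          // in double quotes "\f" is parsed as a form feed
$single  = 'C:\docs\file.pdf';     // single quotes keep backslashes literal
$escaped = "C:\\docs\\file.pdf";   // double quotes need each backslash doubled
$forward = 'C:/docs/file.pdf';     // forward slashes are accepted on Windows too

var_dump(strlen($broken));         // one less than the number of characters typed between the quotes
var_dump($single === $escaped);    // both spell the same path

// With pdf2txt() from the post above loaded, the call would then be:
// $pdftext = pdf2txt($forward);
```

Single quotes or forward slashes are probably the least error-prone choice here; the function itself does not need to be edited, only called with the path.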
mikebyrne Posted July 8, 2009 Author
Where does the output of the file go? I just get a blank screen returned with no errors.
ignace Posted July 8, 2009
Quote: "Where does the output of the file go? I just get a blank screen returned with no errors"
Anywhere you like:
file_put_contents('/path/to/file.txt', pdf2txt('/path/to/file.pdf'));
I strongly advise reading the PHP manual.
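The blank screen is expected, because pdf2txt() returns the text rather than printing it. As a minimal sketch of the one-liner above with some checking added, the helper below takes the extractor as a callable so that it can be tried on its own. The name extractToTextFile() and its error handling are assumptions for illustration and are not part of the code posted in this thread.

```php
<?php
/**
 * Run an extractor on a PDF and write the returned text to a file.
 * $extractor is any callable that takes a filename and returns a string,
 * for example 'pdf2txt' once the functions from this thread are loaded.
 * Returns the number of bytes written.
 */
function extractToTextFile($pdfPath, $txtPath, $extractor)
{
    if (!is_readable($pdfPath)) {
        throw new RuntimeException("Cannot read PDF: $pdfPath");
    }

    // The extractor only returns the text; nothing is printed or saved yet.
    $text = call_user_func($extractor, $pdfPath);
    if (!is_string($text) || trim($text) === '') {
        throw new RuntimeException("No text was extracted from: $pdfPath");
    }

    // file_put_contents() returns false on failure, otherwise the byte count.
    $bytes = file_put_contents($txtPath, $text);
    if ($bytes === false) {
        throw new RuntimeException("Cannot write text file: $txtPath");
    }
    return $bytes;
}
```

A call such as extractToTextFile('/path/to/file.pdf', '/path/to/file.txt', 'pdf2txt'); would go once at the bottom of the script, after the function definitions and not inside any of them. To see the text in the browser instead, echo the return value of pdf2txt().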
mikebyrne Posted July 8, 2009 Author
Do I place the file_put_contents() call at the end of each function?