Posts posted by mikebyrne
-
-
So I should put the
file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
at the end of the handleV3 and handleV2 functions?
-
There's an If and an else statement in it
-
I'm using a piece of code to pass text from a PDF to a text file, but when it runs the text file isn't created and I'm not getting any error reports.
Any ideas why?
<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
    $pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');
    $data = getFileData($filename);
    $s=strpos($data,"%")+1;
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}

// handles the version 1.2
function handleV2($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        $a_filter = split("/",$chunk["filter"]);
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            if (substr($chunk["filter"],"FlateDecode")!==false){
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
                    //$result_data .= "x";
                }
            }
        }
    }
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it is a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    unset($a_result);
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}
?>
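A minimal sketch of where the write could go, not a fix for the script itself. Two things stand out in the code above: a statement placed after `return` never runs, and calling pdf2txt() from inside pdf2txt() makes the function call itself without end. One option is to call the function once, outside all the function bodies, and write whatever it returns. Here `extractText()` is a hypothetical stand-in for pdf2txt(), and the temp-file paths are placeholders so the sketch runs on its own.

```php
<?php
// Hypothetical stand-in for pdf2txt(): it only reads the file back,
// so this sketch does not depend on the PDF parsing code.
function extractText(string $filename): string
{
    return (string) file_get_contents($filename);
    // Anything placed here, after the return, would never execute.
}

// Placeholder paths in the temp directory (assumption); substitute the
// real PDF path and the real output path.
$source = tempnam(sys_get_temp_dir(), 'src');
$target = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($source, "sample text");

// Call the function once, outside its own body, and write what it returns.
$bytes = file_put_contents($target, extractText($source));

// file_put_contents() returns false on failure, so check it instead of
// relying on a blank screen.
if ($bytes === false) {
    echo "Could not write $target\n";
} else {
    echo "Wrote $bytes bytes to $target\n";
}
```

Checking the return value of file_put_contents() also gives some feedback when nothing appears on screen.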
-
So I just remove
$pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');
and replace it with
file_put_contents('/path/to/txtfile.txt', pdf2txt('/path/to/pdfile.pdf'));
??
-
I'm using a piece of code I found to read the contents of a PDF file and put the output into a text file, but I can't get the contents to pass.
My code is:
<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){
    $pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');
    $data = getFileData($filename);
    $s=strpos($data,"%")+1;
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}

// handles the version 1.2
function handleV2($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        $a_filter = split("/",$chunk["filter"]);
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            if (substr($chunk["filter"],"FlateDecode")!==false){
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
                    //$result_data .= "x";
                }
            }
        }
    }
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it is a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    unset($a_result);
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');
}
?>
-
Do I place the file_put_contents call at the end of each function?
-
Where does the output of the file go? I just get a blank screen returned with no errors
-
Do I replace $filename with the file location or is there an easier way?
-
Sorry, I should have rephrased my question. Do I replace every $filename, or can I just code $filename="C:\file"?
-
Probably a stupid question, but I found the code below to play with. How do I point it to my PDF file?
<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors     : Jonathan Beckett, 2005-05-02
//             : Sven Schuberth, 2007-03-29

function pdf2txt($filename){
    $data = getFileData($filename);
    $s=strpos($data,"%")+1;
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);
}

// handles the version 1.2
function handleV2($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    foreach($a_obj as $obj){
        $a_filter = getDataArray($obj,"<<",">>");
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];
            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream\r\n"),
                    strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        $a_filter = split("/",$chunk["filter"]);
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            if (substr($chunk["filter"],"FlateDecode")!==false){
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
                    //$result_data .= "x";
                }
            }
        }
    }
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it is a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    unset($a_result);
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>
-
Are there any code examples for sending the text of a PDF document into a text file? The PDF is text only.
-
Each page is in a fairly standard format for example
Polling Station: Athy Boys Nat. School -
Room 01
Ardreigh (ED Athy Rural) Athy
1 Callan, Cthy Wxxxide
2 Callan, Lam Waysxxxe
3 Callan, Marget Wayde
4 Callen, Cathne Bray Road
5 Callen, Tithy Bray Road
6 Carbery, Emma Farill
7 Carbery, Jeriah Farmhill
8 Carbery, Mry Farml
9 Carbery, Sarh Fahill
10 Cuy, Brian Bchlawn
11 Cully-Wall, Brda Beechlawn
There's 2 columns on each page
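A rough sketch of how entries in that format might be picked apart once the text is out of the PDF. It assumes the two columns have already been separated so that each entry sits on its own line, and it only splits off the number and the surname; where the first name ends and the address begins cannot be told from this layout alone. `parseEntry()` is a hypothetical helper name.

```php
<?php
// Hypothetical parser for one register line such as "6 Carbery, Emma Farill".
// Returns [number, surname, rest of line], or null for headings and other lines.
function parseEntry(string $line): ?array
{
    if (preg_match('/^(\d+)\s+([^,]+),\s*(.+)$/', trim($line), $m)) {
        return [(int) $m[1], $m[2], $m[3]];
    }
    return null;
}

// Sample lines taken from the page layout shown in the post.
$lines = [
    'Polling Station: Athy Boys Nat. School -',
    'Room 01',
    '4 Callen, Cathne Bray Road',
    '11 Cully-Wall, Brda Beechlawn',
];

// Write the recognised entries as CSV rows to standard output.
$out = fopen('php://output', 'w');
foreach ($lines as $line) {
    $row = parseEntry($line);
    if ($row !== null) {
        fputcsv($out, $row, ',', '"', '');
    }
}
fclose($out);
```

Heading lines such as the polling station and room are skipped here because they do not start with a number.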
-
I'm just wondering if anyone has any sample code to convert a PDF file to Excel that I can edit to suit my project?
Any help on this would be great!!
-
Would it be possible to put the contents of the pdf into an array?
-
I've found a free converter online, but I was wondering if there is a PHP version that I could play around with and tweak for my specific PDF document.
-
Hi all,
I'm just wondering if anyone has PHP code to convert a PDF document to a comma-separated file?
Could someone point me in the right direction?
-
I'm getting an error: Parse error: parse error in C:\xampp\htdocs\remove.php on line 3
with
$currentline=preg_replace('/(,\n)$/''', $test)
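For comparison, a small sketch of a call that parses. preg_replace() takes the pattern, the replacement and the subject as three arguments separated by commas, and the statement needs a closing semicolon; the line above has no comma between the pattern and the empty replacement string. The sample string and the `m` modifier are assumptions for illustration.

```php
<?php
// Made-up sample standing in for the file contents (assumption).
$test = "a,b,Co.Kildare,\nc,d,Co.Kildare,\n";

// Pattern, replacement, subject: three arguments, then a semicolon.
// The "m" modifier lets $ match at the end of every line.
$currentline = preg_replace('/,$/m', '', $test);

echo $currentline;
```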
-
Something like this?
<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");
$currentline=preg_replace('/,$/',\n'', $test);
file_put_contents('C:\Users\Mike\Desktop\test.txt', $currentline);
?>
-
How would I apply
$currentline=preg_replace('/,$/','', $d);
when taking in a file?
-
Something like
<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");
$field=str_replace('Co.Kildare,','Co.Kildare');
file_put_contents('C:\Users\Mike\Desktop\test.txt', $test)
?>
-
I have a comma-separated file in which every row ends with the word Co.Kildare, but on a large number of these rows the word appears with a comma at the end, i.e. Co.Kildare,
How could I code the replacement of Co.Kildare, with Co.Kildare?
<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");
// Replacement code??
file_put_contents('C:\Users\Mike\Desktop\test.txt', $test);
?>
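One possible shape for the replacement step, as a sketch only. It assumes the unwanted comma always comes directly after Co.Kildare at the end of a line; `stripTrailingComma()` is a hypothetical helper name and the sample rows are made up.

```php
<?php
// Hypothetical helper: drops a comma that directly follows "Co.Kildare"
// at the end of a line.
function stripTrailingComma(string $text): string
{
    // "m" makes $ match at the end of every line; the optional \r is
    // captured and put back so Windows line endings stay intact.
    return preg_replace('/Co\.Kildare,(\r?)$/m', 'Co.Kildare$1', $text);
}

// Made-up sample rows: the first has the stray comma, the second does not.
$sample = "D,2796,Son,Oler,13 Dun Bnn,Bch Road,Ahy,Co.Kildare,\n"
        . "S,2797,Gerty,Laurce,15 Dun Brn,Bleach ad,Ahy,Co.Kildare\n";

$cleaned = stripTrailingComma($sample);
echo $cleaned;
```

In the script above, the result of `stripTrailingComma($test)` would be what gets passed to file_put_contents().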
-
That seems to have fixed it
Thanks for the help!
-
<?php
$rewrote = "";
$handle = fopen("C:\Users\Mike\Desktop\AthyDB.txt", "r+"); // Open file to read it read.
if ($handle) {
    while (!feof($handle)) // Loop til end of file.
    {
        $currentline = fgets($handle, 4096); // Read a line.
        $currentline=preg_replace('/^D$/','Dail European Parliament and Local Elections only', $currentline);
        $currentline=preg_replace('/^S$/','Post or special arrangement only', $currentline);
        $currentline=preg_replace('/^L$/','Local Elections only', $currentline);
        $currentline=preg_replace('/^E$/','European Parliament and Local Elections only', $currentline);
        $rewrote .= $currentline;
        $rewrote .= "\n";
    }
    file_put_contents('C:\Users\Mike\Desktop\test.txt', $rewrote);
    fclose($handle);
}
?>
That seems to fix the repeating problem but the code seems to stop replacing the letters after 2,000+
Very strange. Any idea why it stops replacing?
-
I have a file that looks something like the following:
D,2796,Son,Oler,13 Dun Bnn,Bch Road,Ahy,Co.Kire
S,2797,Gerty,Laurce,15 Dun Brn,Bleach ad,Ahy,Co.Kilde
L,2801,Mazse,Saras,17 Dn Brn,Blch Rod,Aty,Co.Kilre
E,2808,Esjo,Leel,21 Dun Bnn,Blach Road,Ay,Co.Kilre
What I am trying to do is replace the single character letters "D","S","L" & "E" with the following sentences
Dail European Parliament and Local Elections only (instead of D)
European Parliament and Local Elections only (instead of E)
Local Elections only (instead of L)
Post or special arrangement only (instead of S)
With lots of help we compiled the following:
<?php
$rewrote = "";
$handle = fopen("C:\Users\Mike\Desktop\AthyDB.txt", "r+"); // Open file to read it read.
if ($handle) {
    while (!feof($handle)) // Loop til end of file.
    {
        $currentline = fgets($handle, 4096); // Read a line.
        $currentline=preg_replace('/^D/','Dail European Parliament and Local Elections only', $currentline);
        $currentline=preg_replace('/^S/','Post or special arrangement only', $currentline);
        $currentline=preg_replace('/^L/','Local Elections only', $currentline);
        $currentline=preg_replace('/^E/','European Parliament and Local Elections only', $currentline);
        $rewrote .= $currentline;
        $rewrote .= "\n";
    }
    file_put_contents('C:\Users\Mike\Desktop\test.txt', $rewrote);
    fclose($handle);
}
?>
It seems that the replacing of "D" produces "Dail European Parliament and Local Elections onlyail European Parliament and Local Elections only"
Any idea what's gone wrong?
It also seems to happen for "L" a few times but not every time?? (Strange)
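An alternative sketch that avoids running four patterns over the same line: split off the first field, look it up in a table, and put the line back together. Each row is touched once and only the whole first field is compared, so text that has already been expanded cannot be matched again. `expandCode()` and `$labels` are hypothetical names; the four labels are the ones listed above.

```php
<?php
// Lookup table built from the four codes listed in the post.
$labels = [
    'D' => 'Dail European Parliament and Local Elections only',
    'E' => 'European Parliament and Local Elections only',
    'L' => 'Local Elections only',
    'S' => 'Post or special arrangement only',
];

// Hypothetical helper: expands the code in the first comma-separated
// field of one row and leaves every other row unchanged.
function expandCode(string $line, array $labels): string
{
    // Limit of 2: split off the first field only, keep the rest as-is.
    $parts = explode(',', $line, 2);
    if (isset($labels[$parts[0]])) {
        $parts[0] = $labels[$parts[0]];
    }
    return implode(',', $parts);
}

echo expandCode('D,2796,Son,Oler,13 Dun Bnn,Bch Road,Ahy,Co.Kire', $labels), "\n";
```

When reading with fgets(), the line keeps its own line ending, so appending an extra "\n" after each row would double the line breaks.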
Pdf data not passing to textfile??
in PHP Coding Help
Posted
I've tried the following code but still not getting any results