mikebyrne

July 9, 2009

I've tried the following code but I'm still not getting any results:

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){    

    $pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');

    $data = getFileData($filename);
   
    $s=strpos($data,"%")+1;
   
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);

   
}
// handles version 1.2
function handleV2($data){
       
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
   
    foreach($a_obj as $obj){
       
        $a_filter = getDataArray($obj,"<<",">>");
   
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
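        // split() was removed in PHP 7; explode() does the same for a literal delimiter
        $a_filter = explode("/",$chunk["filter"]);

        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos() searches for the name; substr() expects a numeric offset)
            if (strpos($chunk["filter"],"FlateDecode")!==false){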
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
               
                    //$result_data .= "x";
                }
            }
        }
    }
   
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));	
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    unset($a_result);
   
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}
?>

July 9, 2009

So should I put

file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));

at the end of the handleV3 and handleV2 functions?
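A minimal sketch of where such a call could go, assuming a stand-in function `extractText()` (hypothetical, used here in place of pdf2txt() so the example runs on its own) and temporary file paths. PHP leaves a function as soon as it reaches `return`, so a statement placed after `return` inside handleV2() or handleV3() never runs, and calling pdf2txt() from inside its own body makes it call itself without end. The call belongs once, at the top level, after the function definitions:

```php
<?php
// Hypothetical stand-in for pdf2txt(): returns the text it extracted.
function extractText($filename) {
    $data = file_get_contents($filename);
    return strtoupper($data);
    // Anything written here, after the return, is never executed.
}

// Assumed paths for illustration; a real script would use the PDF and
// text file locations instead.
$source = tempnam(sys_get_temp_dir(), 'src');
$target = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($source, 'sample text');

// Call the function once, outside every function body, and write its result.
$written = file_put_contents($target, extractText($source));
if ($written === false) {
    echo "Could not write to $target\n";
}
```

file_put_contents() returns the number of bytes written, or false on failure, so checking its result is one way to find out why no text file appears.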

July 9, 2009

There's an if and an else statement in it.

July 9, 2009

I'm using a piece of code to pass text from a PDF to a text file, but when it runs the text file isn't created and I'm not getting any error reports.

Any ideas why?

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));    

    $pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');

    $data = getFileData($filename);
   
    $s=strpos($data,"%")+1;
   
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);

   
}
// handles version 1.2
function handleV2($data){
       
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
   
    foreach($a_obj as $obj){
       
        $a_filter = getDataArray($obj,"<<",">>");
   
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
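        // split() was removed in PHP 7; explode() does the same for a literal delimiter
        $a_filter = explode("/",$chunk["filter"]);

        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos() searches for the name; substr() expects a numeric offset)
            if (strpos($chunk["filter"],"FlateDecode")!==false){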
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
               
                    //$result_data .= "x";
                }
            }
        }
    }
   
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));	
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    unset($a_result);
   
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
    file_put_contents('C:\Users\Mike\Desktop\txtfile.txt', pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf'));
}
?>

July 9, 2009

So do I just remove

$pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');

and replace it with

file_put_contents('/path/to/txtfile.txt', pdf2txt('/path/to/pdfile.pdf'));

?

July 8, 2009

I'm using a piece of code I found to read the contents of a PDF file and put the output into a text file, but I can't get the contents to pass to the text file.

My code is:

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

    $pdftext = pdf2txt('C:\Users\Mike\Desktop\Athy Database\Athy Register.pdf');

    $data = getFileData($filename);
   
    $s=strpos($data,"%")+1;
   
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);

   
}
// handles version 1.2
function handleV2($data){
       
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
   
    foreach($a_obj as $obj){
       
        $a_filter = getDataArray($obj,"<<",">>");
   
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
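        // split() was removed in PHP 7; explode() does the same for a literal delimiter
        $a_filter = explode("/",$chunk["filter"]);

        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos() searches for the name; substr() expects a numeric offset)
            if (strpos($chunk["filter"],"FlateDecode")!==false){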
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
               
                    //$result_data .= "x";
                }
            }
        }
    }
   
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');	
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    unset($a_result);
   
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
    file_put_contents('C:\Users\Mike\Desktop\file.txt');
}
?>

July 8, 2009

Do I place the file_put_contents call at the end of each function?

July 8, 2009

Where does the output of the file go? I just get a blank screen returned, with no errors.

July 8, 2009

Do I replace $filename with the file location, or is there an easier way?

July 8, 2009

Sorry, I should have rephrased my question. Do I replace every $filename, or can I just code $filename="C:\file"?
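A short sketch of how a function parameter is usually filled, assuming a hypothetical `readWholeFile()` in place of pdf2txt() and a temporary file in place of the PDF. `$filename` is only a placeholder inside the function; the real path is supplied when the function is called, so nothing inside the function body needs replacing:

```php
<?php
// Hypothetical stand-in for pdf2txt(): $filename receives whatever
// path the caller passes in.
function readWholeFile($filename) {
    return file_get_contents($filename);
}

// Assumed path for illustration. With a Windows path, single quotes
// keep the backslashes literal, e.g. 'C:\folder\file.pdf'.
$path = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($path, 'hello');

// The path is given at the call site; the return value holds the result.
$text = readWholeFile($path);
echo $text;
```

The function only returns the text. Nothing is displayed or saved until the caller echoes the return value or writes it to a file.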

July 8, 2009

Probably a stupid question, but I found the code below to play with. How do I point it to my PDF file?

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

    $data = getFileData($filename);
   
    $s=strpos($data,"%")+1;
   
    $version=substr($data,$s,strpos($data,"%",$s)-1);
    if(substr_count($version,"PDF-1.2")==0)
        return handleV3($data);
    else
        return handleV2($data);

   
}
// handles version 1.2
function handleV2($data){
       
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
   
    foreach($a_obj as $obj){
       
        $a_filter = getDataArray($obj,"<<",">>");
   
        if (is_array($a_filter)){
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj,"stream\r\n","endstream");
            if (is_array($a_data)){
                $a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach($a_chunks as $chunk){

        // look at each chunk and decide how to decode it - by looking at the contents of the filter
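        // split() was removed in PHP 7; explode() does the same for a literal delimiter
        $a_filter = explode("/",$chunk["filter"]);

        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            // (strpos() searches for the name; substr() expects a numeric offset)
            if (strpos($chunk["filter"],"FlateDecode")!==false){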
                $data =@ gzuncompress($chunk["data"]);
                if (trim($data)!=""){
                    $result_data .= ps2txt($data);
                } else {
               
                    //$result_data .= "x";
                }
            }
        }
    }
   
    return $result_data;
}

//handles versions >1.2
function handleV3($data){
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    $result_data="";
    foreach($a_obj as $obj){
        //check if it a string
        if(substr_count($obj,"/GS1")>0){
            //the strings are between ( and )
            preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(is_array($field))
                foreach($field as $data)
                    $result_data.=$data[1];
        }
    }
    return $result_data;
}

function ps2txt($ps_data){
    $result = "";
    $a_data = getDataArray($ps_data,"[","]");
    if (is_array($a_data)){
        foreach ($a_data as $ps_text){
            $a_text = getDataArray($ps_text,"(",")");
            if (is_array($a_text)){
                foreach ($a_text as $text){
                    $result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data,"(",")");
        if (is_array($a_text)){
            foreach ($a_text as $text){
                $result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return $result;
}

function getFileData($filename){
    $handle = fopen($filename,"rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

function getDataArray($data,$start_word,$end_word){

    $start = 0;
    $end = 0;
    unset($a_result);
   
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return $a_result;
}
?>

July 8, 2009

Are there any code examples for sending the text of a PDF document into a text file? The PDF is text only.

July 7, 2009

Each page is in a fairly standard format, for example:

Polling Station: Athy Boys Nat. School -

Room 01

Ardreigh (ED Athy Rural) Athy

1 Callan, Cthy Wxxxide

2 Callan, Lam Waysxxxe

3 Callan, Marget Wayde

4 Callen, Cathne Bray Road

5 Callen, Tithy Bray Road

6 Carbery, Emma Farill

7 Carbery, Jeriah Farmhill

8 Carbery, Mry Farml

9 Carbery, Sarh Fahill

10 Cuy, Brian Bchlawn

11 Cully-Wall, Brda Beechlawn

There are 2 columns on each page.
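As a rough sketch of how lines in that layout could be split into comma-separated fields once the text has been extracted: the function name `parseRegisterLine()` and the pattern are assumptions, and the pattern guesses that the first name is a single word, which the sample cannot confirm. Header lines such as the polling station name do not start with a number and are skipped. It does not attempt to untangle the two columns.

```php
<?php
// Hypothetical helper: splits "4 Callen, Cathne Bray Road" into
// number, surname, first name and address. Returns null for lines
// that do not start with a number (headers, blank lines).
function parseRegisterLine($line) {
    // Assumption: the first name is one word; the rest is the address.
    if (preg_match('/^(\d+)\s+([^,]+),\s*(\S+)\s+(.+)$/', trim($line), $m)) {
        return array($m[1], $m[2], $m[3], $m[4]);
    }
    return null;
}

$lines = array(
    'Polling Station: Athy Boys Nat. School -',
    '4 Callen, Cathne Bray Road',
    '11 Cully-Wall, Brda Beechlawn',
);

$rows = array();
foreach ($lines as $line) {
    $fields = parseRegisterLine($line);
    if ($fields !== null) {
        $rows[] = implode(',', $fields);
    }
}
echo implode("\n", $rows), "\n";
```

Each entry in `$rows` is one comma-separated line, so the array could be written out with file_put_contents() or opened in a spreadsheet afterwards.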

July 7, 2009

I'm just wondering if anyone has any sample code to convert a PDF file to Excel that I can edit to suit my project?

Any help on this would be great!!

July 7, 2009

Would it be possible to put the contents of the pdf into an array?

July 7, 2009

I've found a free converter online, but I was wondering if there is a PHP version that I could play around with and tweak for my specific PDF doc.

July 7, 2009

Hi all,

I'm just wondering if anyone has PHP code to convert a PDF document to a comma-separated file?

Could someone point me in the right direction?

April 8, 2009

I'm getting an error: Parse error: parse error in C:\xampp\htdocs\remove.php on line 3

with

$currentline=preg_replace('/(,\n)$/''', $test)
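For comparison, a sketch of a well-formed call, assuming the goal is still to drop a trailing comma from every line. The parse error most likely comes from the missing comma between the pattern and the replacement (the statement also lacks its closing semicolon). The `m` modifier and the optional `\r` are additions of this sketch so that `$` applies to each line of a Windows text file rather than only to the end of the whole string:

```php
<?php
// Sample data standing in for the contents of the text file.
$test = "1,Ahy,Co.Kildare,\r\n2,Ahy,Co.Kildare\r\n3,Ahy,Co.Kildare,";

// Three arguments separated by commas: pattern, replacement, subject.
// /m makes $ match at the end of every line; (?=\r?$) allows for
// Windows line endings without removing them.
$cleaned = preg_replace('/,(?=\r?$)/m', '', $test);

echo $cleaned;
```

Only a comma that sits directly before a line ending (or the end of the file) is removed; the commas between fields are left alone.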

April 8, 2009

Something like this?

<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");
$currentline=preg_replace('/,$/',\n'', $test);
file_put_contents('C:\Users\Mike\Desktop\test.txt', $currentline);
?>

April 8, 2009

How would I apply

$currentline=preg_replace('/,$/','', $d);

when taking in a file?

April 8, 2009

Something like

<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");
// str_replace() needs the subject as its third argument and returns the result
$test=str_replace('Co.Kildare,','Co.Kildare',$test);
file_put_contents('C:\Users\Mike\Desktop\test.txt', $test);
?>

April 8, 2009

I have a comma-separated file in which every row ends with the word Co.Kildare, but on a large number of these rows the word appears with a comma at the end, i.e. Co.Kildare,

How could I code the replacement of Co.Kildare, with Co.Kildare?

<?php
$test = file_get_contents("C:\Users\Mike\Desktop\AthyDB.txt");

Replacement code??

file_put_contents('C:\Users\Mike\Desktop\test.txt', $test);
?>

April 8, 2009

That seems to have fixed it

Thanks for the help!

April 8, 2009

<?php
$rewrote = "";
$handle = fopen("C:\Users\Mike\Desktop\AthyDB.txt", "r+"); // Open the file (read/write mode).

if ($handle) {
    while (!feof($handle)) // Loop until end of file.
    {
        $currentline = fgets($handle, 4096); // Read a line.
        $currentline=preg_replace('/^D$/','Dail European Parliament and Local Elections only', $currentline);
        $currentline=preg_replace('/^S$/','Post or special arrangement only', $currentline);
        $currentline=preg_replace('/^L$/','Local Elections only', $currentline);
        $currentline=preg_replace('/^E$/','European Parliament and Local Elections only', $currentline);
        $rewrote .= $currentline;
        $rewrote .= "\n";
    }
    file_put_contents('C:\Users\Mike\Desktop\test.txt', $rewrote);
    fclose($handle);
}
?>

That seems to fix the repeating problem but the code seems to stop replacing the letters after 2,000+

Very strange. Any idea why it stops replacing?

April 8, 2009

I have a file that looks something like the following:

D,2796,Son,Oler,13 Dun Bnn,Bch Road,Ahy,Co.Kire

S,2797,Gerty,Laurce,15 Dun Brn,Bleach ad,Ahy,Co.Kilde

L,2801,Mazse,Saras,17 Dn Brn,Blch Rod,Aty,Co.Kilre

E,2808,Esjo,Leel,21 Dun Bnn,Blach Road,Ay,Co.Kilre

What I am trying to do is replace the single character letters "D","S","L" & "E" with the following sentences

Dail European Parliament and Local Elections only (instead of D)

European Parliament and Local Elections only (instead of E)

Local Elections only (instead of L)

Post or special arrangement only (instead of S)

With lots of help we compiled the following:

<?php
$rewrote = "";
$handle = fopen("C:\Users\Mike\Desktop\AthyDB.txt", "r+"); // Open the file (read/write mode).

if ($handle) {
    while (!feof($handle)) // Loop until end of file.
    {
        $currentline = fgets($handle, 4096); // Read a line.
        $currentline=preg_replace('/^D/','Dail European Parliament and Local Elections only', $currentline);
        $currentline=preg_replace('/^S/','Post or special arrangement only', $currentline);
        $currentline=preg_replace('/^L/','Local Elections only', $currentline);
        $currentline=preg_replace('/^E/','European Parliament and Local Elections only', $currentline);
        $rewrote .= $currentline;
        $rewrote .= "\n";
    }
    file_put_contents('C:\Users\Mike\Desktop\test.txt', $rewrote);
    fclose($handle);
}
?>

It seems that replacing "D" produces "Dail European Parliament and Local Elections onlyail European Parliament and Local Elections only"

Any idea what's gone wrong?

It also seems to happen for "L" a few times, but not every time. (Strange)
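One possible way to make the replacement safe to run more than once is to look only at the first comma-separated field and replace it when it is exactly one of the four codes. This is a sketch under that assumption; `expandCode()` and `$labels` are names made up for the example. A pattern such as `/^D/` matches any line that begins with D, including a line that already starts with "Dail ...", which would give the doubled text if the script is ever run over its own output.

```php
<?php
// Hypothetical lookup table built from the four codes in the post.
$labels = array(
    'D' => 'Dail European Parliament and Local Elections only',
    'E' => 'European Parliament and Local Elections only',
    'L' => 'Local Elections only',
    'S' => 'Post or special arrangement only',
);

// Replaces the first field of a CSV line when it is a known code.
function expandCode($line, $labels) {
    // Split into the first field and the rest of the line.
    $parts = explode(',', rtrim($line, "\r\n"), 2);
    if (isset($labels[$parts[0]])) {
        $parts[0] = $labels[$parts[0]];
    }
    return implode(',', $parts);
}

$once  = expandCode("D,2796,Son,Oler\n", $labels);
$twice = expandCode($once, $labels); // running it again changes nothing

echo $once, "\n";
```

Because the line ending is trimmed inside the function, appending "\n" afterwards adds exactly one line break. fgets() keeps the original line ending, so adding "\n" to an untrimmed line produces a blank line after every row.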


Posts posted by mikebyrne

Pdf data not passing to textfile??

Output into text file?

[SOLVED] Code to put contents of pdf into text file?

php code to convert pdf to excel?

[SOLVED] PDF to CSF converter??

[SOLVED] Replacing "Co.Kildare," with "Co.Kildare"

[SOLVED] Strange, values repeating??
