Is there a command to save a pdf file as text in php

samona · January 22, 2009

Is there a command to save a PDF as a text file so that I can extract data from it?

premiso · January 22, 2009

If it is an image, you would need to use OCR to convert it, and that is shaky at best. So there probably is a way, but whether it will work is a toss-up.

Someone else may have done this and know, but as far as I know it is not possible.

samona · January 22, 2009

It's just a report. I can open it in Adobe and Save As... a text file. I was just wondering if I can open it in PHP and save it as a text file.

Mchl · January 22, 2009

If it's not encrypted or compressed, it's essentially a text file anyway. You just need to drop the PS (PostScript-style) control characters.

samona · January 22, 2009

How would I do that? I don't understand what PS controls are.

Mchl · January 22, 2009

See the attached file (hello.pdf)

It is an example of an uncompressed PDF file. If you open it in Acrobat, it displays 'Hello World'.

If you open it in Notepad (or another text editor), you'll see lots of control characters, and on line 35 the words "Hello World".

If your PDF files look like this, you can actually process them in PHP to remove all those control characters and leave just the content.

If, however, your PDF files look nothing like this when opened in Notepad, they're probably compressed, in which case I'm afraid you can't extract text from them with PHP alone.
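As a rough way to automate the "open it in Notepad and look" check, the sketch below searches the raw bytes for a `/Filter` entry, which is how a PDF stream declares compression or other encoding. The function name `pdfLooksCompressed` and the file name in the usage comment are made up for illustration. This is only a heuristic: a file can contain a filtered image stream while its text streams are still plain.

```php
<?php
// Hypothetical helper: guesses whether a PDF uses filtered (for example
// compressed) streams by looking for a "/Filter" key in the raw bytes.
// Assumption: no real PDF parsing is done, so false positives are possible.
function pdfLooksCompressed(string $pdfBytes): bool
{
    // "/Filter" may be followed by a space or directly by "/FlateDecode".
    return preg_match('/\/Filter\b/', $pdfBytes) === 1;
}

// Usage with a hypothetical file name:
// $bytes = file_get_contents('report.pdf');
// echo pdfLooksCompressed($bytes) ? "filtered streams\n" : "plain streams\n";
```

If this returns false, the text-stripping approach described above is more likely to work on the file as-is.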

[attachment deleted by admin]

samona · January 22, 2009

Yes, it does look like that. Even when I save it as a text file, it's all just characters. There are no images in the file.

Mchl · January 22, 2009

If so, extracting the text should be possible. You would need to take a look at the PDF Reference and see which control characters enclose text objects. Or look for someone who can write this script for you. (I'm not going into it, too busy.)
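For readers who land here later, a minimal sketch of that idea follows. In an uncompressed content stream, text objects sit between the `BT` and `ET` operators, and the text itself is usually written as literal strings in parentheses, for example `(Hello World) Tj`. The function names `extractPdfText` and `unescapePdfString` and the file names are invented for this example. It assumes uncompressed, unencrypted streams and simple font encodings, and it ignores hex strings (`<...>`) and unescaped nested parentheses, so treat it as a starting point rather than a full extractor.

```php
<?php
// Hypothetical helper: decodes backslash escapes inside a PDF literal string.
function unescapePdfString(string $raw): string
{
    return preg_replace_callback(
        '/\\\\([0-7]{1,3}|.)/s',
        function (array $m): string {
            $map = ['n' => "\n", 'r' => "\r", 't' => "\t",
                    'b' => "\x08", 'f' => "\f", "\n" => ''];
            $e = $m[1];
            // One to three octal digits encode a single byte.
            if (strspn($e, '01234567') === strlen($e)) {
                return chr(octdec($e) & 0xFF);
            }
            // Known escapes are mapped; anything else keeps the bare character.
            return $map[$e] ?? $e;
        },
        $raw
    );
}

// Hypothetical helper: collects literal strings found between BT and ET.
// Assumption: streams are uncompressed and strings map directly to characters.
function extractPdfText(string $pdfBytes): string
{
    $lines = [];
    if (!preg_match_all('/\bBT\b(.*?)\bET\b/s', $pdfBytes, $blocks)) {
        return '';
    }
    foreach ($blocks[1] as $block) {
        // Match "( ... )" while allowing escaped characters such as "\)".
        preg_match_all('/\((?:\\\\.|[^\\\\()])*\)/s', $block, $strings);
        $text = '';
        foreach ($strings[0] as $s) {
            $text .= unescapePdfString(substr($s, 1, -1));
        }
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    // One output line per text object.
    return implode("\n", $lines);
}

// Usage with hypothetical file names:
// $text = extractPdfText(file_get_contents('report.pdf'));
// file_put_contents('report.txt', $text);
```

Joining one line per text object is an arbitrary choice; a real report may need the positioning operators (`Td`, `Tm`) to rebuild columns and line breaks.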
