Is there a command to save a PDF file as text in PHP?


samona

If the PDF is a scanned image, you would need to use OCR to get the text out, and that is shaky at best. So there probably is a way, but whether it will work is a toss-up.

Someone else may have done this and know how, but as far as I know it is not possible.

See the attached file (hello.pdf)

It is an example of an uncompressed PDF file. If you open it in Acrobat, it displays 'Hello World'.

If you open it in Notepad (or another text editor) you'll see lots of control characters, and on line 35 the words "Hello World".

If your PDF files look like this, you can actually process them in PHP to remove all those control characters and leave just the content.
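As a rough illustration of that kind of processing, here is a minimal sketch. It assumes the text sits in uncompressed content streams, inside text objects delimited by the BT and ET operators, and is shown with the Tj operator as a literal string in parentheses. The function name extract_pdf_text is made up for this example, and hex strings, TJ arrays and font encodings are not handled.

```php
<?php
// Sketch only: pull literal strings shown with the Tj operator out of an
// uncompressed PDF. Assumes "(text) Tj" inside BT ... ET text objects.
function extract_pdf_text(string $pdf): string
{
    $lines = [];

    // Text objects are enclosed by the BT (begin text) and ET (end text) operators.
    if (!preg_match_all('/\bBT\b(.*?)\bET\b/s', $pdf, $blocks)) {
        return '';
    }

    foreach ($blocks[1] as $block) {
        // Match "(...) Tj", allowing backslash-escaped characters inside the string.
        if (preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s', $block, $m)) {
            foreach ($m[1] as $raw) {
                // Undo only the \( \) and \\ escapes; other escapes are left as-is.
                $lines[] = preg_replace('/\\\\([()\\\\])/', '$1', $raw);
            }
        }
    }

    return implode("\n", $lines);
}
```

With a file like the hello.pdf described above, the call would be something like echo extract_pdf_text(file_get_contents('hello.pdf')); assuming the file is in the current directory.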

If, however, your PDF files look nothing like this when opened in Notepad, they're probably compressed, in which case I'm afraid you can't extract text from them with PHP alone.
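A quick way to check this without opening every file by hand might look like the sketch below. It relies on the fact that an encoded stream declares a /Filter entry (for example /FlateDecode) in its dictionary. This is only a heuristic: a file with plain text but a filtered image stream would also be flagged. The function name pdf_has_encoded_streams is made up for this example.

```php
<?php
// Sketch only: guess whether a PDF contains encoded (e.g. compressed) streams
// by searching for the /Filter key. A heuristic, not a real PDF parser.
function pdf_has_encoded_streams(string $pdf): bool
{
    return strpos($pdf, '/Filter') !== false;
}
```

If it returns false, the file is likely in the readable form described above.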

 

[attachment deleted by admin]

If so, extracting the text should be possible. You would need to take a look at the PDF Reference and see which control characters enclose text objects. Or look for someone who can write this script for you. (I'm not going into it, too busy.)
