If the PDF is just an image of the text, you would need OCR to get the text out, and that is shaky at best. So there probably is a way, but whether it will work is a toss-up.

Someone else may have done this and know how, but as far as I know it is not possible.

See the attached file (hello.pdf).

It is an example of an uncompressed PDF file. If you open it in Acrobat, it displays 'Hello World'.

If you open it in Notepad (or another text editor) you'll see lots of control characters, and on line 35 the words "Hello World".

If your PDF files look like this, you can actually process them in PHP to remove all those control characters and leave just the content.
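To give an idea of what that processing could look like, here is a minimal sketch. It assumes the text is written as a literal string in parentheses followed by the `Tj` (show text) operator, which is how a simple uncompressed file usually stores it. The function name `extractShownText` and the sample fragment are made up for illustration; they are not taken from hello.pdf.

```php
<?php
// Hypothetical helper: pull literal strings shown with the Tj operator
// out of the raw source of an uncompressed PDF.
function extractShownText(string $pdfSource): array
{
    // A literal string is "( ... )"; parentheses and backslashes inside
    // it are escaped with a backslash, so skip over escaped characters.
    $pattern = '/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s';
    preg_match_all($pattern, $pdfSource, $matches);

    // Undo the escaping of "(", ")" and "\" in each captured string.
    return array_map(
        fn (string $s): string => preg_replace('/\\\\([()\\\\])/', '$1', $s),
        $matches[1]
    );
}

// Made-up fragment in the style of an uncompressed content stream.
$sample = "BT\n/F1 24 Tf\n100 700 Td\n(Hello World) Tj\nET\n";
print_r(extractShownText($sample));
```

For a real file you would pass in `file_get_contents('hello.pdf')` instead of `$sample`. This ignores other text operators, hex strings and font encodings, so treat it as a starting point only.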

If, however, your PDF files look nothing like this when opened in Notepad, they're probably compressed, in which case I'm afraid you can't extract text from them this way with PHP alone.
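A quick way to tell the two cases apart without opening Notepad is to look for a compression filter name in the raw file. The sketch below assumes the most common filter, `/FlateDecode`; the function name `looksCompressed` is hypothetical, and a file could use other filters that this check does not catch.

```php
<?php
// Hypothetical helper: guess whether a PDF's streams are compressed by
// looking for the FlateDecode filter name in the raw source.
function looksCompressed(string $pdfSource): bool
{
    return strpos($pdfSource, '/FlateDecode') !== false;
}

// Made-up stream dictionaries for illustration.
var_dump(looksCompressed('<< /Length 44 >>'));
var_dump(looksCompressed('<< /Length 44 /Filter /FlateDecode >>'));
```

With a real file you would call it as `looksCompressed(file_get_contents($path))`, where `$path` is whatever file you are checking.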

 

[attachment deleted by admin]

If so, extracting the text should be possible. You would need to take a look at the PDF Reference and see which control sequences enclose text objects. Or look for someone who can write this script for you. (I'm not going into it, too busy.)
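For what it's worth, in the PDF Reference a text object is whatever sits between the `BT` (begin text) and `ET` (end text) operators, and the strings inside it are written in parentheses. A rough sketch along those lines follows. The function name `extractTextObjects` is made up, and it assumes uncompressed content, literal strings only (no hex strings), and that the words BT and ET never appear inside the text itself.

```php
<?php
// Hypothetical helper: return the text of each BT ... ET text object.
function extractTextObjects(string $pdfSource): array
{
    $objects = [];
    // Grab everything between each BT and the nearest following ET.
    preg_match_all('/\bBT\b(.*?)\bET\b/s', $pdfSource, $blocks);
    foreach ($blocks[1] as $block) {
        // Collect every literal string "( ... )" in the block, whether it
        // is an operand of Tj or an element of a TJ array.
        preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)/s', $block, $strings);
        $text = implode('', $strings[1]);
        // Undo the escaping of "(", ")" and "\".
        $objects[] = preg_replace('/\\\\([()\\\\])/', '$1', $text);
    }
    return $objects;
}

// Made-up content stream with one Tj and one TJ array.
$sample = "BT /F1 12 Tf (Hello ) Tj [(Wor) -20 (ld)] TJ ET";
print_r(extractTextObjects($sample));
```

Joining the strings with no separator keeps words that a `TJ` array splits for kerning in one piece, at the cost of losing line breaks between separate show operations.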
