PDFs - Is there a way to read a PDF and pull text from it?


4 replies to this topic

#1 Nimwei

Nimwei
  • New Members
  • Pip
  • Newbie
  • 4 posts

Posted 11 August 2006 - 01:28 PM

I've got a PDF document that a third party generates every week, and I need to download it and parse out the text.  It is a simple Word document containing a generic table that is then converted to a PDF, so the parsing shouldn't be bad.

I've searched around and I can't seem to find any libraries or examples of how to do this.

Can anyone help me?

Thanks.

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • LocationIL

Posted 11 August 2006 - 01:45 PM

Search results for PHP. There are lots of other tools out there to do this, such as pdf2txt.
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

#3 Nimwei

Nimwei
  • New Members
  • Pip
  • Newbie
  • 4 posts

Posted 11 August 2006 - 01:57 PM

I don't need a tool to do it.  I need to write a script to do it because I don't want to have to manually go out and convert it every week.  Thanks for the search link though. I've been through them but I'll look further.
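For what it's worth, a command-line converter can be driven from a script, so nothing has to be done by hand each week. Below is a rough sketch in Python, not a tested solution for this particular document. It assumes the `pdftotext` utility (from Xpdf/Poppler, swapped in here for the pdf2txt mentioned above) is installed and on the PATH, and that the table columns come out separated by at least two spaces. The function names and the URL passed to `download_pdf` are made up for illustration.

```python
import re
import subprocess
import urllib.request


def download_pdf(url, dest_path):
    """Fetch the weekly PDF; the URL is a placeholder supplied by the caller."""
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def pdf_to_text(pdf_path):
    """Run pdftotext (assumed to be on PATH) and return the extracted text.

    "-layout" asks pdftotext to keep the physical layout, which tends to
    keep table columns aligned; "-" sends the output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_table(text):
    """Split each non-blank line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows
```

A weekly cron job could then call `parse_table(pdf_to_text(download_pdf(url, "report.pdf")))`. Splitting on two or more spaces is only a guess at how the table will come out; cells that contain wide gaps or wrap onto a second line would need more careful handling.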

#4 mainewoods

mainewoods
  • Members
  • PipPipPip
  • Advanced Member
  • 685 posts
  • LocationMaine

Posted 11 August 2006 - 04:25 PM

PHP has PDF functions, which you should be able to use if they are installed:
http://www.php.net/m.../en/ref.pdf.php


#5 Nimwei

Nimwei
  • New Members
  • Pip
  • Newbie
  • 4 posts

Posted 11 August 2006 - 09:30 PM

Yes, I'm aware of the functions for PDFs.  Unfortunately, they are not documented, so I was hoping someone could help me out and point me in the right direction.



