PHP - Reading Field Values within a PDF document


labmixz

I've been trying to find a solution (library) that lets me open a .pdf document (already filled out by personnel), then parse through the fields and read their values...

 

However, it seems almost everything I find is about PDF generation... PDFlib was the first thing I looked at, and it is 90% about PDF generation. The PPS product they offer will let me add values to the fields of a PDF, which is not really what I'm looking for. I can make this work, but it's not the ideal solution for what I'm working with.

 

I also looked at Adobe LiveCycle Reader Extensions ES, however that needs a Java application server to even run, so in my opinion that's out of the question...

 

The ideal solution is:

 

A worker pulls up an online library of PDFs, saves a PDF to their system, does a series of tasks to complete the checklist on the PDF, then uploads the completed PDF via a web interface. From there I want to parse through the data. If the data validates, tell them so and save the uploaded PDF in a specified directory for the ticket ID; if it doesn't validate, tell them so and reject the document.
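For illustration only, the accept/reject step of that workflow might be sketched as below. This assumes the field values have already been read into a name => value array by some other means; `validateChecklist`, `targetPathForTicket`, the "required fields" rule and the directory layout are all hypothetical stand-ins for the real business rules.

```php
<?php
// Hypothetical helper: report every required field that is missing or blank.
// $fields is assumed to be a name => value array produced by some PDF reader.
function validateChecklist(array $fields, array $requiredFields): array
{
    $errors = [];
    foreach ($requiredFields as $name) {
        if (!isset($fields[$name]) || trim($fields[$name]) === '') {
            $errors[] = "Missing value for field: $name";
        }
    }
    return $errors;
}

// Hypothetical helper: build the storage path for a ticket's uploaded PDF.
// The "<base>/<ticket id>/<file name>" layout is an assumption.
function targetPathForTicket(string $baseDir, int $ticketId, string $filename): string
{
    // basename() strips any directory components from the client-supplied name.
    return rtrim($baseDir, '/') . '/' . $ticketId . '/' . basename($filename);
}
```

In an upload handler, an empty error list would be the cue to call `move_uploaded_file()` with the path from `targetPathForTicket()`; a non-empty list would be shown to the worker and the upload discarded.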

 

--

 

All of the above is easy, except reading the PDF: I can't find out whether the built-in PDF functions can actually READ an already filled-out PDF document...

 

Right now the only solution I have is to duplicate every PDF form as a web form, validate the fields server side against business rules on submit, and if valid create an .FDF. Once all the .FDFs are made (there will probably be several PDFs/FDFs), merge each FDF into its PDF, then merge all the PDFs together into one document, in order.

 

Doing it this way is not ideal because the PDF documents can change; just parsing for certain field names and reading their values would be better. Of course the truly ideal solution is to do away with PDFs and store everything in a database, but I can't do that: this is an already ongoing process, and we are just trying to add validation and tracking to it.

 

I hope someone has an answer for me...

I have no idea; however, if you go to http://www.php.net/manual/en/ref.pdf.php and scroll down to luc at phpt dot org, it looks like he has some information about what you are trying to do. I think he uses code from the poster directly under him, though, so you might want to check both.

I tried a few methods on there, and they don't seem to work if you open the PDF document, type in the fields, then save the document. Even though the values are visible when you bring the PDF up in, say, Adobe Reader, they don't show up in the output of the pdftotext functions...

 

Likewise, in Adobe Reader or Adobe Acrobat Pro 8, if you take the saved PDF with the values and export it straight from Adobe to an XML document, the export will not show the values either...

 

However, I did open the PDF document in Notepad and found a lot of garbage, but I also found that each field with its value started like:

 

R/T(field_name) /V(value)

 

Which means I could open the file with a simple fopen and parse through the data that way. It's kind of an annoyance with all of the trash in the file, but it would be a way to do it. I just don't understand why, even when exporting from Adobe Pro 8, the values still won't export... kind of stupid...
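A minimal sketch of that raw-parsing idea, using a regular expression instead of walking the file by hand. It assumes the fields appear exactly as observed above, with `/T(name)` immediately followed by `/V(value)` as plain literal strings in uncompressed objects. PDFs that compress their objects, put other keys between `/T` and `/V`, or store values as hex or UTF-16 strings will not match. The function name `extractPdfFields` is made up for this example.

```php
<?php
// Sketch: pull /T(name) /V(value) pairs out of raw PDF bytes.
// Only handles uncompressed objects with literal "(...)" strings.
function extractPdfFields(string $pdfBytes): array
{
    // A literal string: "(" ... ")" where a backslash escapes the next character.
    // Unescaped nested parentheses are not supported by this pattern.
    $literal = '\(((?:[^()\\\\]|\\\\.)*)\)';
    $pattern = '~/T\s*' . $literal . '\s*/V\s*' . $literal . '~s';

    $fields = [];
    if (preg_match_all($pattern, $pdfBytes, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Undo only the \( \) and \\ escapes; other escapes are left as-is.
            $value = preg_replace('~\\\\([()\\\\])~', '$1', $m[2]);
            $fields[$m[1]] = $value;
        }
    }
    return $fields;
}
```

It could be called as `extractPdfFields(file_get_contents($path))`, where `$path` is the uploaded file. Reading the whole file at once is simpler than an fopen loop and should be fine for form-sized documents.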

 

If anyone has a better solution for this please let me know.
