
PDF file converted to csv


quasiman


The company I work for keeps a PDF file on our intranet that shows statistics for each employee. It's a massive file and a pain to sort through, so I want to convert it to CSV and load it into a MySQL database for lookups.

Is there an existing solution I could adapt to my needs? Below is the basic data that would be used. The actual numbers are made up, and I've left out all the heading information (page number, title, date, etc.), but the necessary information looks like this:

1629 Doe, John - Adherence Summary

-------- Adherence --------- ------------------ Conformance ------------------
Scheduled   Scheduled  Actual  Min. In  Min.Out  Perc. In  +/- Min.  Perc. In  Percent of    Percent of
Activities  Time       Time    Adhere   Adhere   Adhere    Conform   Conform   Total Sched.  Total Actual
----------  ---------  ------  -------  -------  --------  --------  --------  ------------  ------------
Logged On   143:50     113:06  6506     2124     75 %      -1844     79 %      74 %          58 %
Logged Off  39:33      39:24   2107     305      89 %      -9        100 %     20 %          20 %
AUX         0:00       17:44   0        0        0 %       +1064     999 %     0 %           9 %
AUX1        12:07      25:55   548      179      75 %      +828      214 %     6 %           13 %
==========  =========  ======  =======  =======  ========  ========
Total       195:30     196:09  9161     2608     78 %      +39

It doesn't format well when cut and pasted here, but the column headings start with "Scheduled Activities", and the data rows run down from "Logged On".
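A rough sketch of pulling one row apart, written in Python since no code was posted in the thread (the same regex idea carries over to PHP's `preg_match`). It assumes each row survives text conversion on a single line; the column names (`min_in_adhere`, `conform_min`, etc.) are my own hypothetical labels taken from the headings. Going by the sample, the "+/- Min. Conform" column is actual minus scheduled time in minutes (113:06 - 143:50 = 6786 - 8630 = -1844), which makes a handy check that the columns were split correctly. That is my reading of the numbers, not documented behavior of the report.

```python
import re

# Hypothetical column names for everything after the two h:mm times.
OTHER_COLS = ["min_in_adhere", "min_out_adhere", "perc_in_adhere", "conform_min",
              "perc_in_conform", "pct_total_sched", "pct_total_actual"]

# A data row: activity name, scheduled h:mm, actual h:mm, then the other columns.
ROW_RE = re.compile(r"^([A-Za-z][A-Za-z0-9 ]*?)\s+(\d+:\d{2})\s+(\d+:\d{2})\s+(.*)$")


def hhmm_to_minutes(value):
    """Convert an 'h:mm' duration such as '143:50' to whole minutes."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


def parse_row(line):
    """Return a dict for one activity row, or None for headings and ruler lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    activity, sched, actual, rest = m.groups()
    # Glue each '%' onto its number so '75 %' becomes the single field '75%'.
    fields = re.sub(r"(\S+) %", r"\1%", rest).split()
    row = {"activity": activity,
           "sched_min": hhmm_to_minutes(sched),
           "actual_min": hhmm_to_minutes(actual)}
    row.update(zip(OTHER_COLS, fields))  # the Total row has fewer columns; zip stops early
    return row


row = parse_row("Logged On 143:50 113:06 6506 2124 75 % -1844 79 % 74 % 58 %")
# Sanity check that holds for every row of the sample data.
if row["actual_min"] - row["sched_min"] != int(row["conform_min"]):
    print("columns look misaligned:", row)
```

Keeping the percentages as strings like `75%` is deliberate; stripping the `%` and casting to a number can happen later, once you know which columns MySQL should store as integers.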

 

Anyway, this is probably more information than anyone needs. I'm just trying to figure out how to scrape out only the data I want, so if there's something out there that does a similar job, I'd really appreciate a pointer!


I have never read a PDF file with PHP before, and I think it's a bit tricky since a PDF can contain images, for example. Nevertheless, it seems to be possible. I did a quick search for "php read pdf" and found this in the manual:

http://no.php.net/pdf

 

This comment in particular talks about reading:

http://no.php.net/manual/en/ref.pdf.php#49690

 

I assume the tricky part is reading the PDF file, but if it's converting the actual data into CSV, please tell us :)


I'd like to convert the actual data. There aren't any images, so I could probably convert it to text first... hmm...

This is all Windows-based, and I have a local group website on a XAMPP install, so maybe I could have a batch file run as a scheduled task to download the PDF and save it as a text file, and then have PHP convert that text to CSV?
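For the convert-to-text step, one free option (my suggestion; nothing in the thread names a tool) is the `pdftotext` command-line utility from Xpdf/Poppler, which has Windows builds. Its `-layout` flag tries to keep the physical layout of the page, which matters here because the report is a fixed-width table. This is sketched in Python for illustration and assumes `pdftotext` is installed and on the PATH; in the batch file itself the equivalent line would simply be `pdftotext -layout stats.pdf stats.txt` (file names hypothetical).

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf_path, txt_path):
    """Build the pdftotext command line (the tool is assumed to be on PATH)."""
    # -layout tries to keep the page's physical layout, so each report row
    # should stay on one line with its columns in order.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path):
    """Convert pdf_path to a text file; raises CalledProcessError if pdftotext fails."""
    subprocess.run(pdftotext_cmd(pdf_path, txt_path), check=True)
    return Path(txt_path)
```

Without `-layout`, text extractors may emit table cells in reading order that no longer lines up row by row, which makes the later field-splitting much harder.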

 

So I guess I need to look at converting that text file into a CSV, somehow pulling out the fields without all the extra information.
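One way to pull out just the fields, sketched in Python under the assumption that the text version looks like the sample in the first post: remember the employee from each "NNNN Last, First - Adherence Summary" line, and keep only lines that look like an activity name followed by two h:mm times. Page numbers, titles, column headings and ruler lines match neither pattern, so they are dropped. The regexes and CSV column names are my own guesses from that one sample; a heading that happens to contain a word followed by two times would slip through, so spot-check the output against the PDF.

```python
import csv
import re

# Assumed formats, based only on the sample in the first post:
#   employee header: "<id> <Last>, <First> - Adherence Summary"
#   data row:        "<activity> <h:mm> <h:mm> <remaining columns>"
EMP_RE = re.compile(r"^(\d+)\s+(.+?)\s+-\s+Adherence Summary$")
ROW_RE = re.compile(r"^([A-Za-z][A-Za-z0-9 ]*?)\s+(\d+:\d{2})\s+(\d+:\d{2})\s+(.*)$")

# Hypothetical CSV column names derived from the report headings.
HEADER = ["emp_id", "name", "activity", "sched_time", "actual_time",
          "min_in_adhere", "min_out_adhere", "perc_in_adhere", "conform_min",
          "perc_in_conform", "pct_total_sched", "pct_total_actual"]
FIXED = 5  # emp_id, name, activity and the two times


def iter_rows(lines):
    """Yield one list of CSV fields per activity row; skip every other line."""
    emp_id = name = None
    for raw in lines:
        line = raw.strip()
        m = EMP_RE.match(line)
        if m:
            emp_id, name = m.groups()  # the rows that follow belong to this employee
            continue
        m = ROW_RE.match(line)
        if m is None or emp_id is None:
            continue  # page headers, titles, column headings, ruler lines
        activity, sched, actual, rest = m.groups()
        # Glue each '%' onto its number so '75 %' becomes one field.
        fields = re.sub(r"(\S+) %", r"\1%", rest).split()
        # The Total row has fewer columns; pad so every CSV row is the same width.
        fields += [""] * (len(HEADER) - FIXED - len(fields))
        yield [emp_id, name, activity, sched, actual, *fields]


def write_csv(lines, out):
    """Write a header plus one row per activity; csv quotes 'Doe, John' for us."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(iter_rows(lines))
```

To run it on a converted file, open the output with `newline=""` as the `csv` module documentation recommends, e.g. `with open("stats.txt") as src, open("stats.csv", "w", newline="") as dst: write_csv(src, dst)` (file names hypothetical).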


While it's a pain to transfer, the PDF has to be generated from some source. If this is an ongoing report, maybe you should look into how it's created and have that process also write a secondary copy as flat files; then you can transfer all future reports quickly and convert the past ones manually. The PDF file structure is complex, and while parsing it isn't impossible, if the historical data is limited but the future data is large, this will save you a lot of headaches.


I agree with cooldude832. The smartest thing to do would be to look into how the PDF file is created and get a flat text file generated alongside it. As I said, I don't know much about PDF and PHP, but the links I gave you should provide a starting point for your project since they talk about reading from a PDF file. If you can read all the text into a variable, or each line into an array entry, it should be pretty straightforward :)

