Staggan Posted April 11, 2012

Hello,

I am after some help with a concept to allow us to import some data into our database.

I have a PDF which I have OCR'd, and I now have a Word document. I then manually clean up the document to remove spurious characters; that part is fine.

Now, the Word document contains an unknown number of lines and an unknown number of records, and within those records an unknown number of fields.

Here is an example of a single record:

Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, c. Grey & White, KENINE SHADES OF THE NIGHT, d, c. Grey & White, KENINE SILENT WHISPER, b, c. Seal & White, KENINE SOFT KISSES, b, c. Grey & White, KENINE SPIRIT OF THE STORM, d, c. Seal & White KENINE STAR QUALITY, d, c. Seal & White

I could have multiple records like this, but with a different number of names toward the end.

So, if I manually put some delimiting character in place of the commas, can I get PHP to read the WHOLE document in, split the document into records, and then split those records into subfields to populate a form which I can then automatically submit to my database?

Not sure if that is clear...

Thanks

Link to comment: https://forums.phpfreaks.com/topic/260738-splitting-file-data-into-strings-to-populate-form/
wigwambam Posted April 11, 2012

You could read the whole line in (as per your example record) and use the PHP explode function to populate an array:

$arr = explode(",", $record);

Commas aren't the best delimiter, as there could be commas in the data. Use a character you know won't be used in the data, like a tilde (~).
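A minimal sketch of that suggestion, assuming the field-separating commas have already been replaced with tildes by hand. The sample record is shortened from the one in the first post, and the added trim step is an assumption about stray spaces left by the clean-up, not part of the original advice.

```php
<?php
// Assumed input: one record with "~" typed in place of the field-separating commas.
$record = "Mr G Aldred~ 26/11/08~ Canvardine Chance~ Wolfies Dawn Tilley~ KENINE SECRET DESTINY~ d~ Grey & White";

// explode() splits on every "~"; array_map('trim', ...) strips the spaces around each field.
$arr = array_map('trim', explode("~", $record));

print_r($arr);
```

Because the tilde never occurs in names, dates, or colours in the sample, a comma inside a field would no longer split it.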
litebearer Posted April 11, 2012

1. Consider each line as a record; use the carriage return as the separator for each line.
2. As mentioned above, use the tilde ~ as the delimiter between 'fields'.
3. Make sure each 'record' has the same number of 'fields', even if they are blank.
4. Save as a txt file.
5. Read into PHP via file(), as this will create an array in which each element is a line from your file.
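The steps above can be sketched roughly as follows. The temporary file and its two shortened lines are stand-ins invented so the example runs on its own; in practice the path would point at the cleaned-up txt file.

```php
<?php
// Hypothetical sample file; replace with the real txt file saved after clean-up.
$path = tempnam(sys_get_temp_dir(), 'reg');
file_put_contents(
    $path,
    "Mr G Aldred~26/11/08~Canvardine Chance~Wolfies Dawn Tilley\r\n" .
    "Mrs S L Bartlett~10/11/08~Engbull Big Boy~Savannahs Snow At Delimit\r\n"
);

// file() returns one array element per line; the flags drop the line ending
// from each element and skip blank lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$rows = array();
foreach ($lines as $line) {
    // One record per line, one field per "~"; trim() removes stray whitespace.
    $rows[] = array_map('trim', explode("~", $line));
}

unlink($path); // remove the sample file again
print_r($rows);
```

Each element of $rows is then one record, with its fields at fixed positions.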
Staggan Posted April 11, 2012

The problem is that the file does not have consistent fields. Let me explain.

This is a record of dog registrations, where each record would contain an owner, date of birth, mother, father and then each of the offspring and their sexes.

I have something working now which works from a simple string taken from the OCR'd file. This is what I have:

<?php
$page = "!Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, Grey & White, KENINE SHADES OF THE NIGHT, d, Grey & White, KENINE SILENT WHISPER, b, Seal & White, KENINE SOFT KISSES, b, Grey & White, KENINE SPIRIT OF THE STORM, d, Seal & White, KENINE STAR QUALITY, d, Seal & White, !Mrs S L Bartlett, 10/11/08, Engbull Big Boy, Savannahs Snow At Delimit, ENGBULL AKERIA, b, Red & White, ENGBULL BLAZE, b, Seal & White ENGBULL ELSKA, b, Red & White, ENGBULL, HECTOR, d, Red & White, ENGBULL TALA, b, Red & White, ENGBULL TUCKER, d, Red & White, ENGBULL ZEUS, d, Red & White";

// Every record starts with "!", so element 0 is empty and element 1 is the first record.
$contents = explode("!", $page);

// Split the first record into its comma-separated fields.
$records = explode(",", $contents[1]);
$count = count($records);

// The first four fields are always owner, date of birth, sire and dam.
$record_owner = $records[0];
$record_dob = $records[1];
$record_sire = $records[2];
$record_dam = $records[3];

echo $record_owner;
echo $record_dob;
echo $record_sire;
echo $record_dam;

// The remaining fields come in groups of three: name, sex, colour.
// floor() ignores a leftover field such as the one after a trailing comma.
$loop = floor(($count - 4) / 3);
$position = 3;
$record_dog = array();
$record_sex = array();
$record_colour = array();

for ($i = 1; $i <= $loop; $i++) {
    $position += 1;
    $record_dog[$i] = $records[$position];
    $position += 1;
    $record_sex[$i] = $records[$position];
    $position += 1;
    $record_colour[$i] = $records[$position];
}

// print_r() prints the array itself, so no echo is needed.
print_r($record_dog);
print_r($record_sex);
print_r($record_colour);
?>

It's very hacky, but it gives the correct results...

I now need to read a file rather than enter the text as a string, and then I need to automate entry into the database in some way.
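One way the file-reading step might look, as a sketch only. It keeps the "!" record marker and comma-separated fields from the post above; parse_registrations(), the array keys, the temporary file, and the shortened sample text are all illustrative assumptions rather than anything posted in the thread.

```php
<?php
// Sketch only: the function name and array keys are hypothetical.
function parse_registrations($text)
{
    $records = array();
    foreach (explode("!", $text) as $chunk) {
        // Split on commas, trim each field, and drop empty fields
        // (for example the one left by a trailing comma).
        $fields = array_values(array_filter(array_map('trim', explode(",", $chunk)), 'strlen'));
        if (count($fields) < 4) {
            continue; // not enough fields for owner, date, sire and dam
        }
        $record = array(
            'owner' => $fields[0],
            'dob'   => $fields[1],
            'sire'  => $fields[2],
            'dam'   => $fields[3],
            'pups'  => array(),
        );
        // The remaining fields come in groups of three: name, sex, colour.
        foreach (array_chunk(array_slice($fields, 4), 3) as $group) {
            if (count($group) === 3) {
                $record['pups'][] = array(
                    'name'   => $group[0],
                    'sex'    => $group[1],
                    'colour' => $group[2],
                );
            }
        }
        $records[] = $record;
    }
    return $records;
}

// Hypothetical sample file standing in for the cleaned-up OCR text.
$path = tempnam(sys_get_temp_dir(), 'reg');
file_put_contents($path, "!Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, Grey & White, KENINE SILENT WHISPER, b, Seal & White, !Mrs S L Bartlett, 10/11/08, Engbull Big Boy, Savannahs Snow At Delimit, ENGBULL AKERIA, b, Red & White");

// file_get_contents() reads the whole file into one string.
$records = parse_registrations(file_get_contents($path));
unlink($path);

print_r($records);
```

A record with a missing or extra comma, like the ENGBULL one in the post, would still shift its groups, so checking that each sex field is d or b (the only values in the sample) is a cheap sanity test before saving anything. Each parsed record could then be written to the database from a loop, for instance with a PDO prepared statement, though the table layout is not given in the thread.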
litebearer Posted April 11, 2012

Who/what creates the ORIGINAL document?
Staggan Posted April 11, 2012

A PDF is created from a scan of pages from a book. The PDF is then OCR'd into the document.
litebearer Posted April 11, 2012

In order to achieve what you are seeking to do, you need some sort of 'order' to the data. How will your script know if a 'piece of data' is missing in a particular line?

i.e. as you now have it:

line 1 = name, date, father, mother
line 2 = date, mother

As you should have it:

line 1 = name, date, father, mother
line 2 = ,date,,mother

EDIT: use tilde rather than commas
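A small sketch of why the empty placeholders matter, using the two example lines above with tildes in place of commas. The variable names are illustrative.

```php
<?php
$line1 = "name~date~father~mother";
$line2 = "~date~~mother"; // name and father are missing, but their positions are kept

$fields1 = explode("~", $line1);
$fields2 = explode("~", $line2);

// explode() returns an empty string for each blank position, so both lines
// yield four fields and index 3 is the mother in either case.
var_dump(count($fields1) === count($fields2));
print_r($fields2);
```

Without the placeholders, the second line would split into only two fields and the script could not tell which ones were missing.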