
Splitting file data into strings to populate form


Staggan


Hello

 

I am after some help with a concept to allow us to import some data into our database.

 

I have a PDF which I have OCR'd, so I now have a Word document. I then manually clean up the document to remove spurious characters; that part is fine.

 

Now, the Word document contains an unknown number of lines and an unknown number of records, and within those records an unknown number of fields.

 

Here is an example of a single record:

 

 

Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, c. Grey & White, KENINE SHADES OF THE NIGHT, d, c. Grey & White, KENINE SILENT WHISPER, b, c. Seal & White, KENINE SOFT KISSES, b, c. Grey & White, KENINE SPIRIT OF THE STORM, d, c. Seal & White KENINE STAR QUALITY, d, c. Seal & White

 

And I could have multiple records like this, each with a different number of names towards the end.

 

So, if I manually put in some delimiting character instead of commas, can I get PHP to read the WHOLE document in, split the document into records, and then split those records into subfields to populate a form which I can then automatically submit to my database?

 

Not sure if that is clear....

 

Thanks

 

 

 


You could read the whole line in (as per your example record) and use the PHP explode() function to populate an array:

 

$arr = explode(",",$record);

 

Commas aren't the best delimiter, as there could be commas in the data. Use a character you know won't appear in the data, like a tilde (~).
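
As a minimal sketch of why the delimiter matters, assume a hypothetical record in which one colour field contains a comma. The sample data and variable names below are illustrative only, not taken from the real file.

```php
<?php
// Hypothetical record: the colour "Grey, White" contains a comma.
$record = "Mr G Aldred~26/11/08~Canvardine Chance~Wolfies Dawn Tilley~KENINE SECRET DESTINY~d~Grey, White";

// Splitting on the tilde leaves the embedded comma alone.
$byTilde = explode("~", $record);

// Splitting the same text on commas cuts the colour field in two.
$byComma = explode(",", $record);

echo count($byTilde), " fields when split on ~\n";
echo count($byComma), " fields when split on ,\n";
echo "Colour: ", $byTilde[6], "\n";
```

With the tilde, the colour comes back as one field; with the comma, it is split and every later position would shift.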


1. Treat each line as a record, with the line break as the separator between records.

2. As mentioned above, use the tilde ~ as the delimiter between 'fields'.

3. Make sure each 'record' has the same number of 'fields', even if some are blank.

4. Save as a txt file.

5. Read it into PHP via file(), as this will create an array in which each element is a line from your file.
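
The steps above could be sketched roughly as follows. The temporary file, its contents, and the variable names are placeholders standing in for the cleaned-up text export; the sketch writes its own sample file only so that it is self-contained.

```php
<?php
// Build a small sample file so the sketch runs on its own.
// The path and the data are placeholders for the real txt export.
$path = tempnam(sys_get_temp_dir(), 'dogs');
file_put_contents(
    $path,
    "Mr G Aldred~26/11/08~Canvardine Chance~Wolfies Dawn Tilley\n" .
    "Mrs S L Bartlett~10/11/08~Engbull Big Boy~Savannahs Snow At Delimit\n"
);

// file() returns one array element per line; the flags strip the
// line endings and skip blank lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$rows = array();
foreach ($lines as $line) {
    // One record per line, fields separated by "~".
    $rows[] = array_map('trim', explode("~", $line));
}

unlink($path); // remove the sample file again
print_r($rows);
```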

 


The problem is the file does not have consistent fields...

 

Let me explain

 

This is a record of dog registrations, where each record contains an owner, a date of birth, the father (sire), the mother (dam), and then each of the offspring with its sex and colour.

 

I have something working now which works from a simple string taken from the OCR'd file.

 

This is what I have:

 


<?php

$page  = "!Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, Grey & White, KENINE SHADES OF THE NIGHT, d, Grey & White, KENINE SILENT WHISPER, b, Seal & White, KENINE SOFT KISSES, b, Grey & White, KENINE SPIRIT OF THE STORM, d, Seal & White, KENINE STAR QUALITY, d, Seal & White, !Mrs S L Bartlett, 10/11/08, Engbull Big Boy, Savannahs Snow At Delimit, ENGBULL AKERIA, b, Red & White, ENGBULL BLAZE, b, Seal & White ENGBULL ELSKA, b, Red & White, ENGBULL, HECTOR, d, Red & White, ENGBULL TALA, b, Red & White, ENGBULL TUCKER, d, Red & White, ENGBULL ZEUS, d, Red & White";

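// Split the page into records on "!", dropping the empty piece before the first "!".
$contents = array_filter(array_map('trim', explode("!", $page)), 'strlen');

foreach ($contents as $record) {
    // Split the record into fields, trim the whitespace, and drop empty
    // fields (such as the one left behind by a trailing comma).
    $records = array_values(array_filter(array_map('trim', explode(",", $record)), 'strlen'));

    // The first four fields are always owner, date of birth, sire, dam.
    $record_owner = $records[0];
    $record_dob   = $records[1];
    $record_sire  = $records[2];
    $record_dam   = $records[3];

    echo $record_owner, "\n", $record_dob, "\n", $record_sire, "\n", $record_dam, "\n";

    // The remaining fields repeat in groups of three: name, sex, colour.
    // A missing or stray comma in the source shifts every later field,
    // so the text must be cleanly delimited for this to line up.
    $record_dog = $record_sex = $record_colour = array();
    foreach (array_chunk(array_slice($records, 4), 3) as $pup) {
        if (count($pup) < 3) {
            continue; // incomplete group at the end of the record
        }
        $record_dog[]    = $pup[0];
        $record_sex[]    = $pup[1];
        $record_colour[] = $pup[2];
    }

    print_r($record_dog);
    print_r($record_sex);
    print_r($record_colour);
}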

?> 


 

It's very hacky, but it gives the correct results. I now need to read from a file rather than entering the text as a string, and then I need to automate entry into the database in some way.
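
One possible way to take those two remaining steps is sketched below, keeping the "!" and "," delimiters used above. Everything named here is an assumption for illustration: the functions parse_page() and save_litters(), the file name registrations.txt, and the "pups" table with its seven columns are all hypothetical, and the database part assumes a PDO connection is already available.

```php
<?php
// Parse a "!"-delimited page into litters, each with a variable number of pups.
function parse_page($text)
{
    $litters = array();
    foreach (explode("!", $text) as $chunk) {
        // Trim every field and drop empty ones (e.g. after a trailing comma).
        $fields = array_values(array_filter(array_map('trim', explode(",", $chunk)), 'strlen'));
        if (count($fields) < 4) {
            continue; // empty chunk or incomplete header
        }
        $litter = array(
            'owner' => $fields[0],
            'dob'   => $fields[1],
            'sire'  => $fields[2],
            'dam'   => $fields[3],
            'pups'  => array(),
        );
        // Everything after the header comes in groups of name, sex, colour.
        foreach (array_chunk(array_slice($fields, 4), 3) as $pup) {
            if (count($pup) === 3) {
                $litter['pups'][] = array('name' => $pup[0], 'sex' => $pup[1], 'colour' => $pup[2]);
            }
        }
        $litters[] = $litter;
    }
    return $litters;
}

// Insert one row per pup using a prepared statement.
// Hypothetical schema: table "pups" with the seven columns listed below.
function save_litters(PDO $db, array $litters)
{
    $stmt = $db->prepare(
        'INSERT INTO pups (owner, dob, sire, dam, name, sex, colour)
         VALUES (?, ?, ?, ?, ?, ?, ?)'
    );
    foreach ($litters as $l) {
        foreach ($l['pups'] as $p) {
            $stmt->execute(array(
                $l['owner'], $l['dob'], $l['sire'], $l['dam'],
                $p['name'], $p['sex'], $p['colour'],
            ));
        }
    }
}

// Small demonstration on a shortened version of the sample record.
$demo = parse_page("!Mr G Aldred, 26/11/08, Canvardine Chance, Wolfies Dawn Tilley, KENINE SECRET DESTINY, d, Grey & White, KENINE SILENT WHISPER, b, Seal & White, ");
print_r($demo);

// With a real file this might become:
// $litters = parse_page(file_get_contents('registrations.txt'));
// save_litters($db, $litters);
```

Writing straight to the database with a prepared statement avoids the intermediate form submission, and the placeholders keep names containing quotes or ampersands from breaking the query.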

 

 


In order to achieve what you are seeking to do, you need some sort of 'order' to the data.

How will your script know if a 'piece of data' is missing in a particular line?

i.e.

as you now have it

line 1 = name, date, father, mother

line 2 = date, mother

 

as you should have it

 

line 1 = name, date, father, mother

line 2 = ,date,,mother

EDIT: use tilde rather than commas
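
To illustrate the padded layout, here is a minimal sketch assuming a four-field header (owner, date, father, mother) with tilde delimiters. The field names in $names and the sample lines are illustrative; the point is only that empty fields keep the later values in their usual positions.

```php
<?php
// Two lines in the suggested layout: owner~date~father~mother.
// The second line has no owner or father, but the empty fields keep
// the date and the mother in their usual positions.
$lines = array(
    "Mr G Aldred~26/11/08~Canvardine Chance~Wolfies Dawn Tilley",
    "~10/11/08~~Savannahs Snow At Delimit",
);

$names = array('owner', 'dob', 'sire', 'dam');
$rows  = array();

foreach ($lines as $line) {
    // Pad short lines with empty strings so every record has four fields.
    $fields = array_pad(explode("~", $line), count($names), '');
    // Label each position; any extra fields beyond the fourth are ignored here.
    $rows[] = array_combine($names, array_slice($fields, 0, count($names)));
}

print_r($rows);
```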

