Jump to content

Copy data from a website or from Word or from a PDF and place in database


bschultz

Recommended Posts

I am a radio sports announcer by trade...just happen to dabble in php coding.

 

A few years ago, I developed a system to put football (American) statistics into Excel and upload that (CSV) into a database to ease in preparing for a football broadcast.

 

These stats can either be found on the schools website, or they may send me a Word Document, or a PDF...or maybe even an Excel spreadsheet with their stats...might be a table, might not...that then needs to be entered into "my" Excel spreadsheet (with my formatting, and correct column layout) usually by hand. 

 

I'd like to develop a system to "copy and paste" into a web form, and then upload the data to the database.

 

Here's the format of the stats from the web (in an html table)

 

Rushing Stats

 

Leonhardt, B.  11  35  483  13.8  2  34  43.9

Gandrud, Matt  11  25  294  11.8  2  37  26.7

Lee, Justin  11  18  191  10.6  0  40  17.4

Abram, Cameron  11  15  234  15.6  2  56  21.3

Kondziolka, B.  11  9  52  5.8  2  11  4.7

Metz, Christian  11  7  45  6.4  0  16  4.1

Miles, Brandon  11  5  27  5.4  0  9  2.5

Hemling, Brad  10  3  11  3.7  0  6  1.1

Smith, Jon  9  2  10  5.0  0  6  1.1

Dale, Troy  9  1  7  7.0  0  7  0.8

Kroeplin,Dustin  10  1  4  4.0  0  4  0.4

 

 

 

Passing Stats

 

Rongstad, Lance  11  113.5  117-211-6  55.5  1286  8  56  116.9

Hass, Alex  6  122.7  2-3-0  66.7  20  0  16  3.3

Lee, Justin  11  318.4  2-2-0  100.0  52  0  37  4.7

Sundby, Jesse  11  0.0  0-1-0  0.0  0  0  0  0.0

Miles, Brandon  11  0.0  0-1-0  0.0  0  0  0  0.0

 

 

 

 

Receiving Stats

 

Leonhardt, B.  11  35  483  13.8  2  34  43.9

Gandrud, Matt  11  25  294  11.8  2  37  26.7

Lee, Justin  11  18  191  10.6  0  40  17.4

Abram, Cameron  11  15  234  15.6  2  56  21.3

Kondziolka, B.  11  9  52  5.8  2  11  4.7

Metz, Christian  11  7  45  6.4  0  16  4.1

Miles, Brandon  11  5  27  5.4  0  9  2.5

Hemling, Brad  10  3  11  3.7  0  6  1.1

Smith, Jon  9  2  10  5.0  0  6  1.1

Dale, Troy  9  1  7  7.0  0  7  0.8

Kroeplin,Dustin  10  1  4  4.0  0  4  0.4

 

 

 

 

 

 

 

Defensive Stats

 

32  Scheff, Brody  11  42  54  96  3.0 - 9   2.0 - 8 6 - 53 2  8  .  2 - 0 1  .  .

45  Crosby, Cory  11  28  46  74  9.5 - 28   2.0 - 14 1 - 0 5  6  .    . 1  .  .

5  Sluzewicz,Dusty  11  36  32  68  4.5 - 18   1.0 - 9 4 - 88 3  7  .    . 1  .  .

42  Bromm, Jason  11  19  44  63  4.5 - 15   2.0 - 8   . 1  1  3    . 2  .  .

47  Wright, Buckley  11  24  35  59  9.0 - 29   2.5 - 16   . 2  2  3  2 - 12 1  .  .

1  Hrdlicka, C.  11  23  22  45  0.5 - 1 .   1 - 8 5  6  .  1 - 0 .  .  .

96  Beste, Josh  11  15  24  39  3.5 - 3 .     . .  .  .    . .  .  .

35  Mock, Alex  11  11  26  37  2.5 - 6 .     . 1  1  1    . .  .  .

24  Miller, Larry  11  27  8  35  2.0 - 8   1.0 - 7 1 - 5 4  5  .  1 - 0 1  .  .

48  Young, Gary  11  12  18  30  7.0 - 20   1.0 - 6   . 2  2  .    . .  .  .

95  Shaver, Matthew  10  4  20  24  0.5 - 1 .     . .  .  .    . .  .  .

15  Knutson, Luke  9  7  15  22  4.0 - 13   2.0 - 10   . 1  1  1    . .  .  .

16  Valentine, D.  11  11  8  19  2.0 - 3 .     . 2  2  .    . .  .  .

20  Sczepanski, B.  6  5  13  18  2.5 - 11   1.0 - 8   . 1  1  .    . .  .  .

19  Christensen, D  10  7  10  17  1.5 - 4 .     . .  .  .    . .  .  .

17  Schmidt, Jake  10  6  10  16  0.5 - 1 .     . .  .  .    . 1  1  .

99  Kiebler, Dan  6  6  9  15  2.5 - 8 .     . .  .  .    . .  .  .

 

I'm going into this kinda blind...not knowing which direction to go.

 

Should I strip out the whitespace and replace with a comma and continue to upload using CSV?  How should I go about maintaining newlines...and knowing which player I'm supposed to update info for?  Is there a "better" way to get this info into the DB?

 

What should my database structure be?  How should I get the roster imported first...before the stats.

 

Is there a way to leave the column layout of what they're copying and pasting up to them?  (There really isn't a "standard" for what order the stats are in or the roster columns.  Some will have number, first name, last name, height, weight, hometown...  some will have number, last name, first name, hometown, weight, height). 

 

Let me know if this doesn't make sense to you...I'm not sure it does to me!

 

Thanks for your time!

 

Brian

 

 

Link to comment
Share on other sites

Are the spaces tabbed spaces or just always 5 spaces?

You probably will want to use explode.

http://www.php.net/manual/en/function.explode.php

 

Maybe first explode each line into an array like this:

$array = explode("\n", $text)

Then loop through it like this:

foreach($array AS $value){

}

If you want the lines to be read differently, then have some intelligent way of telling them apart, like if you read a line that contains only "Rushing Stats", then the next will contain some data you need to treat in a certain way. Remember to check if a line is empty too, and just ignore it then. To work with each line just split the data into an array like this:

$valuearray = explode('     ', $value);

or

$valuearray = explode("\t", $value);

You can then put this line into some other array where you want it to be stored for later to be used to for example write some query to update your database. Which is why you probably want to treat and store you data differently.

$dataarray['Receiving Stats'][] = explode("\t", $value);

$valuearray = explode('     ', $value);
$dataarray['Defensive Stats'][$valuearray[0]] = $valuearray;

^ just some examples.

 

Hopefully this make some kind of sense.

 

When you've made the array, that contains all the data you want to use in some kind of logic way, you will want to loop through it and create the mysql query, for then to use it to update the database.

Link to comment
Share on other sites

Makes sense...do you know, does Word do anything funky with a table if you copy and paste?  Just want to know what I need to strip out.

 

Also, is there a way to have the end user define what order their info is in?  Maybe another table that has "their" order so that I can put it into the db in the right order.

 

 

Link to comment
Share on other sites

You could have some line before reading the actual data where you specify what order the data is in. By this I mean that you got a line where it say for example:

name    goals    chances

then you know the first column contains names, the second goals, the third chances etc

Link to comment
Share on other sites

Alright...I have most of this figured out...except for how to remove periods...WHEN THERE ISN'T A NUMBER IN FRONT OF OR AFTER IT.

 

For instance

 

32  Scheff, Brody  11  42  54  96  3.0 - 9   2.0 - 8 6 - 53 2  8  .  2 - 0 1  .  .

 

would keep the 2.0...but replace the period in 8  .  2 with a comma.

 

Any ideas?  Also, I need to remove the dashes too...and I HATE regex. 

 

Here's the code

 

<?php
if ($_POST['submit'] == 'submit')
{
$array = explode("\n", $_POST['stats']);


foreach($array as $value)
{
$step1 = str_replace(',', ' ', $value);
$step2 = preg_replace('/\s+/',',',$step1);

echo $step2 . "<br />";
}
}
?>

 

 

Link to comment
Share on other sites

remove dashes:

str = str_replace(chr(150), '-?', $value);    // endash
$str = str_replace(chr(151), '--', $value);   // emdash
$str = str_replace('?', '–', $value); // dash

here is a snippet i found which i think is what you need.

 

 

$string = "3.142";        // The source number, as a string
$a = explode(".",$string);            // Split the string, using the decimal point as separator
$int = $a[0];                        // The section before the decimal point
$dec = $a[1];                        // The section after the decimal point
$lengthofnum=strlen($dec);            // Get the num of characters after the decimal point
$divider="1";                        // This sets the divider at 1
$i=0;
while($i < ($lengthofnum)) {
$divider.="0";                    // Adds a zero to the divider for every char after the decimal point
$i++;
}
$divider=(int)$divider;                // Converts the divider (currently a string) to an integer
$total=$int+($dec/$divider);        // compiles the total back as a numeric value

Link to comment
Share on other sites

I think you are going to have a tough time of this since you are getting data from so many sources and you can't control heir format. But, it may work out for you. I do have some suggestions though:

 

Since the columns in the input data will not be consistent, you could rely upon the first row headers (if there are any) as another poster suggested. But, if there are no headers or there is no consistent format to them, you can submit the data and specify the columns after you have preprocessed the data. Basically, after they submit the form do your processing and present the data back to the user (yourself) and then select for each column which field in the database it will belong to.

 

So, it is basically a three step process. Step 1 is submitting the data via a form. Step two is processing the data into columns for verification and column selection, and step three is submitting that additional data for input into the database. You could also combine functionality of reading the column headers to preselect the fields in step 2. But, because the data will likely be very "dirty" you definitely want a step to validate the data before actually saving it. Otherwise cleanup will be an arduous task. And, if the data is processed correctly you can easily build in a process to allow you to manually fix it first.

 

EDIT: Doesn't the decimal in your example above represent a 0? If you removed those you would have the wrong number of columns in the data. And there is no need to explode the data before modifying the characters. Do that first - it is more efficient.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.