seany1234 Posted December 14, 2019 Share Posted December 14, 2019 (edited) Hi, So i have HTML table which i'm attempting to turn into executable SQL code which i can then insert into a SQL database table. I'm wondering if anyone has any good ideas for the best way to get this done? I need to extract the data from each of the <tr bgcolor="#EEEEEE">, scrape the text within each <td></td> and then format it in sql. so far i have removed the html not required and created an array by exploding "<tr bgcolor="#EEEEEE">", however, its creating an empty element in the first array index [0], to fix this i have tried $players = array_shift($players); however it removed everything from the array. next i was thinking i would need to remove all html within each array element and add a comma to the end, then i would need to compress it all back together again? would this be the best way to do this? source html <html> <head> <title><br/></title> <style type ="text/css"> body,td,th { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; } th { padding: 5px; text-align: left; } td { padding: 2px; border-style: solid; border-width: 1px } </style> </head> <body> <center><B><br/> </center></B> <p> <table bordercolor="#000000" width="90%" align="center"> <tr bgcolor="#EEEEEE"> <th>UID</th> <th>Name</th> <th>Nat</th> <th>Position</th> <th>Best Pos</th> <th>Age</th> <th>DoB</th> <th>Height</th> <th>Weight</th> <th>Expires</th> <th>Wage</th> <th>Value</th> <th>Aer</th> <th>Cmd</th> <th>Com</th> <th>Ecc</th> <th>Han</th> <th>Kic</th> <th>1v1</th> <th>Pun</th> <th>Ref</th> <th>TRO</th> <th>Thr</th> <th>Cor</th> <th>Cro</th> <th>Dri</th> <th>Fin</th> <th>Fir</th> <th>Fre</th> <th>Hea</th> <th>Lon</th> <th>L Th</th> <th>Mar</th> <th>Pas</th> <th>Pen</th> <th>Tck</th> <th>Tec</th> <th>Agg</th> <th>Ant</th> <th>Bra</th> <th>Cmp</th> <th>Cnt</th> <th>Dec</th> <th>Det</th> <th>Fla</th> <th>Ldr</th> <th>OtB</th> <th>Pos</th> <th>Tea</th> <th>Vis</th> <th>Wor</th> <th>Acc</th> <th>Agi</th> <th>Bal</th> <th>Jum</th> <th>Nat</th> <th>Pac</th> <th>Sta</th> <th>Str</th> </tr> <tr bgcolor="#EEEEEE"> <td>859596</td> <td>Tim Krul</td> <td>NED</td> <td>GK</td> <td>GK</td> <td>31</td> <td>3/4/1988 (31 years old)</td> <td>194 cm</td> <td>83 kg</td> <td>30/6/2022</td> <td>£25,000 p/w</td> <td>£12.75M</td> <td>15</td> <td>13</td> <td>14</td> <td>8</td> <td>13</td> <td>13</td> <td>15</td> <td>12</td> <td>16</td> <td>15</td> <td>12</td> <td>1</td> <td>2</td> <td>1</td> <td>3</td> <td>8</td> <td>4</td> <td>5</td> <td>3</td> <td>2</td> <td>2</td> <td>9</td> <td>4</td> <td>3</td> <td>8</td> <td>11</td> <td>12</td> <td>14</td> <td>13</td> <td>13</td> <td>12</td> <td>14</td> <td>2</td> <td>14</td> <td>2</td> <td>15</td> <td>10</td> <td>5</td> <td>13</td> <td>11</td> <td>13</td> <td>14</td> <td>17</td> <td>13</td> <td>8</td> <td>14</td> <td>10</td> </tr> <tr bgcolor="#EEEEEE"> <td>5204730</td> <td>Michael McGovern</td> <td>NIR</td> <td>GK</td> <td>GK</td> <td>34</td> <td>12/7/1984 (34 years old)</td> <td>188 cm</td> <td>86 kg</td> <td>30/6/2021</td> <td>£8,000 p/w</td> <td>£875K</td> <td>12</td> <td>11</td> <td>10</td> <td>2</td> <td>12</td> <td>12</td> <td>12</td> <td>11</td> <td>13</td> <td>8</td> <td>10</td> <td>5</td> <td>2</td> <td>1</td> <td>3</td> <td>9</td> <td>7</td> <td>6</td> <td>3</td> <td>3</td> <td>2</td> <td>7</td> <td>1</td> <td>2</td> <td>5</td> <td>7</td> <td>12</td> <td>14</td> <td>13</td> <td>13</td> <td>11</td> <td>17</td> <td>1</td> <td>12</td> <td>7</td> <td>12</td> <td>15</td> <td>10</td> <td>15</td> <td>10</td> <td>13</td> <td>12</td> <td>12</td> <td>12</td> <td>8</td> <td>15</td> <td>14</td> </tr> </table> <p> <p align="center"> </body> </html> code i've created so far: <?php // Defining the basic scraping function function scrape_between($data, $start, $end) { $data = stristr($data, $start); // Stripping all data from before $start $data = substr($data, strlen($start)); // Stripping $start $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape return $data; // Returning the scraped data from the function } $myfile = fopen("source.html", "r") or die("Unable to open file!"); $page = fread($myfile,filesize("source.html")); $page = scrape_between($page, '</tr>', '</table>'); $players = explode('<tr bgcolor="#EEEEEE">', $page); echo '<pre>'; print_r($players); echo '</pre>'; fclose($myfile); ?> Edited December 14, 2019 by seany1234 Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/ Share on other sites More sharing options...
Barand Posted December 14, 2019 Share Posted December 14, 2019 (edited) Try using domDocument or simple_html_dom. EG include("/path/to/simple_html_dom.php"); $html = file_get_html('my_source.html'); $rows = $html->find('tr[bgcolor="#EEEEEE"]'); $csvFile = fopen('mydata.csv', 'w'); foreach ($rows as $row) { $data = []; foreach ($row->children() as $itm) { $data[] = $itm->plaintext; } fputcsv($csvFile, $data); } fclose($csvFile); The csv file can then be loaded into a sql table using a LOAD DATA INFILE statement. Edited December 15, 2019 by Barand 1 Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572520 Share on other sites More sharing options...
Barand Posted December 15, 2019 Share Posted December 15, 2019 P.S. Some of the data is unsuitable for data storage purposes and needs to be reformatted before storing in a database table +------------+--------------------------------+----------------+----------------------------------+ | Column | WRONG | CORRECT | Comments | +------------+--------------------------------+----------------+----------------------------------+ | Age | 31 | Not required | DoB tells you the age | | DoB | 3/4/1988 (31 years old) | 1988-04-03 | Store in correct date format | | Height | 194 cm | 194 | Store unformatted numeric value | | Weight | 83 kg | 83 | Store unformatted numeric value | | Expires | 30/6/2022 | 2022-06-30 | Store in correct date format | | Wage | £25,000 p/w | 25000 | Store unformatted numeric value | | Value | £12.75M | 12750000 | Store unformatted numeric value | +------------+--------------------------------+----------------+----------------------------------+ Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572525 Share on other sites More sharing options...
seany1234 Posted December 15, 2019 Author Share Posted December 15, 2019 Thanks for your reply, I've never worked with CSV files in PHP, how would it be possible to format data, remove the first row? or would i still need to put the data into an array? Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572536 Share on other sites More sharing options...
Barand Posted December 15, 2019 Share Posted December 15, 2019 (edited) 26 minutes ago, seany1234 said: how would it be possible to format data You would do the required formatting before writing to the csv file. 26 minutes ago, seany1234 said: remove the first row? LOAD DATA INFILE command has a "IGNORE N rows" option for ignoring headings. If you don't want to use that LOAD DATA option (which is certainly the fastest and will load hundreds of records per second) then , with a PDO connection and a prepared statement you could change the loop to $stmt = $conn->prepare("INSERT INTO thetable (uid, ... , etc)" VALUES(?,?,?, ...., etc)); foreach ($rows as $row) { $data = []; foreach ($row->children() as $itm) { $val = $itm->plaintext; // do any required reformatting here $data[] = $val; } $stmt->execute($data); } Edited December 15, 2019 by Barand Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572537 Share on other sites More sharing options...
seany1234 Posted December 16, 2019 Author Share Posted December 16, 2019 Thanks for your reply, i think i will go with the second option, these html tables will only be 50 rows max, so speed isn't particularly required. i'm getting a little confused now as i haven't used ->children or ->plaintext before. how can i do the formatting on a specific data field? i was originally thinking i could do something like $val[1] = formatted text; but that wouldn't work. Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572540 Share on other sites More sharing options...
Barand Posted December 16, 2019 Share Posted December 16, 2019 (edited) Something along these lines... include("path/to/simple_html_dom.php"); $html = file_get_html('mysource.html'); $rows = $html->find('tr[bgcolor="#EEEEEE"]'); $stmt = $conn->prepare("INSERT INTO thetable (uid, ... , etc)" VALUES(?,?,?, ...., etc)); foreach ($rows as $row) { if (strpos($row->children(0), '<th>')!==false) continue; $data = []; foreach ($row->children() as $k => $itm) { $val = $itm->plaintext; switch ($k) { case 6: // DoB case 9: // Expires $data[] = reformatDate($val); break; case 7: // Height case 8: // Weight case 10: // Wage $data[] = numOnly($val); break; case 11: // Value $data[] = reformatValue($val); break; default: $data[] = $val; } } $stmt->execute($data); } function reformatDate($str) { list($dob) = explode(' ', $str); $res = (DateTime::createFromFormat('d/m/Y', $dob))->format('Y-m-d') ; return $res; } function numOnly($str) { $res = ''; for($i=0, $k=strlen($str); $i<$k; $i++) { $res .= ctype_digit($str[$i]) ? $str[$i] : ''; } return $res; } function reformatValue($str) { // // This is left as an exercise for the reader! // return $str; // leave unchanged for now } Edited December 16, 2019 by Barand Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572541 Share on other sites More sharing options...
seany1234 Posted December 17, 2019 Author Share Posted December 17, 2019 Thanks for your reply, I'm thinking looking at the code you have written this seems above my head, so i think i would be best to hire someone to do this for me. many thanks for your help. Sean Quote Link to comment https://forums.phpfreaks.com/topic/309683-creating-executable-sql-code-from-html-tables/#findComment-1572567 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.