unistake Posted December 13, 2015 Share Posted December 13, 2015 Hi guys, I am trying to handle a html form that users are copying and pasting to a form on my website that contains their schedule for a particular day. Generally the format is the same so I am trying to handle the data with a preg_match however sometimes white spaces or blank fields do not upload to my mysql db correctly. An example of the html code from the html table is below: <tr class="row-border-top marked-row "> <td>13 Dec 15, Sun</td> // always the same format <td>OFF(Z)</td> // can vary to contain [a-zA-Z], numericals, brackets, forward slash and possible whitespaces (no guarantee of how many if they are present) <td>DOZ</td> // only three letter characters, empty, or white spaces <td>: </td> // generally contains a time in format hh:mm however could be a) blank, b) only contain ':', c) contain just blank spaces <td>: </td> // generally contains a time in format hh:mm however could be a) blank, b) only contain ':', c) contain just blank spaces <td></td> // only three letter characters or blank </tr> <tr class="row-border-top marked-row "> <td>12 Dec 15, Sat</td> // another row example <td>B/HOL(Z)</td> <td>DOZ</td> <td>: </td> <td>: </td> <td></td> </tr> <tr class="marked-row "> // another row example <td>8 Dec 15, Tue</td> <td>3329</td> <td>RZS</td> <td>17:20</td> <td>21:55</td> <td>DOZ</td> </tr> so far I have the code as follows: else { if (preg_match("#([^,]+), \w{3}[\s]+(.+?)[\s]+(\w{3})[\s]+(\d\d:\d\d) Z[\s]+(\d\d:\d\d) Z[\s]+(.*)#", $line, $match)) { $code = $_SESSION['Code']; $date = date('d/m/Y', strtotime($match[1])); $sectordate = date('Y/m/d', strtotime($match[1])); $duty = $match[2]; $dep = $match[3]; $begin = $match[4]; $end = $match[5]; $arr = $match[6]; $output .= "<tr><td>{$date}</td><td>{$duty}</td><td>{$dep}</td><td>{$begin}</td><td>{$end}</td><td>{$arr}</td></tr>\n"; $addedon = date('Y-m-d H:i:s'); $today = date('Y-m-d'); $sql= "INSERT INTO table (RostersID,Code,Ffname,Fuid,SectorDate,Duty,Dep,BeginTime,EndTime,Arr,AddedOn,Random) VALUES('','$code','{$_SESSION['FULLNAME']}','{$_SESSION['FBID']}','$sectordate','$duty','$dep','$begin','$end','$arr','$addedon','$random')"; $result = mysqli_query($cxn,$sql) or die ("Can not upload the new roster!"); $sql = "SELECT * FROM table WHERE Code = '$code' AND Random !='$random'"; $result = mysqli_query($cxn,$sql) or die ("Can not find the old roster duties to delete then update with the new."); while($row=mysqli_fetch_assoc($result)) { $sql1 = "DELETE FROM table WHERE SectorDate='$sectordate' AND Code='$code' AND Random !='$random'"; $result1 = mysqli_query($cxn,$sql1) or die ("Can't execute!"); } } } Quote Link to comment Share on other sites More sharing options...
Ch0cu3r Posted December 13, 2015 Share Posted December 13, 2015 Rather than use regex to try to and parse the HTML table to find your values, instead use DOMXpath to get the data inside the table $html = <<<HTML <table> <tr class="row-border-top marked-row"> <td>13 Dec 15, Sun</td> <td>OFF(Z)</td> <td>DOZ</td> <td>: </td> <td>: </td> <td></td> </tr> <tr class="row-border-top marked-row"> <td>12 Dec 15, Sat</td> <td>B/HOL(Z)</td> <td>DOZ</td> <td>: </td> <td>: </td> <td></td> </tr> <tr class="marked-row"> <td>8 Dec 15, Tue</td> <td>3329</td> <td>RZS</td> <td>17:20</td> <td>21:55</td> <td>DOZ</td> </tr> </table> HTML; $doc = new DOMDocument(); $doc->loadHTML($html); $xpath = new DOMXPath($doc); // use xpath to find all <tr> tags that has the class "marked-row" $rows = $xpath->query('//tr[contains(@class,"marked-row")]'); // fill an array with 6 empty values $defaults = array_fill(0, 6, ''); // loop over each row found by xpath foreach($rows as $row) { // explode on each newline character in nodeValue (note the HTML tags are stripped out, leaving just the text inside the td tags) // and merge with the $defaults array $values = array_replace($defaults, explode("\n", trim($row->nodeValue))); // remove any whitespace before or after each value $values = array_map('trim', $values); // use dateTime to parse the date $dt = new DateTime($values[0]); // convert date to d/m/Y format $date = $dt->format('d/m/Y'); // using the same dateTime object to convert the date to Y/m/d format $sectordate = $dt->format('Y/m/d'); // set rest of values $duty = $values[1]; $dep = $values[2]; $begin = $values[3]; $end = $values[4]; $arr = $values[5]; // insert values into database here } 2 Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 Thanks Ch0cu3r, So question is now how can I get the html code from a webpage to create the variable $html you created above? Quote Link to comment Share on other sites More sharing options...
Ch0cu3r Posted December 15, 2015 Share Posted December 15, 2015 You would use $dom->loadHTMLFile() in place of $dom->loadHTML() 1 Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 (edited) Also I forgot to mention there are another two html tables in the same page I want to parse. How can I use the code above but select only the table with table id="r"? I tried many things such as: $doc = new DOMDocument(); $doc->loadHTML($html); $xpath = new DOMXPath($doc); // use xpath to find all <tr> tags that has the class "marked-row" $rows = $xpath->query('//table[@id="r"]//*tr[contains(@class,"marked-row")]'); // fill an array with 6 empty values $defaults = array_fill(0, 6, ''); // loop over each row found by xpath foreach($rows as $row) { } thanks! Edited December 15, 2015 by unistake Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 An example of the full html page I am trying to parse is here: http://pastebin.com/uLxwaeXM Note, the html layout is an external website! Quote Link to comment Share on other sites More sharing options...
Solution Ch0cu3r Posted December 15, 2015 Solution Share Posted December 15, 2015 (edited) First you need to use the following xpath query. $rows = $xpath->query('//table[@id="r"]/tbody[@class="table-text"]/tr'); As your HTML table is not syntax formatted then the following lines will no longer work. // explode on each newline character in nodeValue (note the HTML tags are stripped out, leaving just the text inside the td tags) // and merge with the $defaults array $values = array_replace($defaults, explode("\n", trim($row->nodeValue))); // remove any whitespace before or after each value $values = array_map('trim', $values); Instead replace them with the following // get the values for the child "td" tags in the row $values = getChildNodeValues($row->childNodes, 'td'); And add the following function before the foreach loop function getChildNodeValues(&$nodeList, $nodeName) { $values = array(); foreach($nodeList as $node) { if($node->nodeName == $nodeName) $values[] = trim($node->nodeValue); } return $values; } Edited December 15, 2015 by Ch0cu3r 1 Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 Cheers Ch0cu3r! Works amazingly!! The only thing left is in one case the table might contain one row in the middle as below. The script so far stops once it hits this row. Is there anyway to skip this row and allow the script to run until the end of the table for example? <tr class="row-border-top marked-row "> <td>20 Dec 15, Sun</td> <td>OFF(Z)</td><td>CAD</td> <td>00:00 Z</td> <td>21:00 Z</td> <td></td> </tr> <tr> <td colspan="6" class="break-row">**THE REST OF THIS IS NOT VALID**</td> </tr> <tr class="row-border-top simple-row "> <td>21 Dec 15, Mon</td> <td>OFF(Z)</td> <td>CAD</td> <td>00:00 Z</td> <td>21:00 Z</td> <td></td> </tr> Thanks Quote Link to comment Share on other sites More sharing options...
Ch0cu3r Posted December 15, 2015 Share Posted December 15, 2015 Try adjusting the xpath query to be $rows = $xpath->query('//table[@id="r"]/tbody[@class="table-text"]/tr[not(descendant::td[contains(@class, "break-row")])]'); It should ignore any table row that contains a <td> tag with the class "break-row" 1 Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 Absolute genius!! Thanks man! Quote Link to comment Share on other sites More sharing options...
unistake Posted December 15, 2015 Author Share Posted December 15, 2015 (edited) Final thing, I am trying to improve my UI by creating a <iframe> with a floating button over the iframe that the user can click. This will then parse all the html code from the website where the html table is contained and will send to my website to include in the mysql db.. Ive tried to find a good tutorial but no luck. Edited December 15, 2015 by unistake Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.