Jump to content

handling html form fields with preg_match


Go to solution Solved by Ch0cu3r,

Recommended Posts

Hi guys,

 

I am trying to handle a html form that users are copying and pasting to a form on my website that contains their schedule for a particular day.

 

Generally the format is the same so I am trying to handle the data with a preg_match however sometimes white spaces or blank fields do not upload to my mysql db correctly.

 

An example of the html code from the html table is below:

<tr class="row-border-top marked-row ">
  <td>13 Dec 15, Sun</td>        // always the same format  
  <td>OFF(Z)</td>		 // can vary to contain [a-zA-Z], numericals, brackets, forward slash and possible whitespaces (no guarantee of how many if they are present) 
  <td>DOZ</td>			 // only three letter characters, empty, or white spaces
  <td>:    </td>	         // generally contains a time in format hh:mm however could be a) blank, b) only contain ':', c) contain just blank spaces 
  <td>:    </td>	         // generally contains a time in format hh:mm however could be a) blank, b) only contain ':', c) contain just blank spaces
  <td></td>			 // only three letter characters or blank
</tr>
<tr class="row-border-top marked-row ">  
  <td>12 Dec 15, Sat</td>        // another row example
  <td>B/HOL(Z)</td>
  <td>DOZ</td>
  <td>:    </td>
  <td>:    </td>
  <td></td>
</tr>
<tr class="marked-row ">         // another row example
  <td>8 Dec 15, Tue</td>
  <td>3329</td>
  <td>RZS</td>	
  <td>17:20</td>
  <td>21:55</td>
  <td>DOZ</td>
</tr>

so far I have the code as follows: 

	else {
		if (preg_match("#([^,]+), \w{3}[\s]+(.+?)[\s]+(\w{3})[\s]+(\d\d:\d\d) Z[\s]+(\d\d:\d\d) Z[\s]+(.*)#", $line, $match))	
		{
			$code = $_SESSION['Code'];
			$date = date('d/m/Y', strtotime($match[1]));
			$sectordate = date('Y/m/d', strtotime($match[1]));
			$duty = $match[2];
			$dep = $match[3];
			$begin = $match[4];
			$end = $match[5];
			$arr = $match[6];
			$output .= "<tr><td>{$date}</td><td>{$duty}</td><td>{$dep}</td><td>{$begin}</td><td>{$end}</td><td>{$arr}</td></tr>\n";
			$addedon = date('Y-m-d H:i:s');
			$today = date('Y-m-d');

			$sql= "INSERT INTO table (RostersID,Code,Ffname,Fuid,SectorDate,Duty,Dep,BeginTime,EndTime,Arr,AddedOn,Random) VALUES('','$code','{$_SESSION['FULLNAME']}','{$_SESSION['FBID']}','$sectordate','$duty','$dep','$begin','$end','$arr','$addedon','$random')";
			$result = mysqli_query($cxn,$sql)
				or die ("Can not upload the new roster!");

			$sql = "SELECT * FROM table WHERE Code = '$code' AND Random !='$random'";
			$result = mysqli_query($cxn,$sql)
				or die ("Can not find the old roster duties to delete then update with the new.");
			while($row=mysqli_fetch_assoc($result)) {
					$sql1 = "DELETE FROM table WHERE SectorDate='$sectordate' AND Code='$code' AND Random !='$random'";
					$result1 = mysqli_query($cxn,$sql1)
						or die ("Can't execute!");	
			}
		}
	}

Rather than use regex to try to and parse the HTML table to find your values, instead use DOMXpath to get the data inside the table

$html = <<<HTML
<table>
<tr class="row-border-top marked-row">
  <td>13 Dec 15, Sun</td>
  <td>OFF(Z)</td>
  <td>DOZ</td>
  <td>:    </td>
  <td>:    </td>
  <td></td>
</tr>
<tr class="row-border-top marked-row">  
  <td>12 Dec 15, Sat</td>
  <td>B/HOL(Z)</td>
  <td>DOZ</td>
  <td>:    </td>
  <td>:    </td>
  <td></td>
</tr>
<tr class="marked-row">
  <td>8 Dec 15, Tue</td>
  <td>3329</td>
  <td>RZS</td>  
  <td>17:20</td>
  <td>21:55</td>
  <td>DOZ</td>
</tr>
</table>
HTML;


$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// use xpath to find all <tr> tags that has the class "marked-row"
$rows = $xpath->query('//tr[contains(@class,"marked-row")]');

// fill an array with 6 empty values
$defaults = array_fill(0, 6, '');

// loop over each row found by xpath
foreach($rows as $row)
{
    // explode on each newline character in nodeValue (note the HTML tags are stripped out, leaving just the text inside the td tags)
    // and merge with the $defaults array
    $values = array_replace($defaults, explode("\n", trim($row->nodeValue)));
    // remove any whitespace before or after each value
    $values = array_map('trim', $values);

    // use dateTime to parse the date
    $dt = new DateTime($values[0]);

    // convert date to d/m/Y format
    $date = $dt->format('d/m/Y');

    // using the same dateTime object to convert the date to Y/m/d format
    $sectordate = $dt->format('Y/m/d');

    // set rest of values 
    $duty = $values[1];
    $dep = $values[2];
    $begin = $values[3];
    $end = $values[4];
    $arr = $values[5];

    // insert values into database here
}
  • Like 2

Also I forgot to mention there are another two html tables in the same page I want to parse. How can I use the code above but select only the table with table id="r"?

 

I tried many things such as:

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// use xpath to find all <tr> tags that has the class "marked-row"
$rows = $xpath->query('//table[@id="r"]//*tr[contains(@class,"marked-row")]');

// fill an array with 6 empty values
$defaults = array_fill(0, 6, '');

// loop over each row found by xpath
foreach($rows as $row)
{
}

thanks!

Edited by unistake
  • Solution

First you need to use the following xpath query.

$rows = $xpath->query('//table[@id="r"]/tbody[@class="table-text"]/tr');

As your HTML table is not syntax formatted then the following lines will no longer work.

    // explode on each newline character in nodeValue (note the HTML tags are stripped out, leaving just the text inside the td tags)
    // and merge with the $defaults array
    $values = array_replace($defaults, explode("\n", trim($row->nodeValue)));
    // remove any whitespace before or after each value
    $values = array_map('trim', $values); 

Instead replace them with the following

    // get the values for the child "td" tags in the row
    $values = getChildNodeValues($row->childNodes, 'td');

And add the following function before the foreach loop

function getChildNodeValues(&$nodeList, $nodeName)
{
    $values = array();
    foreach($nodeList as $node)
    {
        if($node->nodeName == $nodeName)
            $values[] = trim($node->nodeValue);
    }

    return $values;
}
Edited by Ch0cu3r
  • Like 1

Cheers Ch0cu3r!

 

Works amazingly!!

 

The only thing left is in one case the table might contain one row in the middle as below. The script so far stops once it hits this row. Is there anyway to skip this row and allow the script to run until the end of the table for example?

<tr class="row-border-top marked-row ">
	<td>20 Dec 15, Sun</td>
	<td>OFF(Z)</td><td>CAD</td>
	<td>00:00 Z</td>
	<td>21:00 Z</td>
	<td></td>
</tr>
<tr>
	<td colspan="6" class="break-row">**THE REST OF THIS IS NOT VALID**</td>
</tr>
<tr class="row-border-top simple-row ">
	<td>21 Dec 15, Mon</td>
	<td>OFF(Z)</td>
	<td>CAD</td>
	<td>00:00 Z</td>
	<td>21:00 Z</td>
	<td></td>
</tr>

Thanks

Try adjusting the xpath query to be

$rows = $xpath->query('//table[@id="r"]/tbody[@class="table-text"]/tr[not(descendant::td[contains(@class, "break-row")])]');

It should ignore any table row that contains a <td> tag with the class "break-row"

  • Like 1

Final thing, 

 

I am trying to improve my UI by creating a <iframe> with a floating button over the iframe that the user can click. This will then parse all the html code from the website where the html table is contained and will send to my website to include in the mysql db..

 

Ive tried to find a good tutorial but no luck.

Edited by unistake
This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.