bundyxc Posted August 6, 2009

I have 1,000+ rows of data that are all in exactly the same format:

<tr>
<td>lastName, firstName</td>
<td>email</td>
<td>var1</td>
<td>var2</td>
</tr>

I need to extract the data so that I end up with five variables:

$lastName
$firstName
$email
$var1
$var2

How would I go about extracting this data? Is this a regex problem, or something that could be solved with plain string functions?

Thanks for your time.
TeNDoLLA Posted August 6, 2009

Since you have so much data, you should probably parse the HTML with PHP's DOMDocument instead of using regexps: http://us3.php.net/manual/en/class.domdocument.php
bundyxc Posted August 6, 2009 Author

Time isn't a problem, so whatever's easier is better. I took a look at the page you linked to, and I don't get a thing, haha. How would you do it with regular expressions? Or, if you have an example of the DOM functions, that would be appreciated.
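For the regular-expression route asked about above, here is a minimal sketch, assuming every row consists of exactly four plain <td>...</td> cells with no attributes or nested tags, as in the first post. The function name parseRowsWithRegex and the sample rows are illustrative only. Using # as the pattern delimiter avoids having to escape the / in the closing tags.

```php
<?php
// Hypothetical helper: extracts every <tr> made of four plain <td> cells.
// Assumes the cells have no attributes and contain no nested markup.
function parseRowsWithRegex(string $html): array
{
    // [^<]* keeps each capture inside a single cell, so a malformed row
    // cannot swallow the next one; \s* allows whitespace/newlines between tags.
    $pattern = '#<tr>\s*<td>([^<]*)</td>\s*<td>([^<]*)</td>\s*<td>([^<]*)</td>\s*<td>([^<]*)</td>\s*</tr>#i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $records = [];
    foreach ($matches as $m) {
        // The first cell holds "lastName, firstName"; split on the first comma only.
        $parts = explode(',', $m[1], 2);
        $records[] = [
            'lastName'  => trim($parts[0]),
            'firstName' => trim($parts[1] ?? ''),
            'email'     => trim($m[2]),
            'var1'      => trim($m[3]),
            'var2'      => trim($m[4]),
        ];
    }
    return $records;
}

// Illustrative input in the same shape as the data in the first post.
$html = "<table>
<tr> <td>Smith, Bob</td> <td>bob@smith.com</td> <td>male</td> <td>32</td> </tr>
<tr> <td>Moore, Demi</td> <td>demi@something.com</td> <td>female</td> <td>46</td> </tr>
</table>";

foreach (parseRowsWithRegex($html) as $r) {
    echo "{$r['lastName']} | {$r['firstName']} | {$r['email']} | {$r['var1']} | {$r['var2']}\n";
}
```

Because the pattern matches whole rows, it does not matter whether the cells sit on one line or several. It will, however, silently skip any row that deviates from the four-cell shape, which is one reason a real HTML parser such as DOMDocument is the more robust choice for messy input.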
slapdashwebdesigner Posted August 6, 2009

You might try something like this:

<?php
$data = "
<table>
<tr>
<td>lastName, firstName</td>
<td>email</td>
<td>var1</td>
<td>var2</td>
</tr>
</table>";
$patterns[0] = '/<table>/';
$patterns[1] = '/</table>/';
$patterns[2] = '/</tr>/';
$patterns[3] = '/</td>/';
$replacements[0] = '';
$replacements[1] = '';
$replacements[2] = '';
$replacements[3] = '';
$data = preg_replace($patterns, $replacements, $data);
$rows = explode('<tr>', $data);
foreach($rows as $row){
    list($lastName, $firstName, $email, $var1, $var2) = explode("<td>", $row);
}
?>
slapdashwebdesigner Posted August 6, 2009

Just a small correction I forgot, plus an explanation:

Step 1 = set $data to your table information
Step 2 = strip <table>, </table>, </tr>, and </td> out of the $data string
Step 3 = create an array with each row of the table as one item
Step 4 = loop through the array, splitting each row on <td> and creating a variable from each cell's contents
Step 5 = split the first cell again, because both names are held in the first <td>

Then:

<?php
//Step 1
$data = "
<table>
<tr>
<td>lastName, firstName</td>
<td>email</td>
<td>var1</td>
<td>var2</td>
</tr>
</table>";

//Step 2
$patterns[0] = '/<table>/';
$patterns[1] = '/</table>/';
$patterns[2] = '/</tr>/';
$patterns[3] = '/</td>/';
$replacements[0] = '';
$replacements[1] = '';
$replacements[2] = '';
$replacements[3] = '';
$data = preg_replace($patterns, $replacements, $data);

//Step 3
$rows = explode('<tr>', $data);

//Step 4
foreach($rows as $row){
    list($Name, $email, $var1, $var2) = explode("<td>", $row);
    //Step 5
    list($lastName, $firstName) = explode(",", $Name);
}
?>
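The five steps above can be sketched in a working form. This is a minimal sketch, not the poster's exact code, and it makes a few assumptions: the cells contain no nested tags; # replaces / as the regex delimiter so that patterns like </table> no longer need escaping; the whitespace chunk that precedes the first <td> in each row is discarded (otherwise every column shifts by one); and each row is collected into an array rather than overwriting the same five variables on every pass. The sample rows are illustrative.

```php
<?php
// Step 1: sample table markup (illustrative rows; in practice, load your own HTML).
$data = "
<table>
<tr> <td>Smith, Bob</td> <td>bob@smith.com</td> <td>male</td> <td>32</td> </tr>
<tr> <td>Moore, Demi</td> <td>demi@something.com</td> <td>female</td> <td>46</td> </tr>
</table>";

// Step 2: strip <table>, </table>, </tr> and </td>. With # as the delimiter,
// the "/" inside the closing tags no longer terminates the pattern.
$data = preg_replace(['#<table>#i', '#</table>#i', '#</tr>#i', '#</td>#i'], '', $data);

$records = [];
// Step 3: one chunk per table row.
foreach (explode('<tr>', $data) as $row) {
    // Step 4: split on <td>; the first piece is whatever preceded the first cell
    // (only whitespace here), so drop it. array_shift re-indexes the rest from 0.
    $cells = array_map('trim', explode('<td>', $row));
    array_shift($cells);
    if (count($cells) !== 4) {
        continue; // the text before the first <tr>, or a row with the wrong cell count
    }
    list($name, $email, $var1, $var2) = $cells;

    // Step 5: the first cell holds "lastName, firstName"; split on the first comma only.
    // "+ ['', '']" supplies an empty firstName if the cell has no comma.
    list($lastName, $firstName) = array_map('trim', explode(',', $name, 2)) + ['', ''];

    // Store each row instead of overwriting the same five variables every iteration.
    $records[] = compact('lastName', 'firstName', 'email', 'var1', 'var2');
}

print_r($records);
```

This still treats everything between <tr> tags as data, so it only suits input that contains nothing but the table rows.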
Psycho Posted August 7, 2009

Hmm, well, since you have thousands of records, I'm assuming you need to put the results into an array, as you can't store multiple names in a single variable (i.e. $lastName).

No offense, but I see a couple of problems with slapdashwebdesigner's code. For example, the regular expressions will fail because the forward slashes are not escaped. More importantly, the code assumes that ALL the text on the page is in fact part of the data. Once all the table tags are stripped, you lose much of the structure.

The following code is more verbose, but it has more logic in it. For example, you can set it to look for data only in a specific table, or leave it as is and it will process all tables, but only the tables. The script processes an entire 'page' and puts the results into a multidimensional array, where each element is a different record. See the example output at the end. Just add more options/conditions to the switch() as needed.

<?php
//Read the file as an array of lines
//(assumes each <td>...</td> sits on its own line)
$html = file('test.htm');

//Output array for the results
$results = array();

//Vars for tracking the data
$inTable = false;
$inRecord = false;
$recordIdx = 0;
$dataIdx = 0;

foreach($html as $line)
{
    //Determine if inside of table
    if (!$inTable)
    {
        //If looking for a SPECIFIC table, add additional verification,
        //for example, check the table name
        if (strpos($line, '<table')!==false)
        {
            $inTable = true;
        }
    }

    if ($inTable)
    {
        //Determine if in a new row/record
        if (!$inRecord && strpos($line, '<tr')!==false)
        {
            $inRecord = true;
        }

        //Look for a data line (the cell may carry attributes)
        if ($inRecord && strpos($line, '<td')!==false)
        {
            preg_match('/<td[^>]*>(.*?)<\/td>/i', $line, $match);
            $data = isset($match[1]) ? trim($match[1]) : '';
            switch($dataIdx)
            {
                case 0: //Last, First names
                    $results[$recordIdx]['lastName'] = trim(substr($data, 0, strpos($data, ',')));
                    $results[$recordIdx]['firstName'] = trim(substr($data, strpos($data, ',')+1));
                    break;
                case 1: //email
                    $results[$recordIdx]['email'] = $data;
                    break;
                case 2: //var1
                    $results[$recordIdx]['var1'] = $data;
                    break;
                case 3: //var2
                    $results[$recordIdx]['var2'] = $data;
                    break;
            }
            $dataIdx++;
        }

        //Determine if end of row/record
        if ($inRecord && strpos($line, '</tr')!==false)
        {
            $inRecord = false;
            $recordIdx++;
            $dataIdx = 0;
        }
    }

    //Determine if end of table
    if ($inTable && strpos($line, '</table')!==false)
    {
        $inTable = false;
    }
}

echo "<pre>";
print_r($results);
echo "</pre>";
?>

Example output:

Array
(
    [0] => Array
        (
            [lastName] => Smith
            [firstName] => Bob
            [email] => bob@smith.com
            [var1] => male
            [var2] => 32
        )

    [1] => Array
        (
            [lastName] => jackson
            [firstName] => Michael
            [email] => michael@damato.net
            [var1] => pedo@death.com
            [var2] => 50
        )

    [2] => Array
        (
            [lastName] => Hayak
            [firstName] => Selma
            [email] => hottie@latin.com
            [var1] => female
            [var2] => 38
        )

    [3] => Array
        (
            [lastName] => Moore
            [firstName] => Demi
            [email] => demi@something.com
            [var1] => female
            [var2] => 46
        )

)
Mardoxx Posted August 7, 2009

"[var1] => pedo@death.com"

HAHAHAHAHAHAHAHHAHAHA

But yeah, mjdamato, pretty useful code there. I might use it myself on something.
watsmyname Posted August 7, 2009

This can be done in a few simple steps: download the class from http://simplehtmldom.sourceforge.net/ and go through the examples there.
bundyxc Posted August 7, 2009 Author

Wow, the level of logic in there is incredible. Maybe a bit over my head. Thanks for the help, though, mjdamato. I haven't tested the code, as I've found another solution (the OutWit Hub add-on for Firefox), but I'll definitely use this in the future. Thanks so much for your help.
TeNDoLLA Posted August 7, 2009

Here is also an example of how to do it with DOMDocument:

<?php
$html = '<html>
<head>
<title> </title>
</head>
<body>
<table name="data">
<tr>
<td>LastName, FirstName</td>
<td>email</td>
<td>var1</td>
<td>var2</td>
</tr>
<tr>
<td>LastName_2, FirstName_2</td>
<td>email_2</td>
<td>var1_2</td>
<td>var2_2</td>
</tr>
</table>
</body>
</html>';

// Create DOMDocument
$dom = new DOMDocument();
// Load the HTML string
$dom->loadHTML($html);
// Get all tables from the HTML
$tables = $dom->getElementsByTagName('table');
// Get the rows of the first table
$rows = $tables->item(0)->getElementsByTagName('tr');
// Loop over each row
foreach ($rows as $row) {
    // Get each column by tag name
    $cols = $row->getElementsByTagName('td');
    // Echo the values (here you could assign them to an array instead, for example)
    echo $cols->item(0)->nodeValue.'<br />';
    echo $cols->item(1)->nodeValue.'<br />';
    echo $cols->item(2)->nodeValue.'<br />';
    echo $cols->item(3)->nodeValue;
    echo '<hr />';
}
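Building on the DOMDocument example above, here is a minimal sketch that stores each row in an array (as the comment in that loop suggests) and splits the name cell into last and first names. It assumes the table carries the same name="data" attribute and uses DOMXPath to select rows from only that table; the function name parseTableWithDom is hypothetical. libxml_use_internal_errors() keeps the parser's warnings about sloppy real-world markup from being printed.

```php
<?php
// Hypothetical helper: one associative array per <tr> that has at least four <td> cells.
function parseTableWithDom(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Only rows inside the table whose name attribute is "data".
    $xpath = new DOMXPath($dom);
    $records = [];
    foreach ($xpath->query('//table[@name="data"]//tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        if ($cols->length < 4) {
            continue; // header rows built from <th>, or short rows
        }
        // The first cell is "LastName, FirstName"; split on the first comma only,
        // padding with '' if there is no comma.
        $names = array_map('trim', explode(',', $cols->item(0)->nodeValue, 2)) + ['', ''];
        $records[] = [
            'lastName'  => $names[0],
            'firstName' => $names[1],
            'email'     => trim($cols->item(1)->nodeValue),
            'var1'      => trim($cols->item(2)->nodeValue),
            'var2'      => trim($cols->item(3)->nodeValue),
        ];
    }
    return $records;
}

$html = '<html><body><table name="data">
<tr><td>LastName, FirstName</td><td>email</td><td>var1</td><td>var2</td></tr>
<tr><td>LastName_2, FirstName_2</td><td>email_2</td><td>var1_2</td><td>var2_2</td></tr>
</table></body></html>';

print_r(parseTableWithDom($html));
```

Unlike the line-by-line and regex approaches, this does not care how the markup is split across lines or whether cells carry attributes, since the parser builds the element tree first.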