preg_match_all everything between <tr> and </tr> tags

funstein · October 18, 2011

My <tr> tags contain extra attributes, like <tr style=blabla>, and each row is closed with </tr>. I want PHP to grab the data in between. Please check the example.

The data:

<tr style="font-weight: bold; background-color: #aaa;">
<td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
</tr>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>
</tr>

It should return these:

$array[0] will be <td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
$array[1] will be <td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>

xyph · October 18, 2011

Did you want to support nested <tr>'s as well?

<table>
<tr>
	<td>
		<table>
			<tr>
				<td></td>
			</tr>
		</table>
	</td>
</tr>
</table>

If not, you probably want something like this. Keep in mind that it can be slow on large input: it has to use a lazy quantifier, which checks at every character whether the rest of the expression (</tr>) matches before it consumes another character.

%<tr[^>]++>(.*?)</tr>%s

Options: dot matches newline (s)

Match the characters “<tr” literally «<tr»
Match any character that is NOT a “>” «[^>]++»
   Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Match the character “>” literally «>»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “</tr>” literally «</tr>»
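For reference, a minimal sketch of how the pattern above can be passed to preg_match_all(), using the sample data from the question. The % characters are the delimiters, so the / in </tr> needs no escaping, and the row contents end up in $matches[1] (the capture group), not $matches[0]. Note that [^>]++ needs at least one character after <tr, so a bare <tr> with no attributes is not matched; [^>]*+ would cover that case (a variation, not part of the original pattern).

```php
<?php
// Sample input, copied from the question.
$html = <<<'HTML'
<tr style="font-weight: bold; background-color: #aaa;">
<td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
</tr>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>
</tr>
HTML;

// % is the delimiter; the s modifier lets . match newlines too.
preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $html, $matches);

// $matches[0] holds the whole <tr>...</tr> matches, $matches[1] the captured group.
// trim() strips the newlines that sit just inside each <tr> in this sample.
$rows = array_map('trim', $matches[1]);

print_r($rows);
```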

funstein · October 18, 2011

I tried it, but for some reason it won't work. It says: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '<' in C:\blabla\test.php on line 7

And not all <tr> tags are on new lines.

Can you please send me another working one?
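On the newline point: the pattern's s modifier lets . cross line breaks, and the lazy (.*?) stops at the first </tr>, so it should not matter whether rows share a line or are spread over several. A quick check with made-up markup:

```php
<?php
// Two rows on one line, then a third row spread over several lines.
$html = "<tr class=\"x\"><td>A</td></tr><tr class=\"x\"><td>B</td></tr>\n"
      . "<tr class=\"x\">\n<td>C</td>\n</tr>";

preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $html, $matches);

// trim() removes the newlines captured inside the third row.
print_r(array_map('trim', $matches[1]));
```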

xyph · October 18, 2011

A working one? Really? My expression works fine. It's not my fault you don't know how to use the preg_match_all function. Whether it's ignorance or laziness, perhaps you should check the manual - or even copy the error into a search engine.

I handed you the solution to the harder part of the problem. I'm going to leave the rest up to you.
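About the warning itself: a preg "Unknown modifier" warning generally means PHP found what it took to be the closing delimiter partway through the pattern and tried to read the remaining characters as modifiers. The message alone doesn't show exactly how the pattern was written in test.php, so this sketch only covers the most common case, assuming / delimiters: the / inside </tr> ends the pattern early and preg_match_all() returns false. Escaping that /, or using a delimiter such as % that does not occur in the pattern, fixes it.

```php
<?php
$html = '<tr class="a"><td>1</td></tr><tr class="b"><td>2</td></tr>';

// Broken: the / inside </tr> is read as the closing delimiter, so "tr>/s" is
// parsed as modifiers. PHP emits "Unknown modifier 't'" (silenced here with @)
// and preg_match_all() returns false.
$broken = @preg_match_all('/<tr[^>]++>(.*?)</tr>/s', $html, $m);
var_dump($broken);

// Fixed: escape the delimiter where it appears inside the pattern...
preg_match_all('/<tr[^>]++>(.*?)<\/tr>/s', $html, $escaped);

// ...or pick a delimiter that does not occur in the pattern at all.
preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $html, $percent);

print_r($escaped[1]);   // same rows as $percent[1]
```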

silkfire · October 18, 2011

Seriously, xyph, you should know better than to give him a preg solution to a problem that should be solved with DOM.

Mate, someone should teach you how to use the marvelous DOM parser included in PHP:

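// Returns everything between $node's opening and closing tags as HTML.
// Pass $escape = true to run the result through htmlspecialchars(), e.g. to echo it into a browser.
function innerHTML($node, $escape = false) {
   $innerHTML = '';

   foreach ($node->childNodes as $child) {
      // saveHTML() with a node argument (PHP 5.3.6+) serialises just that node
      $innerHTML .= $node->ownerDocument->saveHTML($child);
   }

   $innerHTML = trim($innerHTML);
   return $escape ? htmlspecialchars($innerHTML) : $innerHTML;
}

$dom = new DOMDocument();
@$dom->loadHTML($html);                             // Put your HTML in this variable first! (@ hides warnings about sloppy markup)

$xpath = new DOMXPath($dom);
$trs = $xpath->query('//tr');                       // every <tr> in the document

$rows = array();

foreach ($trs as $tr) {
   $rows[] = innerHTML($tr);                        // $rows[0], $rows[1], ... in document order
}

print_r($rows);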

funstein · October 18, 2011

I know I sound like a n00b here, xyph, but what I meant by "it doesn't work" is that it returns an array of arrays. I have no idea why that happens; all I know is that it should return an array with the first match as $array[0] and the second as $array[1], and it doesn't. And silkfire, what does that do, and how do I run a regex on that?
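What preg_match_all() fills in by default (PREG_PATTERN_ORDER) is one list per group: $matches[0] lists the full matches and $matches[1] lists what the first capture group caught, so the array asked for in the question is $matches[1]. Passing PREG_SET_ORDER groups the results per match instead. A sketch with made-up sample markup:

```php
<?php
$html = '<tr class="a"><td>School</td></tr><tr class="a"><td>School1</td></tr>';
$pattern = '%<tr[^>]++>(.*?)</tr>%s';

// Default (PREG_PATTERN_ORDER): one list per group.
// $byGroup[0] => full matches, e.g. '<tr class="a"><td>School</td></tr>'
// $byGroup[1] => captured contents, e.g. '<td>School</td>'
preg_match_all($pattern, $html, $byGroup);
$rows = $byGroup[1];

// PREG_SET_ORDER: one entry per match, each holding [full match, group 1].
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);
$rowsToo = array_column($bySet, 1);   // array_column() needs PHP 5.5+

print_r($rows);
```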

cags · October 19, 2011

Quote

And silkfire, what does that do, and how do I run a regex on that?

You don't. That's the point. You are attempting to parse data out of an HTML element, something that regular expressions are not really suited to. The solution shown by silkfire loads the HTML into a DOM object, which is built specifically for handling HTML.
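To make that concrete with the nested table xyph posted earlier (attributes added to both rows so they fit the [^>]++ part of his pattern; the markup is a made-up sketch): the lazy regex closes the outer row at the first </tr> it meets, which belongs to the inner row, while DOMXPath tracks the nesting and finds both rows.

```php
<?php
// The nested table from earlier in the thread, with attributes on both rows
// so that the [^>]++ part of the pattern can match them.
$html = '<table><tr class="outer"><td><table><tr class="inner"><td>x</td></tr></table></td></tr></table>';

// Regex: the outer <tr> is closed at the FIRST </tr> it reaches, which belongs
// to the inner row, so there is only one match and it stops mid-table.
preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $html, $m);
print_r($m[1]);

// DOM: the parser tracks nesting, so both <tr> elements are found.
libxml_use_internal_errors(true);   // keep any parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$rowCount = (new DOMXPath($dom))->query('//tr')->length;
echo $rowCount, "\n";
```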

funstein · October 19, 2011

OK, I see. But can you point out how I can import the HTML and get an array of <tr> tags?

Thanks

silkfire · October 19, 2011

Depends where you're getting the HTML from. Is it your own site or are you scraping? What is the address?

funstein · October 19, 2011

I'm getting an HTML input from Google Spreadsheets Visualizations. It's a pretty simple one actually. I get the data using file_get_contents().
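Putting the pieces together for that case: a sketch that fetches the page with file_get_contents() and hands the markup to DOMDocument, returning one string per <tr>. The URL in the usage comment is a placeholder, not the real Google Spreadsheets address, and both function names are hypothetical. It is written for PHP 7+ (type declarations) and uses libxml_use_internal_errors() instead of @ to keep malformed-markup warnings quiet.

```php
<?php
// Hypothetical helper: returns the inner HTML of every <tr> in $html, in document order.
function tableRowsFromHtml(string $html): array {
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $inner = '';
        foreach ($tr->childNodes as $child) {
            $inner .= $dom->saveHTML($child);       // serialise just this child node
        }
        $rows[] = trim($inner);
    }
    return $rows;
}

// Hypothetical helper: downloads the page, then parses it.
function fetchTableRows(string $url): array {
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return tableRowsFromHtml($html);
}

// Usage (placeholder URL, replace with the real one):
// print_r(fetchTableRows('https://example.com/your-spreadsheet-output'));
print_r(tableRowsFromHtml('<table><tr><td>School</td><td>Name</td></tr></table>'));
```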

silkfire · October 19, 2011

You have the answer, mate...

funstein · October 19, 2011

Which is?

funstein · October 19, 2011

Oh, I really am sorry. My browser didn't display the iframe scroll bar. Thanks for everything!
