preg_match_all everything between <tr> and </tr> tags


funstein

I have <tr> tags that contain extra attributes, like <tr style=blabla>, and </tr> tags. I want to make PHP grab the data in between. Please check the example.

 

The data:

<tr style="font-weight: bold; background-color: #aaa;">
<td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
</tr>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>
</tr>

 

It should return these:

 

$array[0] will be <td>School</td><td>Position</td><td>Name</td><td>Surname</td><td>Delegation</td><td>Commitee</td>
$array[1] will be <td>School1</td><td>Position1</td><td>Name1</td><td>Surname1</td><td>Delegation1</td><td>Commitee1</td>


Did you want to support nested <tr>'s as well?

 

<table>
<tr>
	<td>
		<table>
			<tr>
				<td></td>
			</tr>
		</table>
	</td>
</tr>
</table>

 

If not, you probably want something like this. Keep in mind that it involves a lot of backtracking: a lazy quantifier has to test at every character whether the next part of the expression matches before it expands.

 


%<tr[^>]++>(.*?)</tr>%s

Options: dot matches newline (s)

Match the characters “<tr” literally «<tr»
Match any character that is NOT a “>” «[^>]++»
   Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Match the character “>” literally «>»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “</tr>” literally «</tr>»
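
For reference, here is a minimal sketch of how the pattern above might be called. The variable names $html, $matches and $rows are illustrative, and the input is a shortened copy of the data from the first post. The part between the tags is capture group 1, so it ends up in $matches[1].

```php
<?php
// Sample input taken from the first post (shortened to two cells per row).
$html = <<<HTML
<tr style="font-weight: bold; background-color: #aaa;">
<td>School</td><td>Position</td>
</tr>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School1</td><td>Position1</td>
</tr>
HTML;

// Same pattern as above; the s modifier lets . match newlines.
preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $html, $matches);

// $matches[1] lists what group 1 captured for each match.
// trim() drops the newlines that sit right after <tr ...> and before </tr>.
$rows = array_map('trim', $matches[1]);

print_r($rows);
```

Note that [^>]++ needs at least one character after "<tr", so a bare <tr> with no attributes would not match; [^>]*+ would accept both forms.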


A working one? Really? My expression works fine. It's not my fault you don't know how to use the preg_match_all function. Whether it's ignorance or laziness, perhaps you should check the manual - or even copy the error into a search engine.

 

I handed you the solution to the harder part of the problem. I'm going to leave the rest up to you.


Seriously, xyph, you should know better than to give him a preg solution to a problem that should be solved with DOM.

Mate, someone should teach you how to use the marvelous DOM parser included in PHP:

 

// Return the serialized inner HTML of $node, optionally HTML-escaped.
function innerHTML($node, $escape = false) {
   $innerHTML = '';

   $children = $node->childNodes;
   foreach ($children as $child) {
      // Import each child into a scratch document so it can be serialized on its own.
      $dom = new DOMDocument();
      $dom->appendChild($dom->importNode($child, true));
      $innerHTML .= ($escape ? htmlspecialchars($dom->saveHTML()) : $dom->saveHTML());
   }

   return trim($innerHTML) . "\r\n\r\n";
}

$dom = new DOMDocument();
@$dom->loadHTML($html);                             // Put your HTML in this variable first!

$xpath = new DOMXPath($dom);
$trs = $xpath->query('//tr');

$rows = array();

foreach($trs as $tr)
   $rows[] = innerHTML($tr, true);

print_r($rows);


I seriously sound like a n00b here, xyph, but what I meant by "it doesn't work" was that it returns associative arrays. I have no idea why that is happening; all I know is that it should return an array with the first match as $array[0] and the second as $array[1], and it doesn't. And silkfire, what does that do, and how do I run a regex on that?
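
The result described here is probably the default layout of preg_match_all rather than an associative array: matches are grouped by capture group, not by match. A short sketch comparing the default order with PREG_SET_ORDER follows. The two-row input and the variable names are made up for illustration.

```php
<?php
// Hypothetical two-row input, kept on one line for brevity.
$html = '<tr class="a"><td>1</td></tr><tr class="b"><td>2</td></tr>';
$pattern = '%<tr[^>]++>(.*?)</tr>%s';

// Default (PREG_PATTERN_ORDER): one sub-array per capture group.
// $byGroup[0] holds the full matches, $byGroup[1] the contents of group 1.
preg_match_all($pattern, $html, $byGroup);

// PREG_SET_ORDER: one sub-array per match.
// $byMatch[0][1] is the first row's contents, $byMatch[1][1] the second's.
preg_match_all($pattern, $html, $byMatch, PREG_SET_ORDER);

// The flat list asked for in the first post is simply group 1.
$rows = $byGroup[1];

print_r($rows);
```

With either flag, the value between the tags is the element at index 1; only the nesting order of the two array levels changes.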


And silkfire, what does that do, and how do I run a regex on that?

 

You don't. That's the point. You are attempting to parse data out of an HTML element, something that regular expressions are not really suited to. The solution shown by silkfire loads the HTML into a DOM object, which is built specifically for handling HTML.
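
As a rough sketch of what that looks like on the data from the first post, the snippet below collects the text of each cell, row by row, with DOMDocument and DOMXPath and no regex at all. The wrapping <table> element, the shortened rows and the variable names are assumptions added for illustration.

```php
<?php
// Shortened copy of the data from the first post, wrapped in a <table>.
$html = <<<HTML
<table>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School</td><td>Position</td>
</tr>
<tr style="font-weight: bold; background-color: #aaa;">
<td>School1</td><td>Position1</td>
</tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$rows = array();
foreach ($xpath->query('//tr') as $tr) {
    $cells = array();
    // Query the cells relative to the current row.
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

print_r($rows);
```

Here each row becomes an array of cell values rather than a string of <td> markup, which is usually what the data is needed for anyway. Note that '//tr' also selects rows of nested tables, so a more specific XPath expression would be needed if only the outer rows are wanted.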
