[SOLVED] regex - preg_match'in html

DamienRoche · October 14, 2008

I have this simple preg_match:

<?php

$str = "</td> </tr> </table>";

preg_match("/<\/td> <\/tr> <\/table>/", $str, $match);
echo "Match:".$match[0]."<br> EVEN:".$match[1];
print_r($match);

?>

I have tried so many ways to match the above string. NOTHING is working for me. The best I can do is match a word without tags, like 'table'. That's it. As soon as I try to match anything with tags it doesn't work.

I can't even match <table>:

- <\/table>

- \<\/table\>

- <\/table\>

- </table>

I have tried escaping the slashes in both the regex and the string using different combos. Still not getting this. I'm in the middle of reading "Mastering Regex" so hopefully something'll click sooner or later.

Any input is welcomed. Thanks.

effigy · October 14, 2008

You're matching HTML, so you won't see it when you print it: the browser parses the tags instead of displaying them. Escape the output first:

<pre>
<?php
$str = '</td> </tr> </table>';
preg_match('%</td>\s*</tr>\s*</table>%', $str, $matches);
foreach ($matches as &$match) {
	$match = htmlspecialchars($match);
}
print_r($matches);
?>
</pre>

nrg_alpha · October 14, 2008

You are trying to output $match[1], which does not exist: you need a first set of capturing parentheses in the pattern for $match[1] to be populated. At this point you should only have $match[0].

So you need to decide what you want $match[1] to be (in the pattern, that is) and wrap that section in parentheses.

-or-

Ditch the $match[1] part and simply use $match[0], which you already have. And along the lines of what effigy said, you will need to right-click and view source to see what you matched, since these are HTML tags, which the browser parses.
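
As a minimal sketch of the point about capturing parentheses (the choice of what to capture here is made up purely for illustration): $match[0] always holds the whole match, and $match[1] only appears once the pattern contains a first group.

```php
<?php
$str = '</td> </tr> </table>';

// No parentheses in the pattern: only $match[0] (the whole match) is set.
preg_match('%</td>\s*</tr>\s*</table>%', $str, $match);
$withoutGroup = count($match);

// One pair of parentheses around the last tag name: $match[1] now exists.
preg_match('%</td>\s*</tr>\s*</(table)>%', $str, $match);
$wholeMatch = $match[0];
$firstGroup = $match[1];

// Escape before echoing so the browser shows the tags instead of parsing them.
echo htmlspecialchars($wholeMatch), "\n", $firstGroup, "\n";
```

Here $withoutGroup is 1, $wholeMatch is the full string, and $firstGroup is 'table'.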

DamienRoche · October 14, 2008

I think I'm getting the gist of it. Thanks.

I've finally matched the tags. I was using htmlentities to view results before, but I still couldn't match using the escape sequence. The % delimiters seem to have helped me there, though.

Here is my other issue.

I am trying to match everything in between the table tags below:

$html = '

<table class="myclass" attrib1="blah" attrib2="blah">

<tr><td>random though</td></tr>
<tr><td>random though</td></tr>
<tr><td class="randclass">random though</td></tr>
<tr><td>random though</td></tr>

<tr><td>random though </td> </tr> </table>
';

I have been able to match the beginning table tag and the last, separately, but can't match what's in between using wildcards.

Any ideas?

Here's my current code:

<?php

preg_match('%<table class="myclass".*>.*</td>\s*</tr>\s*</table>%',  $html, $results2);

Again, I've tried escaping different things, using different delimiters. I just can't suss it.

Thanks again.

nrg_alpha · October 14, 2008

Is this what you are looking for?

<?php
$str = <<<DATA
<table class="myclass" attrib1="blah" attrib2="blah">

<tr><td>random though</td></tr>
<tr><td>random though</td></tr>
<tr><td class="randclass">random though</td></tr>
<tr><td>random though</td></tr>

<tr><td>random though </td> </tr> </table>
DATA;
preg_match('#<table[^>]*>(.+?)</table>#is', $str, $match);
echo $match[1];

EDIT: Again, you'll have to right-click and view the source to see $match[1].
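
A likely reason the earlier .* attempt failed is that, by default, the dot does not match newlines, and the table spans several lines. Here is a small sketch of what the s and i modifiers in the pattern above change. The $html string is a shortened stand-in for the real table, with an upper-case closing tag added only to show the effect of i.

```php
<?php
$html = "<table class=\"myclass\">\n<tr><td>cell</td></tr>\n</TABLE>";

// No modifiers: '.' cannot cross the newline after the opening tag.
$plain = preg_match('#<table[^>]*>(.+?)</table>#', $html);

// 's' only: '.' now matches newlines, but </TABLE> differs in case.
$dotAll = preg_match('#<table[^>]*>(.+?)</table>#s', $html);

// 'is': '.' matches newlines and the comparison ignores case.
$both = preg_match('#<table[^>]*>(.+?)</table>#is', $html, $m);

var_dump($plain, $dotAll, $both); // int(0) int(0) int(1)
echo htmlspecialchars($m[1]);
```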

DamienRoche · October 14, 2008

Kind of. Is there a way to match that particular table based on the class?

like:

preg_match('#<table class="myclass"[^>]*>(.+?)</table>#is', $str, $match);

Thanks.

nrg_alpha · October 14, 2008

Yep, that should do it (assuming that <table is followed by a single space and then class="myclass").

EDIT: If you don't care about the class name, but just want to match tables that have a class of some sort, you could also use:

preg_match('#<table class="[^"]+"[^>]*>(.+?)</table>#is', $str, $match);

DamienRoche · October 14, 2008

Thank you very much!- it works perfectly now.

I have one last question. I have done this before but have completely forgotten how.

How do I match things using a wildcard and have them go into an array:

like:

<?php

// $str holds the same table markup as above

preg_match('#<table (.*?)="(.*)"[^>]*>(.+?)</table>#is', $str, $match);

?>

Notice the (.*?) and (.*) in the code above, where 'class' and 'myclass' would be. How do I put those into an array?

Thanks again for all your help!

effigy · October 14, 2008

The function puts the captures into the array automatically; take a look at print_r($match);
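
To make that concrete, here is a rough sketch with a one-line stand-in for the table. Note that it swaps the (.*?) and (.*) from the question for negated character classes, so that each capture stays inside its own attribute name and value; that substitution is my own choice, not something from the posts above.

```php
<?php
$str = '<table class="myclass" attrib1="blah"><tr><td>random though</td></tr></table>';

// Group 1: attribute name, group 2: attribute value, group 3: table contents.
preg_match('#<table ([^=\s]+)="([^"]*)"[^>]*>(.+?)</table>#is', $str, $match);

// Index 0 is the whole match; indexes 1..3 are the groups, in order.
$name  = $match[1];
$value = $match[2];
$inner = $match[3];

// Escape every element so the captured tags are visible in a browser.
print_r(array_map('htmlspecialchars', $match));
```

With this input, $name is 'class', $value is 'myclass', and $inner is the row markup between the table tags.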

DamienRoche · October 14, 2008

I'll give you an example of what I can't get to work.


$str = "1249|33182|33182|9333|3981847";

preg_match("#(.*)|(.*)|(.*)|(.*)|(.*)#is", $str, $matches);

print_r($matches);

number 1249:

$matches[0][1] (1)

$matches[0][2] (2)

$matches[0][3] (4)

$matches[0][4] (9)

Is this the best way to do it? Is there not a way to capture each complete (.*) as a single element in the array?

Thanks.

effigy · October 14, 2008

| is a metacharacter in regex. Use \| to match a literal pipe. explode would be better in this instance.
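
A quick sketch of both suggestions, using the string from the question. The [^|]* groups and the ^ and $ anchors are my own additions to keep each capture inside a single field.

```php
<?php
$str = "1249|33182|33182|9333|3981847";

// \| matches a literal pipe; [^|]* stops each capture at the next pipe.
preg_match('#^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$#', $str, $matches);
$captures = array_slice($matches, 1); // drop $matches[0], the whole match

// explode() splits on the literal delimiter with no regex involved.
$fields = explode('|', $str);

print_r($fields);
```

Both approaches yield the same five values, with '1249' first and '3981847' last; explode() simply gets there with less machinery.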

DamienRoche · October 14, 2008

I've finally got somewhere with all this stuff. Thank you very much for all your help. Thanks effigy for pointing out explode- that func has helped a lot for this. Thanks again everybody!!

nrg_alpha · October 14, 2008

This might be a good time to warn about the usage of wildcards.

When a pattern contains .*, for example, it can become inefficient (especially when it appears early in a pattern that is matched against a large chunk of data). Every time the regex engine encounters .* it first matches everything remaining in the string (issues of newlines aside: by default, the dot does not match newlines). Then, if there is more after the .* in the pattern, the engine has to backtrack, giving up characters in reverse order (one at a time) and checking after each one whether the rest of the pattern now matches.

Depending on where .* sits in the pattern and how much data is being matched, wildcards can become a speed hindrance. At the very least, I would personally resort to the lazy quantifier .*? instead. That way the engine is lazy first: it tries what comes after .*? in the pattern, and only if that fails does it let .*? consume the current character, advance one character, and start the cycle over (as opposed to matching everything and then backtracking and checking). Where possible, it is better still to use negated character classes, which are generally more efficient. Example: class="[^"]+" instead of class=".*"
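
The difference also shows up in what gets matched, not just in speed. A small sketch, with a made-up tag as input:

```php
<?php
$tag = '<td class="first" id="second">';

// Greedy: .* runs to the end of the string, then backs up to the LAST quote.
preg_match('/class="(.*)"/', $tag, $greedy);

// Lazy: .*? grows one character at a time until a quote follows.
preg_match('/class="(.*?)"/', $tag, $lazy);

// Negated class: [^"]+ can never cross a quote, so it stops at the first one.
preg_match('/class="([^"]+)"/', $tag, $negated);

echo $greedy[1], "\n";   // first" id="second
echo $lazy[1], "\n";     // first
echo $negated[1], "\n";  // first
```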

Regex patterns, while powerful, can hinder speed / performance if not written well. I would suggest Jeff Friedl's book Mastering Regular Expressions if you are really interested in learning how regex engines actually *think*. It will make you rethink how patterns are written, and can lead to some good speed / performance increases, as well as give you a much larger understanding of regex in general.

Cheers,

NRG

ghostdog74 · October 15, 2008

I know it's been solved; nevertheless, for just this case, no regex is needed.

$str = "</td> </tr> </table>";
if ( strpos($str,"</td>")!==FALSE &&
     strpos($str,"</tr>")!==FALSE &&
     strpos($str,"</table>")!==FALSE ){
    echo "yes";
}
