DamienRoche Posted October 14, 2008

I have this simple preg_match:

<?php
$str = "</td> </tr> </table>";
preg_match("/<\/td> <\/tr> <\/table>/", $str, $match);
echo "Match:".$match[0]."<br> EVEN:".$match[1];
print_r($match);
?>

I have tried so many ways to match the above string, and nothing is working for me. The best I can do is match a word without tags, like 'table'. That's it. As soon as I try to match anything with tags it doesn't work. I can't even match </table>:

- <\/table>
- \<\/table\>
- <\/table\>
- </table>

I have tried escaping the slashes in both the regex and the string using different combos. Still not getting this. I'm in the middle of reading "Mastering Regex", so hopefully something will click sooner or later. Any input is welcome. Thanks.

Link to comment: https://forums.phpfreaks.com/topic/128368-solved-regex-preg_matchin-html/
effigy Posted October 14, 2008

You're matching HTML, so you're not going to see it if you print it: the browser is parsing it.

<pre>
<?php
$str = '</td> </tr> </table>';
preg_match('%</td>\s*</tr>\s*</table>%', $str, $matches);
// Escape each match so the browser displays the tags instead of parsing them.
foreach ($matches as &$match) {
$match = htmlspecialchars($match);
}
unset($match); // break the reference left over from the loop
print_r($matches);
?>
</pre>
nrg_alpha Posted October 14, 2008

You are trying to output $match[1], which does not exist: $match[1] is only set when the pattern contains a first set of capturing parentheses. At this point you only have $match[0]. So either decide what you want $match[1] to be and wrap that part of the pattern in parentheses, or drop $match[1] and simply use $match[0], which you already have. And along the lines of what effigy said, you will need to right-click and view source to see what you matched, since these are HTML tags, which the browser parses.
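To make the point about $match[0] and $match[1] concrete, here is a minimal sketch. It reuses the string from the first post; the variable names $countWithoutGroup and $groups, and the choice to capture the </tr> tag, are only assumptions for illustration.

```php
<?php
$str = '</td> </tr> </table>';

// No parentheses in the pattern: only $match[0] (the whole match) is set.
preg_match('%</td>\s*</tr>\s*</table>%', $str, $match);
$countWithoutGroup = count($match);

// One pair of capturing parentheses: $groups[1] holds what they matched.
preg_match('%</td>\s*(</tr>)\s*</table>%', $str, $groups);

// Escape before printing so the browser shows the tags instead of parsing them.
echo htmlspecialchars($groups[0]), "\n";
echo htmlspecialchars($groups[1]), "\n";
```

Each additional pair of parentheses would add one more numbered element to the array, in the order the opening parentheses appear in the pattern.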
DamienRoche Posted October 14, 2008

I think I'm getting the gist of it. Thanks. I've finally matched the tags. I was using htmlentities to view results before, but I still couldn't match using the escape sequence. The % delimiters seem to have helped me there, though.

Here is my other issue. I am trying to match everything in between the table tags below:

$html = '
<table class="myclass" attrib1="blah" attrib2="blah">
<tr><td>random though</td></tr>
<tr><td>random though</td></tr>
<tr><td class="randclass">random though</td></tr>
<tr><td>random though</td></tr>
<tr><td>random though </td> </tr>
</table>
';

I have been able to match the opening table tag and the closing one separately, but I can't match what's in between using wildcards. Any ideas? Here's my current code:

<?php
preg_match('%<table class="myclass".*>.*</td>\s*</tr>\s*</table>%', $html, $results2);

Again, I've tried escaping different things and using different delimiters. I just can't suss it. Thanks again.
nrg_alpha Posted October 14, 2008

Is this what you are looking for?

<?php
$str = <<<DATA
<table class="myclass" attrib1="blah" attrib2="blah">
<tr><td>random though</td></tr>
<tr><td>random though</td></tr>
<tr><td class="randclass">random though</td></tr>
<tr><td>random though</td></tr>
<tr><td>random though </td> </tr>
</table>
DATA;
preg_match('#<table[^>]*>(.+?)</table>#is', $str, $match);
echo $match[1];

EDIT: Again, you'll have to right-click and view the source to see $match[1].
DamienRoche Posted October 14, 2008

Kind of. Is there a way to match that particular table based on the class? Like:

preg_match('#<table class="myclass"[^>]*>(.+?)</table>#is', $str, $match);

Thanks.
nrg_alpha Posted October 14, 2008

Yep, that should do it (assuming that <table is followed by exactly one space and then class="myclass").

EDIT: If you don't care about the class name, but just want to match tables that have a class of some sort, you could also use:

preg_match('#<table class="[^"]+"[^>]*>(.+?)</table>#is', $str, $match);
DamienRoche Posted October 14, 2008

Thank you very much! It works perfectly now.

I have one last question. I have done this before but have completely forgotten how. How do I match things using a wildcard and have them go into an array? Like:

<?php
// $str holds the table HTML as above
preg_match('#<table (.*?)="(.*)"[^>]*>(.+?)</table>#is', $str, $match);
?>

Notice the (.*?) and (.*) in the code above, where 'class' and 'myclass' would be. How do I put those into an array?

Thanks again for all your help!
effigy Posted October 14, 2008

preg_match automatically stores the captures in the array passed as its third argument; look at the output of print_r($match);.
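A small sketch of what that array looks like, under some assumptions: the one-row $str below is a shortened, hypothetical stand-in for the table in this thread, and the pattern swaps the wildcards from the question for \w+ and a negated character class so that each group stops where expected.

```php
<?php
// Hypothetical, shortened stand-in for the table HTML used earlier in the thread.
$str = '<table class="myclass" attrib1="blah"><tr><td>random though</td></tr></table>';

// Group 1: attribute name, group 2: attribute value, group 3: table contents.
preg_match('#<table\s+(\w+)="([^"]*)"[^>]*>(.+?)</table>#is', $str, $match);

// $match[0] is the whole match; $match[1], $match[2], $match[3] are the captures.
// Escape for display so the browser does not parse the tags.
echo htmlspecialchars(print_r($match, true));
```

So there is nothing extra to do: every pair of parentheses already gets its own numbered slot in $match.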
DamienRoche Posted October 14, 2008

I'll give you an example of what I can't get to work.

$str = "1249|33182|33182|9333|3981847";
preg_match("#(.*)|(.*)|(.*)|(.*)|(.*)#is", $str, $matches);
print_r($matches);

Number 1249:

$matches[0][1] (1)
$matches[0][2] (2)
$matches[0][3] (4)
$matches[0][4] (9)

Is this the best way to do it? Is there not a way to capture the complete (.*) in a single element of the array? Thanks.
effigy Posted October 14, 2008

| is a metacharacter in regex (it means alternation). Use \| to match a literal pipe. explode would be better in this instance.
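Both suggestions can be sketched side by side, using the string from the question. The anchored pattern with [^|]* groups is just one possible way to write the regex version, and $fields is an illustrative variable name.

```php
<?php
$str = "1249|33182|33182|9333|3981847";

// Regex version: \| matches a literal pipe, and [^|]* captures one field
// (inside a character class the pipe needs no escaping).
preg_match('#^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$#', $str, $matches);

// explode() version: splits on the delimiter, no regex involved.
$fields = explode('|', $str);

print_r($matches); // [0] is the whole string, [1] to [5] are the fields
print_r($fields);  // [0] to [4] are the fields
```

The explode call also keeps working if the number of fields changes, whereas the regex above is tied to exactly five.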
DamienRoche Posted October 14, 2008

I've finally got somewhere with all this stuff. Thank you very much for all your help. Thanks, effigy, for pointing out explode; that function has helped a lot with this. Thanks again, everybody!!
nrg_alpha Posted October 14, 2008

This might be a good time to warn about the usage of wildcards. Patterns containing .*, for example, can be inefficient, especially when the .* appears early in a pattern that is matched against a large chunk of data. Every time the regex engine encounters something like .* it matches everything remaining in the string (issues of newlines aside; by default, the dot does not match newlines). Then, if there is more after the .* in the pattern, the engine has to start backtracking, relinquishing characters in reverse order, one at a time, and checking whether the rest of the pattern matches from there. Depending on where the .* sits in the pattern and on the size of the data being matched, wildcards can become a speed hindrance.

At the very least, I would personally resort to the lazy quantifier .*? instead. This way, the engine first tries whatever comes after .*? in the pattern; only if that fails does it let .*? consume the current character, advance one character, and start the cycle over again (as opposed to matching everything and then backtracking and checking character by character).

It is most advisable to use negated character classes instead, where possible. This makes things much more efficient and speedy. Example: class="[^"]+" instead of class=".*"

Regex patterns, while powerful, can hurt speed and performance if not written well. I would suggest Jeffrey Friedl's book Mastering Regular Expressions if you are really interested in learning how regex engines actually *think*. It will make you rethink how patterns are written, can lead to some good performance increases, and will give you a much deeper understanding of regex in general.

Cheers, NRG
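The difference between the three forms shows up in what they capture, not only in speed. A rough sketch, assuming a made-up $tag string with two attributes (not taken from the thread):

```php
<?php
// Hypothetical tag with two quoted attributes, for illustration only.
$tag = '<td class="randclass" title="note">';

// Greedy: .* runs to the end, then backtracks to the LAST double quote.
preg_match('/class="(.*)"/', $tag, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST double quote.
preg_match('/class="(.*?)"/', $tag, $lazy);

// Negated class: [^"]+ cannot cross a double quote at all, so no backtracking
// is needed to find where the value ends.
preg_match('/class="([^"]+)"/', $tag, $negated);

echo htmlspecialchars($greedy[1]), "\n";  // randclass" title="note
echo htmlspecialchars($lazy[1]), "\n";    // randclass
echo htmlspecialchars($negated[1]), "\n"; // randclass
```

On a string this short the timing difference is negligible; the performance argument above is about large inputs, which this sketch does not measure.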
ghostdog74 Posted October 15, 2008

"Any input is welcomed. Thanks."

I know it's been solved; nevertheless, for just this case, no regex is needed.

$str = "</td> </tr> </table>";
if ( strpos($str,"</td>")!==FALSE && strpos($str,"</tr>")!==FALSE && strpos($str,"</table>")!==FALSE ){
echo "yes";
}