mab Posted November 18, 2008

Hello all,

I'm currently struggling with regex and am stuck on this little problem. I have source code with nested tables and text between separate tables. I want to remove all tables, but all text that is not inside a table should stay. I am using the preg_replace function (a PCRE function) to do this. At the moment it works with non-nested tables, but as soon as there are nested tables it doesn't work properly. I hope my explanation is understandable. Can anybody help? I appreciate any suggestions. Thanks so much in advance!

To explain a bit more, here's some code:

    $pattern = '{(<[ \\n\\r\\t]*(table)(>|[^>]*>))(.*?)(<[ \\n\\r\\t]*/[ \\n\\r\\t]*(\2)(>|[^>]*>))}is';
    $replacement = '';
    echo preg_replace($pattern, $replacement, $subject);

And here is an example of the subject:

    <h1>Some tables and text</h1>
    <table>
      <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr>
        <td>
          <table>
            <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
            <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
          </table>
        </td>
        <td>sun</td> <td>wind</td>
      </tr>
    </table>
    This is a text between the tables which should not be removed.
    <table>
      <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
    </table>
    This is a text after the tables which should also not be removed.

Link to comment: https://forums.phpfreaks.com/topic/133191-solved-help-nested-tags/
ddrudik Posted November 18, 2008

    <?php
    $html = '<h1>Some tables and text</h1>
    <table>
      <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr>
        <td>
          <table>
            <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
            <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
          </table>
        </td>
        <td>sun</td> <td>wind</td>
      </tr>
    </table>
    This is a text between the tables which should not be removed.
    <table>
      <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
    </table>';
    // (?0) recurses into the whole pattern, so a nested <table>...</table> is consumed as one unit.
    $html = preg_replace('~<table[^>]*>(?:(?>(?:(?!</?table[^>]*>).)+)|(?0))*</table>~is', '', $html);
    echo $html;
    ?>
sasa Posted November 18, 2008

try

    <?php
    $test = '<h1>Some tables and text</h1>
    <table>
      <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr>
        <td>
          <table>
            <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
            <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
          </table>
        </td>
        <td>sun</td> <td>wind</td>
      </tr>
    </table>
    This is a text between the tables which should not be removed.
    <table>
      <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
      <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
    </table>
    This is a text after the tables which should also not be removed.';
    $out = '';
    $start = 0;
    $a = strpos($test, '<table');   // position of the next opening tag
    $b = strpos($test, '</table');  // position of the next closing tag
    $open_tag = 0;                  // current nesting depth
    while ($a !== false or $b !== false) {
        // An opening tag comes next. Checking $b === false as well lets
        // unbalanced input end the loop instead of spinning forever.
        if ($a !== false and ($b === false or $a < $b)) {
            if ($open_tag == 0) {
                // Outside any table: keep the text up to this opening tag.
                $out .= substr($test, $start, $a - $start);
            }
            $open_tag++;
            $a = strpos($test, '<table', $a + 1);
        } else {
            // A closing tag comes next: resume just after its ">".
            $open_tag--;
            $start = strpos($test, '>', $b + 1) + 1;
            $b = strpos($test, '</table', $start);
        }
    }
    if ($open_tag) die('HTML error!');
    $out .= substr($test, $start);
    echo $out;
    ?>
ddrudik Posted November 19, 2008

That last code brings up a good point: for many regex operations there is a string-function approach that does the same job, often faster and with less overhead.
mab Posted November 19, 2008

Thanks for all your help. I tried the solutions and both work for me. I really appreciate it.

Regex is quite new to me and I don't understand all of the first solution. The difficult part seems to be the one in parentheses:

    (?:(?>(?:(?!</?table[^>]*>).)+)|(?0))*

I recognized the atomic group, the negative lookahead, and the alternation. So what does the first alternative do?

    (?>(?:(?!</?table[^>]*>).)+)

Am I right that it matches any character, but first looks ahead to check that no opening or closing table tag starts there? And that the atomic group keeps the match as a whole, so it can only be given back as a whole?

And the second alternative?

    (?0)

What does it do, and which part of the regex does it refer to?

I would be really happy if you could give a short explanation, so that I can create such a pattern on my own next time. Thanks again for all the nice and fast help!
ddrudik Posted November 19, 2008

A more common way of writing that is (?R) instead of (?0); both recurse into the whole pattern. (?0) helps to illustrate that you could instead incorporate lookahead and lookbehind and use capture group 1 as the nested pattern, (?1).

A complete background is in Friedl's "Mastering Regular Expressions", but for a quick PHP regex syntax overview see: http://us3.php.net/manual/en/reference.pcre.pattern.syntax.php

Search for "Recursive Patterns" on that page and you will see the discussion of the general pattern, although their example matches nested and non-nested parenthesized groups instead. It is simpler to construct a pattern with a single bounding character such as ( ) than with table tags, but the theory is the same.
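To make the recursion idea concrete, here is a minimal sketch of both cases: the single-character parentheses case and the same shape applied to table tags. It uses (?R) rather than (?0), and writes the table pattern with the x flag so each part can carry a comment. The variable names ($parens, $tables, $html, $clean) and the sample inputs are only illustrative and do not come from the posts above.

```php
<?php
// Balanced parentheses: (?R) re-enters the whole pattern at that point,
// so every nested "(...)" is matched as a complete unit.
$parens = '~\((?:(?>[^()]+)|(?R))*\)~';
preg_match($parens, 'f(a(b)c)d', $m);
echo $m[0], "\n"; // prints "(a(b)c)"

// The same shape for tables. With the x flag, whitespace in the pattern is
// ignored and "#" starts a comment that runs to the end of the line.
$tables = '~
    <table[^>]*>                              # opening tag, with optional attributes
    (?:                                       # then, any number of times:
        (?> (?: (?! </?table[^>]*> ) . )+ )   # a run of characters in which no table tag starts
      |
        (?R)                                  # or one complete nested table, by recursion
    )*
    </table>                                  # the closing tag for this level
~isx';

// Hypothetical sample input: one nested table and one flat table.
$html = 'Intro. <table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>'
      . 'Middle. <table><tr><td>y</td></tr></table>End.';
$clean = preg_replace($tables, '', $html);
echo $clean, "\n"; // prints "Intro. Middle. End."
```

The atomic group does the job mab describes: it consumes plain content one character at a time, stopping before any opening or closing table tag, and does not give characters back one by one. Whenever an opening tag is reached, only the recursive alternative can proceed, which is what keeps the opening and closing tags paired. On unbalanced HTML the pattern simply finds no match for the unclosed table, so checking the result is still worthwhile.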