mab Posted November 18, 2008

Hello @all,

I am currently struggling with RegEx and got stuck on this little problem. I have source code with nested tables and text between separate tables. I want to remove all tables, but all text that is not within the tables should stay. I am using the preg_replace function (PCRE) to do this. At the moment it works with non-nested tables, but as soon as there are nested tables it doesn't work properly. I hope my explanation is understandable. Is there anybody out there who can help? I appreciate any suggestions ... Thanks so much in advance!!

Well, to explain a bit more, here's some code:

$pattern = '{(<[ \\n\\r\\t]*(table)(>|[^>]*>))(.*?)(<[ \\n\\r\\t]*/[ \\n\\r\\t]*(\2)(>|[^>]*>))}is';
$replacement = '';
echo preg_replace($pattern, $replacement, $subject);

And here is an example for the 'subject':

<h1>Some tables and text</h1>
<table>
  <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr>
    <td>
      <table>
        <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
        <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
      </table>
    </td>
    <td>sun</td> <td>wind</td>
  </tr>
</table>
This is a text between the tables wich should not be removed.
<table>
  <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
</table>
This is a text after the tables which should also not be removed.

Link to comment https://forums.phpfreaks.com/topic/133191-solved-help-nested-tags/
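The failure described in the question can be reproduced with a much smaller input. The sketch below uses a simplified stand-in for the posted pattern (the variable names and the cut-down subject are assumptions for illustration): a lazy `.*?` stops at the first closing tag it meets, which for nested tables is the inner one.

```php
<?php
// Simplified stand-in for the pattern in the question (an assumption, not the
// original): lazily match from an opening table tag to the next closing tag.
$lazy = '~<table[^>]*>.*?</table>~is';

// Cut-down subject with one table nested inside another.
$subject = '<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>after';

// The match ends at the inner </table>, so the tail of the outer table survives.
$result = preg_replace($lazy, '', $subject);
echo $result, "\n";   // </td></tr></table>after
```

A non-recursive pattern has no way to count how many tables are currently open, which is why the leftover closing tags appear.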
ddrudik Posted November 18, 2008

<?php
$html='<h1>Some tables and text</h1>
<table>
  <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr>
    <td>
      <table>
        <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
        <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
      </table>
    </td>
    <td>sun</td> <td>wind</td>
  </tr>
</table>
This is a text between the tables wich should not be removed.
<table>
  <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
</table>';
$html=preg_replace('~<table[^>]*>(?:(?>(?:(?!</?table[^>]*>).)+)|(?0))*</table>~is','',$html);
echo $html;
?>
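For readers who find the one-line pattern hard to parse, here is a sketch of the same regex written in extended mode (the `x` flag) so that each part can carry a comment. The function name `strip_tables`, the `x` flag, and the fallback on error are additions for illustration, not part of the post above.

```php
<?php
// Hypothetical helper: removes every top-level table, including anything nested in it.
function strip_tables(string $html): string
{
    $pattern = '~
        <table[^>]*>                        # opening tag of a table
        (?:
            (?>                             # atomic group: no backtracking into this run
                (?: (?!</?table[^>]*>) . )+ # characters that do not start a table tag
            )
          |
            (?0)                            # recurse into the whole pattern: a nested table
        )*
        </table>                            # closing tag that balances the opening one
    ~isx';

    // preg_replace returns null on failure (for example when a PCRE limit is hit);
    // in that case this sketch simply hands back the input unchanged.
    return preg_replace($pattern, '', $html) ?? $html;
}

$sample = '<p>a</p><table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>'
        . 'mid<table><tr><td>y</td></tr></table>end';
echo strip_tables($sample), "\n";   // <p>a</p>midend
```

Each pass through the outer `(?: ... )*` either consumes a run of ordinary content or a complete nested table, so the final `</table>` can only match the tag that closes the table the match started with.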
sasa Posted November 18, 2008

try

<?php
$test = '<h1>Some tables and text</h1>
<table>
  <tr> <th>England</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr>
    <td>
      <table>
        <tr> <th>London</th> <th>Brighton</th> <th>Cambridge</th> </tr>
        <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
      </table>
    </td>
    <td>sun</td> <td>wind</td>
  </tr>
</table>
This is a text between the tables wich should not be removed.
<table>
  <tr> <th>London</th> <th>Paris</th> <th>Munich</th> </tr>
  <tr> <td>rain</td> <td>sun</td> <td>wind</td> </tr>
</table>
This is a text after the tables which should also not be removed.';
$out = '';
$start = 0;                        // where the next chunk of kept text begins
$a = strpos($test, '<table');      // position of the next opening tag
$b = strpos($test, '</table');     // position of the next closing tag
$open_tag = 0;                     // current nesting depth
while ($a !== false or $b !== false){
    if ($a < $b and $a !== false){
        // An opening tag comes first: keep the text before it only when outside all tables.
        if ($open_tag == 0){
            $out .= substr($test, $start, $a - $start);
        }
        $open_tag++;
        $a = strpos($test, '<table', $a + 1);
    } else {
        // A closing tag comes first: skip past its '>' and look for the next closing tag.
        $open_tag--;
        $start = strpos($test, '>', $b+1) + 1;
        $b = strpos($test, '</table', $start);
    }
}
if ($open_tag) die('HTML error!'); // unbalanced table tags
$out .= substr($test, $start);     // keep whatever follows the last table
echo $out;
?>
ddrudik Posted November 19, 2008

That last code brings up a good point: what a regex function does can often be done with plain string functions instead, frequently faster and with less overhead.
mab Posted November 19, 2008 Author

Thanks for all your help. I tried the solutions and both are working for me. I really appreciate your help.

Using regex is quite new for me and I don't fully understand the first solution. The difficult part seems to be the one in parentheses:

(?:(?>(?:(?!</?table[^>]*>).)+)|(?0))*

Well, I recognized the atomic group, the negative lookahead, and that there is an alternation. So what does the first alternative do?

(?>(?:(?!</?table[^>]*>).)+)

Am I right that it matches any character, but first looks ahead to check that no opening or closing table tag starts there? And the atomic group keeps the match as a whole, so it can only be given back as a whole.

And the second alternative?

(?0)

What is this doing? And to which part of the regex does it refer?

I would be really happy if you could give a short explanation, so that I understand how to create such a pattern on my own next time. So thanks again for all the nice and fast help!
ddrudik Posted November 19, 2008

That is more commonly written as (?R) instead of (?0), although (?0) helps to illustrate that you could incorporate lookahead and lookbehind and use capture group 1 as the nested pattern with (?1).

A complete background is in Friedl's "Mastering Regular Expressions", but for a quick PHP regex syntax overview see:
http://us3.php.net/manual/en/reference.pcre.pattern.syntax.php

Search for "Recursive Patterns" on that page and you will see the discussion of the general pattern, although their example matches nested and non-nested groups of parentheses instead. It is simpler to construct a pattern with single bounding characters such as ( ) than with table tags, but the theory is the same.
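A minimal sketch of the two recursion forms mentioned above, using parentheses as the bounding characters. The patterns, sample strings, and variable names here are illustrative assumptions and are not copied from the manual.

```php
<?php
// (?R) recurses into the whole pattern: one balanced group, with any nesting inside.
$balanced = '~\((?:[^()]++|(?R))*\)~';

$text = 'keep (drop (nested) this) keep (and this) end';
$stripped = preg_replace($balanced, '', $text);
echo $stripped, "\n";   // keep  keep  end

// (?1) recurses into capture group 1 only, so the pattern can have a prefix
// ("call" here) that is not repeated at every nesting level.
$call = '~call(\((?:[^()]++|(?1))*\))~';
preg_match($call, 'x = call(a, (b + c), d) + other(e)', $m);
echo $m[0], "\n";       // call(a, (b + c), d)
echo $m[1], "\n";       // (a, (b + c), d)
```

The table pattern in this thread has the same shape: `[^()]++` corresponds to the run of characters that do not start a table tag, and the literal parentheses correspond to the opening and closing table tags.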
Archived
This topic is now archived and is closed to further replies.