funkyres Posted March 5, 2009

I'm using the following as part of a filter:

<?php
$forbidden[] = '/<!\[CDATA\[[^(\]\]>)]*\]\]>/';
$sanitized[] = '<!-- cdata section removed -->';
// then processed via
return preg_replace($forbidden, $sanitized, $buffer);
?>

It works so long as there is no ] or > anywhere in the CDATA. Since ]]> is illegal inside a CDATA block, I want to match any CDATA content that does NOT contain the three-character string ]]>.

I can't figure out how to get a regex to match something that is NOT a particular string. I can get it to match a particular string, or match NOT a particular character, but I can't work out the syntax for matching NOT a particular string. [^(\]\]>)] is my most recent attempt.

Anyone know how to do this?

https://forums.phpfreaks.com/topic/148056-removing-cdata-sections/
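A minimal sketch of why that attempt behaves as described: inside square brackets, the parentheses do not group anything, so [^(\]\]>)] excludes the single characters (, ], > and ) rather than the string ]]>. The sample inputs below are hypothetical and only serve to illustrate the point.

```php
<?php
// The pattern from the post above. The negated class excludes the single
// characters "(", "]", ">" and ")", not the three-character string "]]>".
$pattern = '/<!\[CDATA\[[^(\]\]>)]*\]\]>/';

// Hypothetical sample inputs (not from the original post).
$plain   = '<![CDATA[ hello world ]]>';
$bracket = '<![CDATA[ a[0] = 1 ]]>';
$paren   = '<![CDATA[ f(x) ]]>';

$matchesPlain   = preg_match($pattern, $plain);   // no excluded characters in the body
$matchesBracket = preg_match($pattern, $bracket); // body contains a lone "]"
$matchesParen   = preg_match($pattern, $paren);   // body contains "(" and ")"

var_dump($matchesPlain, $matchesBracket, $matchesParen);
```

Only the first input should match; a single ] or a parenthesis in the body is enough to make the pattern fail.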
funkyres Posted March 5, 2009

This seems to work:

$forbidden[] = '/<!\[CDATA\[.*\]\]>/s';
$sanitized[] = '<!-- cdata section removed -->';

though I still want to know how to make a pattern that says "match unless it has this particular multi-character phrase in it". I can't seem to find a way to do it via Google; all (and I mean all) the regex tutorials seem to silently ignore it, but it can't be that uncommon a thing to want to do.
nrg_alpha Posted March 5, 2009

If you want to strip out the complete CDATA section, perhaps something along these lines?

Example:

$buffer = <<<DATA
<script type="text/javascript">
//<![CDATA[
var nodeList = document.getElementsByTagName('A');
for(var i = 0; i < nodeList.length; i++){
    if(nodeList[i].className == 'whatever'){
        nodeList[i].style.display = "inline";
    }
}
//]]>
</script>
DATA;

echo $buffer . "<br />\n\n";
$forbidden[] = '#(?://)?<!\[CDATA\[.+?(?://)?\]\]>#s';
// then processed via
$buffer = preg_replace($forbidden, '<!-- cdata section removed -->', $buffer);
echo $buffer;

Output (via right-click view source):

<script type="text/javascript">
//<![CDATA[
var nodeList = document.getElementsByTagName('A');
for(var i = 0; i < nodeList.length; i++){
    if(nodeList[i].className == 'whatever'){
        nodeList[i].style.display = "inline";
    }
}
//]]>
</script><br />

<script type="text/javascript">
<!-- cdata section removed -->
</script>
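The lazy quantifier (.+? rather than .*) matters as soon as a page holds more than one CDATA section. A small sketch, assuming a hypothetical buffer with two sections and ordinary markup between them:

```php
<?php
// Hypothetical input: two CDATA sections with markup in between.
$buffer      = '<![CDATA[one]]><p>keep me</p><![CDATA[two]]>';
$replacement = '<!-- cdata section removed -->';

// Greedy: .* runs to the end of the buffer and backtracks to the LAST "]]>",
// so everything between the two sections is removed as well.
$greedy = preg_replace('/<!\[CDATA\[.*\]\]>/s', $replacement, $buffer);

// Lazy: .*? stops at the FIRST "]]>", so each section is replaced separately.
$lazy = preg_replace('/<!\[CDATA\[.*?\]\]>/s', $replacement, $buffer);

echo $greedy, "\n";
echo $lazy, "\n";
```

With the greedy pattern the paragraph between the sections disappears; with the lazy one it is kept.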
.josh Posted March 7, 2009

funkyres wrote: "though I still want to know how to make a pattern that says "match unless it has this particular multi-character phrase in it". I can't seem to find a way to do it via Google; all (and I mean all) the regex tutorials seem to silently ignore it, but it can't be that uncommon a thing to want to do."

Google positive and negative lookaheads and lookbehinds.
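For reference, one common way to express "any run of characters that does not contain ]]>" with a negative lookahead is to check, before consuming each character, that the forbidden string does not start at that position. This is a sketch of that idiom with hypothetical sample strings, not necessarily what the posters ended up using:

```php
<?php
// (?:(?!\]\]>).)* consumes one character at a time, but only while the
// text at the current position does not start with "]]>".
$pattern = '/<!\[CDATA\[(?:(?!\]\]>).)*\]\]>/s';

// Hypothetical input: the first section contains lone "]" and ">" characters.
$buffer = '<![CDATA[ a[0] > b ]]><p>keep me</p><![CDATA[x]]>';
$result = preg_replace($pattern, '<!-- cdata section removed -->', $buffer);
echo $result, "\n";

// The same idea, anchored, tests whether a whole string is free of "]]>".
$noTerminator = '/^(?:(?!\]\]>).)*$/s';
$clean = preg_match($noTerminator, 'a ] b > c'); // lone "]" and ">" are fine
$dirty = preg_match($noTerminator, 'a ]]> b');   // contains the forbidden string
var_dump($clean, $dirty);
```

For simply stripping CDATA sections the lazy .*? form gives the same result with less work; the lookahead form is mainly useful when the "does not contain" condition has to be stated explicitly.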
funkyres Posted March 7, 2009

I definitely need to learn more about lookaheads and lookbehinds. It seems every time I try to use one, there can't be a variable-length expression before or after it. Either it is severely limited in the current implementation or I'm doing it wrong.
.josh Posted March 7, 2009

You can have variable-length patterns in lookaheads, but not in lookbehinds.
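A short sketch of that distinction in PHP's preg functions, using hypothetical sample strings. The \K escape shown last is a swapped-in alternative, not something suggested in the thread: it discards everything matched so far, which often covers the cases where a variable-length lookbehind would otherwise be wanted.

```php
<?php
// Lookahead may contain a variable-length pattern: \d+ matches any number of digits.
preg_match('/item(?=\d+;)/', 'item12345;', $ahead);

// Lookbehind with a fixed-length pattern ("id=" is always three characters).
preg_match('/(?<=id=)\d+/', 'id=42', $behind);

// \K resets the start of the reported match, so the variable-length prefix
// "id\s*=\s*" is required but not included in the result.
preg_match('/id\s*=\s*\K\d+/', 'id  =  42', $reset);

var_dump($ahead[0], $behind[0], $reset[0]);
```

A lookbehind containing an unbounded quantifier, such as (?<=id\s*=), is generally rejected by PCRE when the pattern is compiled, which matches the limitation described above.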