efficacious Posted May 29, 2011

Hello,

I'm trying to write a regular expression that finds and removes a block of code in a string of HTML. I came up with the pattern below, but it doesn't seem to work: I'm using preg_replace() and it returns the HTML untouched.

Here's my current pattern:

    $Pattern = '/^<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->$/s';

I'm trying to match a pair of HTML comment tags containing MYTAG and remove the tags along with all the HTML between them:

    <!-- MYTAG //--><html>MORE HTML CODE</html><!-- MYTAG //-->

Can anyone help me with this?

~Thanks all
efficacious Posted May 29, 2011

I even tried a much simpler expression and it still isn't working. Here is how I'm using it:

    $string = <<<END
    <html>
    <head><title></title>
    </head>
    <body>
    <!-- MYTAG //-->
    <div id='1'> MYTAG </div>
    <!-- MYTAG //-->
    <!-- MYTAG2 //-->
    <div id='2'> MYTAG2 </div>
    <!-- MYTAG2 //-->
    </body>
    </html>
    END;

    $NewString = preg_replace("/^MYTAG$/", 'REPLACED', $string);
    echo($NewString);

It still doesn't replace anything. I'm not sure what I'm doing wrong.
efficacious Posted May 29, 2011

Never mind, I figured it out. I just needed to remove the start and end metacharacters (^ and $). As far as I can tell, without the m modifier they anchor to the start and end of the whole string rather than to each line, so the pattern could never match tags sitting in the middle of the document.
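For what it's worth, here is a small sketch of how ^ and $ behave with and without the m modifier. The sample string is made up for illustration and is much shorter than the HTML in the post above.

```php
<?php
// Hypothetical sample input: the tag text sits on its own line in the middle.
$html = "<body>\nMYTAG\n</body>";

// Without the m modifier, ^ and $ anchor to the whole subject, so nothing matches.
$plain = preg_replace('/^MYTAG$/', 'REPLACED', $html);

// With the m modifier, ^ and $ also match at line boundaries.
$multiline = preg_replace('/^MYTAG$/m', 'REPLACED', $html);

// With no anchors at all, the literal text is matched wherever it appears.
$unanchored = preg_replace('/MYTAG/', 'REPLACED', $html);

var_dump($plain === $html); // bool(true): the string came back untouched
echo $multiline, "\n";
echo $unanchored, "\n";
```

Dropping the anchors is the simplest fix when the tag can appear anywhere in the text; the m modifier only helps when the tag text is alone on its line.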
efficacious Posted June 1, 2011

I thought I had this cracked, but I've hit a snag. It works, but only once: if I use the "tags" more than one time, the pattern ends up catching everything from the first tag to the last tag. Do I need a subpattern or something that says not to match if there is another match inside the match? This one's got me confused.
efficacious Posted June 1, 2011

I have an idea I'm going to try, but it seems kind of cumbersome; please post if you have a better way.

My idea is to use strpos() to find each occurrence of the ending tag, record those positions in an array, and then use an offset in preg_replace() to force it to remove each tag pair, starting from the last set and working back to the first. I think that would accomplish what I want.

EDIT: Never mind, this won't work either. I can't tell preg_replace() where to start its search from.
requinix Posted June 1, 2011

Just make the quantifier ungreedy: .+?
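A minimal sketch of the difference, assuming the MYTAG pattern from the first post with the anchors removed. The input string is a made-up example with two tagged blocks.

```php
<?php
// Hypothetical sample input with two separately tagged blocks.
$html = "A<!-- MYTAG //-->one<!-- MYTAG //-->B<!-- MYTAG //-->two<!-- MYTAG //-->C";

// Greedy .+ runs on to the LAST closing tag, so "B" is swallowed as well.
$greedy = preg_replace('/<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->/s', '', $html);

// Lazy .+? stops at the FIRST closing tag, so each pair is removed on its own.
$lazy = preg_replace('/<!-- MYTAG \/\/-->.+?<!-- MYTAG \/\/-->/s', '', $html);

echo $greedy, "\n"; // AC
echo $lazy, "\n";   // ABC
```

The only change between the two patterns is the ? after the quantifier.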
xyph Posted June 1, 2011

Use back references:

    %(<!-- [A-z0-9]++ //-->)(.*?)\1%s

In English:

    (<!-- [A-z0-9]++ //-->)(.*?)\1

    Options: dot matches newline

    Match the regular expression below and capture its match into backreference number 1 «(<!-- [A-z0-9]++ //-->)»
        Match the characters “<!-- ” literally «<!-- »
        Match a single character present in the list below «[A-z0-9]++»
            Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
            A character in the range between “A” and “z” «A-z»
            A character in the range between “0” and “9” «0-9»
        Match the characters “ //-->” literally « //-->»
    Match the regular expression below and capture its match into backreference number 2 «(.*?)»
        Match any single character «.*?»
            Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    Match the same text as most recently matched by capturing group number 1 «\1»
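One detail that may be worth checking before relying on this pattern: the range A-z covers ASCII 65 through 122, which includes the punctuation characters [ \ ] ^ _ ` that sit between "Z" and "a". The sketch below compares it with the stricter class A-Za-z; the tag names are made up for illustration, and whether the looser class matters depends on what tag names you expect.

```php
<?php
// [A-z] spans ASCII 65..122, so it also accepts [ \ ] ^ _ and the backtick.
$loose = preg_match('%^[A-z0-9]++$%', 'MY_TAG^');

// [A-Za-z] accepts letters only, so the same input is rejected.
$strict = preg_match('%^[A-Za-z0-9]++$%', 'MY_TAG^');

// An ordinary alphanumeric tag name passes the strict class as well.
$plain = preg_match('%^[A-Za-z0-9]++$%', 'MYTAG2');

var_dump($loose, $strict, $plain); // int(1), int(0), int(1)
```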
xyph Posted June 2, 2011

Here's how I might implement it. I don't usually provide code so complete, but you've obviously put a ton of effort into it already. Let me know if you don't understand any of it.

    <?php

    $string = getData();

    print_r( parseCustomTags($string) );

    function parseCustomTags( $html ) {
        $r = array(); // This will hold the data for all of our tags
        preg_match_all('%(<!-- ([A-z0-9]++) //-->)(.*?)\1%s',$html,$ms,PREG_SET_ORDER);
        // print_r($ms); // uncomment this to see what your script is doing.
        if( empty($ms) ) return $r; // if no custom tags, return empty array
        foreach( $ms as $m ) { // loop through matches
            $m[3] = trim($m[3]); // get rid of extra whitespace around data
            $r[] = array($m[2],$m[3]); // add an array to $r containing tag and data
            $inners = parseCustomTags($m[3]); // check if there are nested tags
            if( !empty($inners) ) // if there are nested tags
                foreach( $inners as $inner ) // loop through nests
                    $r[] = $inner; // add nests to array
        }
        return $r; // return the populated array
    }

    function getData() {
    return <<<END
    <html>
    <head><title></title>
    </head>
    <body>
    <!-- MYTAG //-->
    <div id='1'> MYTAG </div>
    <!-- MYTAG //-->
    <!-- MYTAG2 //-->
    <div id='2'> MYTAG2
    <!-- EMBEDTAG //-->
    morestuff <b>with html encoding!</b>
    <!-- EMBEDTAG //-->
    </div>
    <!-- MYTAG2 //-->
    </body>
    </html>
    END;
    }

    ?>
efficacious Posted June 2, 2011

Wow, xyph, thanks, that is a very nifty script. The issue I think I have with it is that it doesn't make changes to the searched string itself: it finds the matches and records them, but I need them removed. Maybe I'm just not following it properly; I'll continue to study it. I do like the fact that it finds the tag names dynamically. I may need that later on.

In the meantime, I found requinix's suggestion works quite well. I'll have to stress-test it some more to be sure, but this is solved again for now. Thanks very much, guys!
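If removal with dynamic tag names is the goal, one possible sketch is to hand xyph's backreference pattern to preg_replace() instead of preg_match_all(). The function name removeCustomTags and the sample string are hypothetical, not part of anything posted above, and this has only been reasoned through for simple, well-formed input.

```php
<?php
// Hypothetical helper; the pattern itself is the one from xyph's post.
function removeCustomTags(string $html): string
{
    // \1 must repeat the opening tag exactly, so each block closes on its own tag name.
    // preg_replace() returns null on error, so fall back to the input in that case.
    return preg_replace('%(<!-- [A-z0-9]++ //-->)(.*?)\1%s', '', $html) ?? $html;
}

// Made-up sample with two tagged blocks, one of which contains a nested tag pair.
$html = "<body><!-- MYTAG //--><div id='1'>MYTAG</div><!-- MYTAG //-->"
      . "<p>keep</p>"
      . "<!-- MYTAG2 //--><div id='2'>MYTAG2<!-- EMBEDTAG //-->inner<!-- EMBEDTAG //--></div><!-- MYTAG2 //--></body>";

echo removeCustomTags($html), "\n"; // <body><p>keep</p></body>
```

Because the outer MYTAG2 block is removed as a whole, the nested EMBEDTAG pair inside it goes with it; no recursion is needed for plain removal.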