need some help making regex pattern

efficacious · May 29, 2011

Hello,

I'm trying to write a regular expression that finds and removes a block of code from a string of HTML.

I came up with this pattern, but it doesn't seem to work. I'm using the preg_replace() function and it returns the HTML untouched.

Here's my current pattern:

$Pattern = '/^<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->$/s';

I'm trying to match a pair of HTML comment tags that contain MYTAG, and remove both tags along with all the HTML between them:

<!-- MYTAG //--><html>MORE HTML CODE</html><!-- MYTAG //-->

Can anyone help me with this?

~Thanks all

efficacious · May 29, 2011

I even tried a much simpler expression and it's still not working. Here is how I'm using it:

$string = <<<END
<html>
<head><title></title>
</head>
<body>

<!-- MYTAG //-->
<div id='1'>
MYTAG
</div>
<!-- MYTAG //-->

<!-- MYTAG2 //-->
<div id='2'>
MYTAG2
</div>	
<!-- MYTAG2 //-->

</body>
</html>
END;

$NewString = preg_replace("/^MYTAG$/", 'REPLACED', $string);

echo($NewString);

It still doesn't replace anything.. I'm not sure what I'm doing wrong..

efficacious · May 29, 2011

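Never mind, I figured it out. I just needed to remove the start and end metacharacters (^ and $). As far as I can tell, without multi-line mode they anchor to the beginning and end of the whole string rather than to individual lines, so the pattern could never match something in the middle of the HTML.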
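
For anyone reading along, here is a minimal sketch of how the anchors behave in PCRE as PHP exposes it. The sample string is hypothetical and much shorter than the HTML above. By default ^ and $ refer to the whole subject; the m modifier makes them match at line boundaries as well.

```php
<?php
// Hypothetical sample input: MYTAG sits alone on the middle line.
$subject = "<body>\nMYTAG\n</body>";

// Default mode: ^ and $ anchor to the whole subject, so nothing matches
// and preg_replace() returns the subject unchanged.
$anchored = preg_replace('/^MYTAG$/', 'REPLACED', $subject);

// With the m modifier, ^ and $ also match at the start and end of each line.
$multiline = preg_replace('/^MYTAG$/m', 'REPLACED', $subject);

// Without anchors, MYTAG is matched wherever it appears.
$unanchored = preg_replace('/MYTAG/', 'REPLACED', $subject);

var_dump($anchored === $subject); // true: the anchored pattern replaced nothing
echo $multiline, "\n";
echo $unanchored, "\n";
```

Dropping the anchors is the simpler fix here, because the tags in the real HTML do not have to sit alone on a line.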

efficacious · June 1, 2011

I thought I had this cracked, but I recently hit a snag with it. It does work, but only once: if I use the "tags" more than one time, then the pattern ends up catching everything from the first tag to the last tag. Do I need a subpattern or something that says not to match if there is a match within the match? This one's got me confused.

efficacious · June 1, 2011

I have an idea I'm going to try, but it seems kind of cumbersome. Please post if you have a better way.

My idea is to use strpos to find where each occurrence of the ending tag is, record those positions in an array, and then use the offset in preg_replace to force it to remove each set of tags, working from the last set back to the first.

I'm thinking that will get what I want accomplished..

EDIT: Never mind, this won't work either. I can't tell preg_replace where to start its search from.

requinix · June 1, 2011

Just make the quantifier ungreedy:

.+?
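
A minimal sketch of the difference, assuming the same MYTAG comment markers as in the question. The sample string is hypothetical and uses the tag pair twice.

```php
<?php
// Hypothetical sample input with the same tag pair used twice.
$html = "A<!-- MYTAG //-->one<!-- MYTAG //-->B<!-- MYTAG //-->two<!-- MYTAG //-->C";

// Greedy: .+ runs on to the LAST closing tag, so everything between the
// first and the last tag is removed as one single match.
$greedy = preg_replace('/<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->/s', '', $html);

// Lazy: .+? stops at the NEAREST closing tag, so each pair is removed on its own.
$lazy = preg_replace('/<!-- MYTAG \/\/-->.+?<!-- MYTAG \/\/-->/s', '', $html);

echo $greedy, "\n"; // AC
echo $lazy, "\n";   // ABC
```

The s modifier is kept from the original pattern so that the dot also matches newlines.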

xyph · June 1, 2011

Use back references

%(<!-- [A-z0-9]++ //-->)(.*?)\1%s

In English:


(<!-- [A-z0-9]++ //-->)(.*?)\1

Options: dot matches newline

Match the regular expression below and capture its match into backreference number 1 «(<!-- [A-z0-9]++ //-->)»
   Match the characters “<!-- ” literally «<!-- »
   Match a single character present in the list below «[A-z0-9]++»
      Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
      A character in the range between “A” and “z” «A-z» (in ASCII this range also covers the characters [ \ ] ^ _ ` that sit between “Z” and “a”)
      A character in the range between “0” and “9” «0-9»
   Match the characters “ //-->” literally « //-->»
Match the regular expression below and capture its match into backreference number 2 «(.*?)»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the same text as most recently matched by capturing group number 1 «\1»
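
Since the original question was about removing the tagged blocks, here is a minimal sketch that feeds the same pattern to preg_replace(). The sample string is hypothetical. The backreference \1 means the closing tag has to be identical to the opening tag, so differently named pairs are handled by one pattern.

```php
<?php
// Hypothetical sample input with two differently named tag pairs.
$html = "<p>keep</p>\n"
      . "<!-- MYTAG //--><div>one</div><!-- MYTAG //-->\n"
      . "<!-- MYTAG2 //--><div>two</div><!-- MYTAG2 //-->\n"
      . "<p>also keep</p>";

// Same pattern as above: group 1 is the opening tag, \1 requires an
// identical closing tag, and the s modifier lets the dot match newlines.
$pattern = '%(<!-- [A-z0-9]++ //-->)(.*?)\1%s';

// Replace every matched block (both tags plus the content between them) with nothing.
$cleaned = preg_replace($pattern, '', $html);

echo $cleaned;
```

Note that a pair nested inside another pair is removed together with the outer block, because the outer match already covers it.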

xyph · June 2, 2011

Here's how I might implement it. I don't usually provide code so complete, but you've obviously put a ton of effort into it already.

Let me know if you don't understand any of it

<?php

$string = getData();

print_r( parseCustomTags($string) );

function parseCustomTags( $html ) {

	$r = array(); // This will hold the data for all of our tags
	preg_match_all('%(<!-- ([A-z0-9]++) //-->)(.*?)\1%s',$html,$ms,PREG_SET_ORDER);
	// print_r($ms); // uncomment this to see what your script is doing.
	if( empty($ms) ) return $r; // if no custom tags, return empty array
	foreach( $ms as $m ) { // loop through matches
		$m[3] = trim($m[3]); // get rid of extra whitespace around data
		$r[] = array($m[2],$m[3]); // add an array to $r containing tag and data
		$inners = parseCustomTags($m[3]); // check if there are nested tags
		if( !empty($inners) ) { // if there are nested tags
			foreach( $inners as $inner ) { // loop through nests
				$r[] = $inner; // add nests to array
			}
		}
	}
	return $r; // return the populated array
}

function getData() {

return <<<END
<html>
<head><title></title>
</head>
<body>

<!-- MYTAG //-->
<div id='1'>
MYTAG
</div>
<!-- MYTAG //-->

<!-- MYTAG2 //-->
<div id='2'>
MYTAG2
<!-- EMBEDTAG //-->
	morestuff
	<b>with html encoding!</b>
<!-- EMBEDTAG //-->
</div>	
<!-- MYTAG2 //-->

</body>
</html>
END;

}

?>

efficacious · June 2, 2011

Wow xyph, thanks, that is a very nifty script. The issue I think I have with it is that it doesn't make the changes to the searched string itself. It finds the matches and records them, but I need them removed. Maybe I'm just not following it properly, so I'll continue to study it. I do like the fact that it finds the names of the tags dynamically; I may need that later on. In the meantime, I found requinix's suggestion to work quite well. I'll have to stress it some more to be sure, but this is solved again for now. Thanks very much, guys!
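
One possible way to combine the two needs (dynamic tag names plus actual removal) is preg_replace_callback(). The sketch below is only an illustration: removeCustomTags() and its $names parameter are hypothetical and not part of xyph's script.

```php
<?php
/**
 * Remove only the tag pairs whose name is listed in $names.
 * removeCustomTags() is a hypothetical helper written for this sketch.
 */
function removeCustomTags(string $html, array $names): string
{
    // Group 1: opening tag, group 2: tag name, group 3: content.
    $pattern = '%(<!-- ([A-z0-9]++) //-->)(.*?)\1%s';

    return preg_replace_callback($pattern, function (array $m) use ($names) {
        // Drop the whole block if its name is listed, otherwise keep it as-is.
        return in_array($m[2], $names, true) ? '' : $m[0];
    }, $html);
}

// Hypothetical sample input with two tag pairs.
$html = "<!-- MYTAG //-->one<!-- MYTAG //-->|<!-- MYTAG2 //-->two<!-- MYTAG2 //-->";

$result = removeCustomTags($html, array('MYTAG2'));
echo $result; // <!-- MYTAG //-->one<!-- MYTAG //-->|
```

A block that is kept is not scanned again, so a listed tag nested inside a kept block would survive. Handling that would need a recursive call on $m[3], along the lines of parseCustomTags().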
