Need some help making a regex pattern


efficacious

Hello,

 

I'm trying to write a regular expression pattern that finds and removes a block of code in a string of HTML.

 

I came up with this pattern, but it doesn't seem to be working. I'm using the preg_replace() function and it's returning the HTML untouched.

 

Here's my current pattern:

$Pattern = '/^<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->$/s';

 

I'm trying to match a pair of HTML comment tags containing MYTAG, and remove the tags along with all the HTML between them.

<!-- MYTAG //--><html>MORE HTML CODE</html><!-- MYTAG //-->

Can anyone help me with this?

 

~Thanks all
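
A likely reason the pattern above leaves the HTML untouched is the pair of `^` and `$` anchors: they only let the pattern match when the subject string itself begins and ends with the tag. The sketch below assumes the tagged block sits somewhere inside a larger page; `$html` is a hypothetical input, and `#` is used as the delimiter only to avoid escaping the slashes.

```php
<?php
// Hypothetical page: the tagged block sits in the middle of the document.
$html = "<body>\n<!-- MYTAG //--><p>remove me</p><!-- MYTAG //-->\n</body>";

// Anchored as in the post: matches only if the subject starts and ends with the tag.
$anchored = '/^<!-- MYTAG \/\/-->.+<!-- MYTAG \/\/-->$/s';

// Same pattern without ^ and $, so the block may appear anywhere in the subject.
$unanchored = '#<!-- MYTAG //-->.+<!-- MYTAG //-->#s';

var_dump(preg_match($anchored, $html)); // int(0): no match
$result = preg_replace($unanchored, '', $html);
echo $result, "\n"; // the block is gone, <body> and </body> remain
```

With a single tag pair this is enough; with several pairs the greedy `.+` becomes a problem of its own.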


I even tried just a simple expression and it's still not working. Here is how I'm using it:

 

$string = <<<END
<html>
<head><title></title>
</head>
<body>

<!-- MYTAG //-->
<div id='1'>
MYTAG
</div>
<!-- MYTAG //-->

<!-- MYTAG2 //-->
<div id='2'>
MYTAG2
</div>	
<!-- MYTAG2 //-->

</body>
</html>
END;

$NewString = preg_replace("/^MYTAG$/", 'REPLACED', $string);

echo($NewString);

 

It still doesn't replace anything. I'm not sure what I'm doing wrong.
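
One possible explanation for the simple test: without the `m` modifier, `^` and `$` anchor to the start and end of the whole subject rather than to each line. The sketch below assumes the intent was to match the line that holds only MYTAG; `$sample` is a hypothetical, shortened version of the string above.

```php
<?php
// Hypothetical, shortened input: MYTAG sits on a line of its own.
$sample = "<div id='1'>\nMYTAG\n</div>";

// No modifier: ^ and $ refer to the whole subject, so nothing matches.
$unchanged = preg_replace('/^MYTAG$/', 'REPLACED', $sample);

// m (multiline): ^ and $ also match at the start and end of every line.
$changed = preg_replace('/^MYTAG$/m', 'REPLACED', $sample);

var_dump($unchanged === $sample); // bool(true)
echo $changed, "\n";              // the MYTAG line now reads REPLACED
```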


I thought I had this cracked, but I recently hit a snag. It works, but only once: if I use the "tags" more than one time, the pattern ends up catching everything from the first tag to the last tag. Do I need a subpattern or something that says not to match if there is a match within the match? This one's got me confused.
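
The behaviour described here is what a greedy quantifier does: `.+` consumes as much as it can and only then backs off to the last closing tag. One common remedy is the lazy form `.+?`. The sketch below is an illustration on a hypothetical `$html`, not the pattern the poster ended up with.

```php
<?php
// Hypothetical input with the same tag pair used twice.
$html = "<!-- MYTAG //-->one<!-- MYTAG //-->KEEP<!-- MYTAG //-->two<!-- MYTAG //-->";

// Greedy .+ runs to the LAST closing tag, swallowing KEEP along the way.
$greedy = preg_replace('#<!-- MYTAG //-->.+<!-- MYTAG //-->#s', '', $html);

// Lazy .+? stops at the FIRST closing tag, so each pair is removed on its own.
$lazy = preg_replace('#<!-- MYTAG //-->.+?<!-- MYTAG //-->#s', '', $html);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(4) "KEEP"
```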


I have an idea I'm going to try, but it seems kind of cumbersome. Please post if you have a better way.

My idea is to use strpos to find where each occurrence of the ending tag is, record those positions in an array, and then use the offset in preg_replace to force it to remove each tag pair, working from the last set back to the first set. I'm thinking that will get what I want accomplished.

EDIT: Never mind, this won't work either. I can't tell preg_replace where to start its search from.
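
For what it's worth, the plain-string idea can be carried through without preg_replace, because strpos() itself accepts a start offset. A minimal sketch, assuming the opening and closing tags are the same literal string and are never nested; `stripTagged()` is a hypothetical helper, not something from this thread.

```php
<?php
// Hypothetical helper: remove every "$tag ... $tag" block from $html.
function stripTagged(string $html, string $tag): string
{
    $offset = 0;
    while (($start = strpos($html, $tag, $offset)) !== false) {
        // Look for the closing tag after the end of the opening one.
        $end = strpos($html, $tag, $start + strlen($tag));
        if ($end === false) {
            break; // unpaired tag: leave the rest untouched
        }
        // Cut from the opening tag through the end of the closing tag.
        $html = substr_replace($html, '', $start, $end + strlen($tag) - $start);
        $offset = $start; // resume where the removed block used to be
    }
    return $html;
}

$tag = '<!-- T //-->';
echo stripTagged("a{$tag}x{$tag}b{$tag}y{$tag}c", $tag), "\n"; // prints abc
```

It is more code than a regex, but it never needs an offset into preg_replace.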


Use backreferences:

%(<!-- [A-Za-z0-9]++ //-->)(.*?)\1%s

In English:

(<!-- [A-Za-z0-9]++ //-->)(.*?)\1

Options: dot matches newline

Match the regular expression below and capture its match into backreference number 1 «(<!-- [A-Za-z0-9]++ //-->)»
   Match the characters “<!-- ” literally «<!-- »
   Match a single character present in the list below «[A-Za-z0-9]++»
      Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “a” and “z” «a-z»
      A character in the range between “0” and “9” «0-9»
   Match the characters “ //-->” literally « //-->»
Match the regular expression below and capture its match into backreference number 2 «(.*?)»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the same text as most recently matched by capturing group number 1 «\1»
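
As a quick illustration of how the backreference pattern could be plugged into preg_replace() to remove the blocks, here is a minimal sketch. `$html` is a hypothetical input, and the character class is written as `[A-Za-z0-9]`, since a single range `A-z` would also cover a few punctuation characters that sit between the upper- and lower-case letters.

```php
<?php
// Hypothetical input: two differently named tag pairs.
$html = "A<!-- MYTAG //-->one<!-- MYTAG //-->B<!-- MYTAG2 //-->two<!-- MYTAG2 //-->C";

// Group 1 captures the whole opening comment; \1 demands the identical text again,
// so MYTAG can only be closed by MYTAG and MYTAG2 only by MYTAG2.
$pattern = '%(<!-- [A-Za-z0-9]++ //-->)(.*?)\1%s';

$clean = preg_replace($pattern, '', $html);
echo $clean, "\n"; // prints ABC
```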


Here's how I might implement it. I don't usually provide code so complete, but you've obviously put a ton of effort into it already.

Let me know if you don't understand any of it.

 

<?php

$string = getData();

print_r( parseCustomTags($string) );

// Collects every custom comment-tag pair in $html, including nested ones.
// Returns a flat list of array(tagName, innerContent) entries.
function parseCustomTags( $html ) {

	$r = array(); // This will hold the data for all of our tags
	// Group 1: full opening comment, group 2: tag name, group 3: content up to the same comment
	preg_match_all('%(<!-- ([A-Za-z0-9]++) //-->)(.*?)\1%s', $html, $ms, PREG_SET_ORDER);
	// print_r($ms); // uncomment this to see what your script is doing.
	if( empty($ms) ) return $r; // if no custom tags, return empty array
	foreach( $ms as $m ) { // loop through matches
		$m[3] = trim($m[3]); // get rid of extra whitespace around data
		$r[] = array($m[2], $m[3]); // add an array to $r containing tag and data
		$inners = parseCustomTags($m[3]); // check if there are nested tags
		if( !empty($inners) ) // if there are nested tags
			foreach( $inners as $inner ) // loop through nests
				$r[] = $inner; // add nests to array
	}
	return $r; // return the populated array
}

// Sample page with two tag pairs, one of them containing a nested pair.
function getData() {

	return <<<END
<html>
<head><title></title>
</head>
<body>

<!-- MYTAG //-->
<div id='1'>
MYTAG
</div>
<!-- MYTAG //-->

<!-- MYTAG2 //-->
<div id='2'>
MYTAG2
<!-- EMBEDTAG //-->
	morestuff
	<b>with html encoding!</b>
<!-- EMBEDTAG //-->
</div>
<!-- MYTAG2 //-->

</body>
</html>
END;

}

?>



Wow, xyph, thanks, that is a very nifty script. The issue I think I have with it is that it doesn't make the changes to the searched string itself. It finds the matches and records them, but I need them removed. Maybe I'm just not following it properly; I'll continue to study it. I do like the fact that it finds the names of the tags dynamically. I may need that later on. In the meantime, I found that requinix's suggestion works quite well. I'll have to stress it some more to be sure, but this is solved again for now. Thanks very much, guys! :D
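
If both behaviours are wanted at once (removing the blocks and still learning the tag names), one option is preg_replace_callback(). The following is only a sketch under that assumption: `$html` and `$found` are hypothetical names, and the pattern is the backreference pattern from earlier in the thread with the tag name in its own group.

```php
<?php
// Hypothetical input reusing the tag names from the thread.
$html = "A<!-- MYTAG //-->one<!-- MYTAG //-->B<!-- MYTAG2 //-->two<!-- MYTAG2 //-->C";

$found = array(); // tag name => trimmed content of the removed block
$clean = preg_replace_callback(
    '%(<!-- ([A-Za-z0-9]++) //-->)(.*?)\1%s',
    function ($m) use (&$found) {
        $found[$m[2]] = trim($m[3]); // remember what was inside this tag pair
        return '';                   // an empty replacement removes the whole block
    },
    $html
);

echo $clean, "\n"; // prints ABC
print_r($found);   // lists MYTAG and MYTAG2 with their former contents
```

Unlike the recursive script, this sketch does not descend into nested tags; a nested pair is simply removed together with its parent.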
