Tag matching

soycharliente · February 25, 2013

I'm working on a site that will make use of some custom tags. The idea is that depending on the visitor type, anything with tags wrapped around it will be hidden unless the type matches their type. The tags will help personalize content based on the visitor type. The tags will be structured like this: [[start:TYPE]]Content here.[[end:TYPE]]

This works great for stripping out content that shouldn't be displayed for them:

\[\[start\:($remove)\]\](.*?)\[\[end\:($remove)\]\]

And this works great for preserving content that should be displayed for them:

\[\[(start|end)\:($keep)\]\]

The problem that I'm having is that I have to explicitly list each type that I want to remove. I would prefer to simply remove every tag whose type is not the visitor's type. I cannot for the life of me get the ^ character to work here, which I thought would mean "not the listed type".

\[\[start\:(^$keep)\]\](.*?)\[\[end\:(^$keep)\]\]

Can someone help me understand why this doesn't work and what I need to look into for how to get this small tweak going?

I'm working with the flavor of regex that Java uses. Thanks!

Edited February 25, 2013 by charlieholder

soycharliente · February 25, 2013

Pasted the wrong version of the code.

Updated above as well:

\[\[start\:(^$keep)\]\](.*?)\[\[end\:(^$keep)\]\]

Edited February 25, 2013 by charlieholder

requinix · February 26, 2013

Outside of a character set, ^ means the beginning of the string (or, in multiline mode, the beginning of a line).

As the first character inside a character set, ^ means "none of the following characters".

Anywhere after the first character inside a character set, ^ means a literal ^.

That's all.
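A quick sketch of those three cases with java.util.regex, in case it helps to see them side by side. The class name CaretDemo, the method names, and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

class CaretDemo {
    // Outside a character set: ^ anchors the match at the start of the input.
    static boolean startsWithAbc(String s) {
        return Pattern.compile("^abc").matcher(s).find();
    }

    // First inside a character set: [^abc] matches one character that is not a, b, or c.
    static boolean hasCharOtherThanAbc(String s) {
        return Pattern.compile("[^abc]").matcher(s).find();
    }

    // Later inside a character set: [a^] matches a literal 'a' or a literal '^'.
    static boolean hasAOrCaret(String s) {
        return Pattern.compile("[a^]").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithAbc("abcdef"));       // true
        System.out.println(startsWithAbc("xabc"));         // false
        System.out.println(hasCharOtherThanAbc("abca"));   // false
        System.out.println(hasCharOtherThanAbc("abcd"));   // true
        System.out.println(hasAOrCaret("x^y"));            // true
    }
}
```

None of these forms negates a whole word such as a type name, which is why (^$keep) does not do what was intended.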

If you don't want something to match then use a negative assertion, but you then have to specify something that can match.

\[\[start:(?!$keep\])[^\]]+\]\]...

The added \] after $keep is so that $keep=123 will not prevent [[start:1234]] from matching.
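A minimal sketch of that start-tag pattern as a Java string, assuming the kept type arrives in a String called keep. The class and method names are hypothetical, and Pattern.quote is an addition so that a type containing regex metacharacters is treated literally.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Matches a start tag whose type is anything except `keep`.
    // The \] inside the lookahead makes it reject only the exact type.
    static Pattern otherStartTag(String keep) {
        return Pattern.compile(
            "\\[\\[start:(?!" + Pattern.quote(keep) + "\\])[^\\]]+\\]\\]");
    }

    // Same pattern without the \] in the lookahead, to show the edge case:
    // it also rejects any type that merely begins with `keep`.
    static Pattern otherStartTagLoose(String keep) {
        return Pattern.compile(
            "\\[\\[start:(?!" + Pattern.quote(keep) + ")[^\\]]+\\]\\]");
    }

    public static void main(String[] args) {
        System.out.println(otherStartTag("123").matcher("[[start:123]]").matches());       // false
        System.out.println(otherStartTag("123").matcher("[[start:1234]]").matches());      // true
        System.out.println(otherStartTagLoose("123").matcher("[[start:1234]]").matches()); // false
    }
}
```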

Edited February 26, 2013 by requinix

Christian F. · February 26, 2013

The easier way is to first remove the tags around the content you want the user to keep. Then remove everything you don't want the user to keep. That way you don't have to use the $remove variable in the removing RegExp, but can use a standard character group instead.

soycharliente · February 26, 2013

requinix wrote: "The added \] after $keep is so that $keep=123 will not prevent [[start:1234]] from matching."

I would not have thought about that edge case. Thanks!

Christian F. wrote: "The easier way is to first remove the tags around the content you want the user to keep. Then remove everything you don't want the user to keep."

This makes a lot of sense. Thanks!

soycharliente · February 26, 2013

$regexK = "\[\[(start|end)\:($keep)\]\]"
$regexR = "\[\[start:(?!$keep\])[^\]]+\]\](.*?)\[\[end:(?!$keep\])[^\]]+\]\]"

Taking both of those pieces of advice, this seems to be getting the job done very well. I run the top one first to remove the tags around what is kept, and then the bottom one to remove all other pieces of information (including their tags) for the types that don't match.
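For anyone porting this to Java, here is a rough sketch of the same two passes. It assumes the visitor's type is in a String called keep and that blocks are not nested. The names TagFilter and personalize are hypothetical, and Pattern.quote and Pattern.DOTALL are additions that the patterns above do not have.

```java
import java.util.regex.Pattern;

class TagFilter {
    static String personalize(String text, String keep) {
        // Assumption: treat the type as literal text, not as a regex fragment.
        String k = Pattern.quote(keep);

        // $regexK: the start/end tags of the kept type (their content stays).
        Pattern regexK = Pattern.compile("\\[\\[(start|end):" + k + "\\]\\]");

        // $regexR: a whole block of any other type, tags included.
        // DOTALL lets the content between the tags span several lines.
        Pattern regexR = Pattern.compile(
            "\\[\\[start:(?!" + k + "\\])[^\\]]+\\]\\](.*?)"
                + "\\[\\[end:(?!" + k + "\\])[^\\]]+\\]\\]",
            Pattern.DOTALL);

        // Order matters: unwrap kept content first, then drop everything else.
        String withoutKeptTags = regexK.matcher(text).replaceAll("");
        return regexR.matcher(withoutKeptTags).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "A[[start:123]]B[[end:123]]C[[start:1234]]D[[end:1234]]E";
        System.out.println(personalize(page, "123")); // prints ABCE
    }
}
```

Because the first pass has already removed every tag of the kept type, the lookahead in the second pattern works mostly as a safeguard. Note also that the second pattern does not require the end tag's type to equal the start tag's type.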
