
Or you could go this route (which basically accepts your <key=3D"value"> and has preg_replace simply reconstruct the string without the second capture ($2)). With this expression it doesn't matter what the quoted value is, as long as it is made of word characters, so in theory you could strip anything of the format <key=*"anyname"> (* = any run of word characters).

 

$str = '<key=3D"value">';
$str = preg_replace('#(<key=)(\w+)("\w+">)#e', '"$1$3"', $str);
echo $str;
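The `e` modifier used above was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so on a current PHP install this exact call fails with an unknown-modifier warning instead of returning the fixed string. A minimal sketch of the same idea for modern PHP, assuming as in the post that the junk between `=` and the opening quote is made of word characters, just drops the `e` and puts the backreferences straight into the replacement string:

```php
<?php
// Same pattern as above, without the removed /e modifier.
// $1 = '<key=', $2 = '3D' (dropped), $3 = '"value">'
$str = '<key=3D"value">';
$fixed = preg_replace('#(<key=)(\w+)("\w+">)#', '$1$3', $str);
echo $fixed, "\n"; // <key="value">
```

The replacement is in single quotes so PHP itself does not try to interpolate `$1` and `$3`; preg_replace() fills them in from the captures.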

Thank you. Could you please explain the regular expression a bit further.

key and value could be any HTML or CSS tag, so they are arbitrary.

So I guess I am looking for something like preg_replace("(<.*=)(3D)(.*>)", '"$1$3"', $str). Let me explain what I am trying to do:

(<.*=) means: start with a <, then a bunch of characters, then end with a =. (.*>) means: a bunch of characters followed by a >. But how do I write it? Sorry, I am not all that great with regular expressions.
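For reference, the idea described above does work once the pattern gets delimiters (here `#`) and the replacement is written as `'$1$3'`; without the `e` modifier, `'"$1$3"'` would put the extra double quotes into the output literally. The two-tag input below is made up for illustration. It shows one catch with `.*`: it is greedy, so on a line with several tags the first `.*` runs on to the last `=3D` it can find, while the lazy form `.*?` stops at the nearest one:

```php
<?php
// Hypothetical input: two tags on one line.
$str = '<a=3D"x"><b=3D"y">';

// Greedy .* : the whole line becomes a single match,
// so only the last "3D" is removed.
$greedy = preg_replace('#(<.*=)(3D)(.*>)#', '$1$3', $str);

// Lazy .*? : each match stops at the first "=3D" and the first ">",
// so every tag is fixed.
$lazy = preg_replace('#(<.*?=)(3D)(.*?>)#', '$1$3', $str);

echo $greedy, "\n"; // <a=3D"x"><b="y">
echo $lazy, "\n";   // <a="x"><b="y">
```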

 

Doesn't \w represent digits? What does # represent? I tried googling it but could not find anything.
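On the `#` question: in PHP's preg_* functions the pattern has to be wrapped in a pair of delimiter characters, and `#` is simply the one chosen here. Per the PHP manual, any non-alphanumeric, non-backslash, non-whitespace character can serve, and modifiers such as `i` or `e` go after the closing delimiter. A small check, using the example string from earlier in the thread, that the choice of delimiter does not change the result:

```php
<?php
// The delimiter only marks where the pattern starts and ends;
// it is not matched against the subject string.
$str = '<key=3D"value">';
$withHash  = preg_replace('#(<key=)(\w+)("\w+">)#', '$1$3', $str);
$withSlash = preg_replace('/(<key=)(\w+)("\w+">)/', '$1$3', $str);
$withTilde = preg_replace('~(<key=)(\w+)("\w+">)~', '$1$3', $str);

var_dump($withHash === $withSlash && $withSlash === $withTilde); // bool(true)
```

A common reason to pick `#` over `/` for HTML work is that `/` then needs no escaping inside the pattern (e.g. in closing tags).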

 

 

Ah.. you should have explained it that way from the beginning ;)

 

Ok, so here is the newer version..

 

$str = '<H1=2D"some_value">'; // try something other than 'key'..
$str = preg_replace('#(<[^=]+=)(\w+)("\w+">)#e', '"$1$3"', $str);
echo $str;

 

Here's the breakdown of the pattern..

The first capture (the first set of parentheses, which is automatically stored as variable $1) is:

(<[^=]+=)

So basically, this says: start with a <, then, in a character class '[ ]', match anything that is NOT an equals sign (the 'not' part comes from the caret '^' at the start of the class). So match anything that is not an equals sign one or more times (the plus sign), until you reach an equals sign (the last equals sign inside the parentheses).
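A quick way to see the negated class `[^=]+` in isolation is to match only this first part of the pattern and print what it found. The second input, with more than one `=`, is a hypothetical string added to show that the class cannot run past the first equals sign:

```php
<?php
// [^=] is a negated character class: any single character except '='.
// With +, it matches one or more such characters, so it stops right
// before the first '=' it meets, and the literal '=' then matches that.
$str = '<H1=2D"some_value">';
preg_match('#<[^=]+=#', $str, $m);
echo $m[0], "\n"; // <H1=

// Several '=' in the input: the match still ends at the first one.
preg_match('#<[^=]+=#', '<a=b=c>', $m2);
echo $m2[0], "\n"; // <a=
```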

 

Next, we have the second capture (the second set of parentheses, so $2).. this is the part we will ultimately leave out..

(\w+)

The \w stands for a word character, which by standard definition matches a-zA-Z0-9_ (depending on your locale it might actually match more, but for our purposes that's not important). So since the third part starts with an equal sign (which is NOT matched by \w), the regex engine will match all characters that fall into the \w category one or more times (till it arrives at the equal sign), and stores them in variable $2 automatically.
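On the earlier question of whether `\w` means digits: it doesn't. Digits alone are `\d`, while `\w` also covers letters and the underscore. Running both over the example tag text (angle brackets left off) shows the difference; the comments assume the default locale:

```php
<?php
// \w+ : runs of letters, digits and underscore
// \d+ : runs of digits only
$str = 'H1=2D"some_value"';
preg_match_all('#\w+#', $str, $words);
preg_match_all('#\d+#', $str, $digits);

echo implode(', ', $words[0]), "\n";  // H1, 2D, some_value
echo implode(', ', $digits[0]), "\n"; // 1, 2
```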

 

Finally, we get to the last set of parenthesis ($3):

("\w+">)

And this basically says: a double quote, then any word character one or more times, then another double quote, then a '>' character. And that's the pattern..
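To see all three captures at once, the same pattern (minus the `e` modifier) can be passed to `preg_match()`. Note that the third capture includes both double quotes and the closing `>`:

```php
<?php
$str = '<H1=2D"some_value">';
preg_match('#(<[^=]+=)(\w+)("\w+">)#', $str, $m);

// $m[0] is the whole match; $m[1], $m[2], $m[3] are what
// $1, $2, $3 refer to in a preg_replace() replacement string.
echo $m[1], "\n"; // <H1=
echo $m[2], "\n"; // 2D
echo $m[3], "\n"; // "some_value">
```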

 

You may notice the 'e' after the last delimiter.. this is a modifier for preg_replace.. it makes preg_replace evaluate the replacement string as PHP code.. so, looking at the replacement part, what has basically been done is:

'"$1$3"'

 

Which is a set of single quotes with a set of double quotes nested inside. Inside those quotes are the first and third captures that we want (remember, we don't want the second capture).. and lastly, in this example, we pass $str as the subject string to be searched.
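Since the `e` modifier no longer exists in current PHP (it was removed in 7.0.0), the usual substitute is `preg_replace_callback()`: a function receives the array of captures and returns the replacement text as real PHP code. A minimal sketch of the same rewrite done that way:

```php
<?php
$str = '<H1=2D"some_value">';
$out = preg_replace_callback(
    '#(<[^=]+=)(\w+)("\w+">)#',
    function (array $m): string {
        // Keep captures 1 and 3, drop capture 2.
        return $m[1] . $m[3];
    },
    $str
);
echo $out, "\n"; // <H1="some_value">
```

For simply gluing captures back together the callback is overkill: a plain `'$1$3'` replacement with no modifier does the same job, as pointed out further down the thread. The callback form only pays off when the replacement needs actual PHP logic.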

 

Hopefully, this helps you in understanding this a little better.. regex can be tricky at first..  but if you keep hacking away at it, it starts to slowly make sense :)

 

Cheers,

 

NRG

 

 

Oops.. I re-read it afterwards and found some errors in my explanation.. so I'll revise what I didn't get right:

The first capture (the first set of parentheses, which is automatically stored as variable $1) is:

 

should be:

The first capture (the first set of parentheses, which, if matched, is automatically stored as variable $1) is:

 

Next, we have the second capture (the second set of parentheses, so $2).. this is the part we will ultimately leave out..

(\w+)

The \w stands for a word character, which by standard definition matches a-zA-Z0-9_ (depending on your locale it might actually match more, but for our purposes that's not important). So since the third part starts with an equal sign (which is NOT matched by \w), the regex engine will match all characters that fall into the \w category one or more times (till it arrives at the equal sign), and stores them in variable $2 automatically.

 

should be:

 

Next, we have the second capture (the second set of parentheses, so $2).. this is the part we will ultimately leave out..

(\w+)

The \w stands for a word character, which by standard definition matches a-zA-Z0-9_ (depending on your locale it might actually match more, but for our purposes that's not important). So since the third part starts with a double quote (which is NOT matched by \w), the regex engine will match all characters that fall into the \w category one or more times (till it arrives at the double quote), and stores them in variable $2 automatically.

$str = preg_replace('#(<[^=>]+=)(\w+)("\w+">)#', '$1$3', $str);

 

  • I added > to the character class so it will not match into another tag should the originating one not have an equals sign.
  • There's no need for /e.
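To illustrate the first point, here is a hypothetical input in which a tag without an equals sign comes first. Printing capture 1 for both versions of the pattern shows where each match starts: without `>` in the class, the match begins at `<br>` and runs through it into the next tag.

```php
<?php
// Hypothetical input: a tag with no '=' followed by one that has it.
$str = '<br><H1=2D"x">';

preg_match('#(<[^=]+=)(\w+)("\w+">)#', $str, $loose);   // class [^=]
preg_match('#(<[^=>]+=)(\w+)("\w+">)#', $str, $strict); // class [^=>]

echo $loose[1], "\n";  // <br><H1=  (crossed the end of the <br> tag)
echo $strict[1], "\n"; // <H1=      (cannot cross a '>')
```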

 

Another approach with fewer captures, should it fit the context of the data:

$str = preg_replace('#(?<==)\w+(?=")#', '', $str);


Ah, I forgot about including the closing '>' character.. oops.. yes, that would be bad if the string had multiple tags.. my bad  :-[

As for the e modifier, that was a slip-up on my part from mixing single and double quotes (it's those little things here and there that end up snagging you!)

 


As for your newest pattern with fewer captures.. I immediately wondered why there was a lookahead assertion after the \w+ (after all, \w does not include " characters, so it should stop there). While your example works, I tested it without the lookahead assertion.. and it still works:

 

$str = '<H1=2D"some_value">';
$str = preg_replace('#(?<==)\w+#', '', $str);
echo $str;

 

output (via right-click - view source):

<H1="some_value">

 

Is the forward assertion absolutely necessary? Or am I missing something?

 

It depends on the data. I think it's a safer approach because we're making two verifications rather than one: (1) the data must appear after an equals sign; and (2) the data must appear before a double quote.

 

Otherwise, it could botch up something like this:

 

<pre>
<?php
$str = '<H1=2D"some_value">The following is an emoticon: =o) The following is a formula: a=b*c';
$str = preg_replace('#(?<==)\w+#', '', $str);
echo $str;
?>
</pre>

Given your last code snippet, when I run it, I get:

 

source code output:

<H1="some_value">The following is an emoticon: =) The following is a formula: a=*c

 

This still achieves what we want, no? (meaning: remove any word characters after = but before "). Naturally, the rest gets output on screen. But as far as tags are concerned, wouldn't this still suffice?

 

Is there another example using the last preg pattern that could yield unpredictable results?

 

I do agree that the additional lookahead assertion would certainly 'strengthen' the 'conditions' required to match. I suppose it's better to be safe than sorry. As someone still learning regex, though, I'm curious about the 'why' of it (not sure if I'm explaining myself correctly or not).
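Running both patterns side by side on the string from effigy's example makes the difference visible: without the lookahead, the `o` in `=o)` and the `b` in `a=b*c` are removed too, because the only remaining condition is "preceded by `=`". With the lookahead, only `2D` (the one run that is followed by a double quote) is removed:

```php
<?php
$str = '<H1=2D"some_value">The following is an emoticon: =o) The following is a formula: a=b*c';

// Only checks "preceded by =": also eats the 'o' and the 'b'.
$lookbehindOnly = preg_replace('#(?<==)\w+#', '', $str);

// Also requires "followed by a double quote": only '2D' is removed.
$bothAsserts = preg_replace('#(?<==)\w+(?=")#', '', $str);

echo $lookbehindOnly, "\n";
echo $bothAsserts, "\n";
```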
