ryy705 Posted August 27, 2008
Hello, I need to remove '3D' from malformed strings like <key=3D"value">. I assume I could match this with "<.*(3D).*>", but how can I replace the 3D with an empty string? Many thanks in advance for your help.
nrg_alpha Posted August 27, 2008
You could just use a simple str_replace:
$str = '<key=3D"value">';
$str = str_replace('3D', '', $str);
echo $str;
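A note for readers trying the str_replace route: str_replace removes every occurrence of the search string, not just the one after the equals sign. The sketch below uses a made-up input (the value "3D model" is an assumption for illustration, not from the original question) to show the difference between searching for '3D' alone and searching for it together with its surrounding '=' and '"'.

```php
<?php
// Hypothetical sample input: the value itself happens to contain "3D".
$str = '<key=3D"3D model">';

// Broad search: strips every "3D", including the one inside the value.
$broad = str_replace('3D', '', $str);

// Narrower search: only touches a "3D" sitting between '=' and '"'.
$narrow = str_replace('=3D"', '="', $str);

echo $broad, "\n";  // <key=" model">
echo $narrow, "\n"; // <key="3D model">
```

If the values can never contain '3D', the simpler call is enough; the narrower search string is just a cheap safeguard.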
nrg_alpha Posted August 27, 2008
Or you could go this route, which takes your <key=3D"value"> and has preg_replace simply reconstruct the tag without the second capture ($2). The expression doesn't care which word characters sit before or inside the double quotes, so in theory you could feed it anything of the form <key=*"anyname"> (where * is any run of word characters).
$str = '<key=3D"value">';
$str = preg_replace('#(<key=)(\w+)("\w+">)#e', '"$1$3"', $str);
echo $str;
ryy705 Posted August 28, 2008 (Author)
Thank you. Could you please explain the regular expression a bit further? key and value could be any HTML or CSS tag, so they are arbitrary. I guess I am looking for something like preg_replace("(<.*=)(3D)(.*>)", '"$1$3"', $str). Let me explain what I am trying to do: (<.*=) means start with a <, then a bunch of characters, then end with a =; (.*>) means a bunch of characters followed by a >. But how do I write it? Sorry, I am not all that great with regular expressions. Doesn't \w represent digits? What does # represent? I tried googling it but could not find anything.
nrg_alpha Posted August 28, 2008
Ah, I should have explained it that way from the beginning. OK, here is the newer version:
$str = '<H1=2D"some_value">'; // try something other than 'key'
$str = preg_replace('#(<[^=]+=)(\w+)("\w+">)#e', '"$1$3"', $str);
echo $str;
Here's the breakdown of the pattern.
The first capture (which is the first set of parentheses, automatically stored as variable $1) is:
(<[^=]+=)
This says: start with a <, then, in a character class '[ ]', match anything that is NOT an equals sign (the "not" comes from the caret '^' at the start of the class). So match anything that is not an equals sign one or more times (the plus sign), until you reach an equals sign (the last equals sign inside the parentheses).
Next, we have the second capture (the second set of parentheses, $2). This is what we will ultimately leave out:
(\w+)
\w is a word character, which by the standard definition matches a-zA-Z0-9_ (depending on your locale it might match more, but that is not important here). Since the third part starts with an equal sign (which is NOT matched by \w), the regex engine matches all characters that fall into the \w category, one or more times, until it arrives at the equal sign, and stores this into variable $2 automatically.
Finally, we get to the last set of parentheses ($3):
("\w+">)
This says: a double quote, then one or more word characters, then a closing double quote, then a '>' character. And that's the pattern.
You may notice the 'e' after the closing # delimiter. This is a modifier for preg_replace that lets the replacement be evaluated as PHP code. So, looking at the replacement part, what we have is:
'"$1$3"'
That is a set of single quotes with a set of double quotes nested inside. Inside those quotes are the first and third captures, which we want (remember, we don't want the second capture).
Lastly, in this example, we tell preg_replace to use the $str string as the subject for all of this to be matched in.
Hopefully this helps you understand it a little better. Regex can be tricky at first, but if you keep hacking away at it, it slowly starts to make sense.
Cheers,
NRG
nrg_alpha Posted August 28, 2008
Oops, I re-read my post afterwards and found some errors in my explanation, so here are the revisions.
This:
The first capture (which is the first set of parentheses, automatically stored as variable $1) is:
should be:
The first capture (which is the first set of parentheses and, if matched, is automatically stored as variable $1) is:
And this:
Next, we have the second capture (the second set of parentheses, $2). This is what we will ultimately leave out: (\w+). \w is a word character, which by the standard definition matches a-zA-Z0-9_ (depending on your locale it might match more, but that is not important here). Since the third part starts with an equal sign (which is NOT matched by \w), the regex engine matches all characters that fall into the \w category, one or more times, until it arrives at the equal sign, and stores this into variable $2 automatically.
should be:
Next, we have the second capture (the second set of parentheses, $2). This is what we will ultimately leave out: (\w+). \w is a word character, which by the standard definition matches a-zA-Z0-9_ (depending on your locale it might match more, but that is not important here). Since the third part starts with a double quote (which is NOT matched by \w), the regex engine matches all characters that fall into the \w category, one or more times, until it arrives at the double quote, and stores this into variable $2 automatically.
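To see the three captures described above for yourself, one option is to run the same pattern through preg_match and inspect the resulting array. This is only a minimal sketch: it drops the e modifier (which plays no role in matching), and the variable names $m and $rebuilt are made up for illustration.

```php
<?php
$str = '<H1=2D"some_value">';

// Same pattern as in the post, without the e modifier.
$matched = preg_match('#(<[^=]+=)(\w+)("\w+">)#', $str, $m);

// $m[0] holds the whole match; $m[1]..$m[3] hold the three captures.
echo $m[1], "\n"; // <H1=
echo $m[2], "\n"; // 2D
echo $m[3], "\n"; // "some_value">

// Gluing captures 1 and 3 back together drops capture 2,
// which is what the '$1$3' replacement does.
$rebuilt = $m[1] . $m[3];
echo $rebuilt, "\n"; // <H1="some_value">
```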
effigy Posted August 28, 2008
$str = preg_replace('#(<[^=>]+=)(\w+)("\w+">)#', '$1$3', $str);
I added > to the character class so that the match cannot run into another tag if the originating tag has no equals sign. There's no need for /e.
Another approach with fewer captures, should it fit the context of the data:
$str = preg_replace('#(?<==)\w+(?=")#', '', $str);
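A small sketch of the two points in this post, for anyone who wants to try them. It runs both patterns on the sample tag from earlier in the thread, then compares the character class with and without '>' on a made-up input ($stray is a hypothetical string chosen for illustration). Dropping /e also matters on newer PHP: the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.

```php
<?php
$tag = '<H1=2D"some_value">';

// Capture-based version: plain '$1$3' backreferences, no e modifier.
$viaCaptures = preg_replace('#(<[^=>]+=)(\w+)("\w+">)#', '$1$3', $tag);

// Lookaround version: match only the word characters between '=' and '"'.
$viaLookaround = preg_replace('#(?<==)\w+(?=")#', '', $tag);

echo $viaCaptures, "\n";   // <H1="some_value">
echo $viaLookaround, "\n"; // <H1="some_value">

// Hypothetical input where the first tag has no '=': without '>' in the
// class, the match runs from '<b' into the text that follows the tag.
$stray = '<b> x=3D"y">';
$withoutGt = preg_replace('#(<[^=]+=)(\w+)("\w+">)#', '$1$3', $stray);
$withGt    = preg_replace('#(<[^=>]+=)(\w+)("\w+">)#', '$1$3', $stray);

echo $withoutGt, "\n"; // <b> x="y">
echo $withGt, "\n";    // <b> x=3D"y">  (left untouched)
```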
nrg_alpha Posted August 28, 2008
> $str = preg_replace('#(<[^=>]+=)(\w+)("\w+">)#', '$1$3', $str);
> I added > to the character class so it will not match into another tag should the originating one not have an equals sign. There's no need for /e.
Ah, I forgot about excluding the closing '>' character. Yes, that would be bad if the string had multiple tags. My bad. As for the e modifier, that was a slip-up on my part from mixing single and double quotes (it's those little things here and there that end up snagging you!).
> Another approach with less captures, should it fit the context of the data:
> $str = preg_replace('#(?<==)\w+(?=")#', '', $str);
As for your newest pattern with fewer captures, I immediately wondered why there was a lookahead assertion after the \w+ (after all, \w does not match " characters, so the match should stop there anyway). Your example works, but I tested it without the lookahead and it still works:
$str = '<H1=2D"some_value">';
$str = preg_replace('#(?<==)\w+#', '', $str);
echo $str;
Output (via right-click, view source):
<H1="some_value">
Is the lookahead assertion absolutely necessary? Or am I missing something?
effigy Posted August 28, 2008
It depends on the data. I think it's a safer approach because we're making two verifications rather than one: (1) the data must appear after an equals sign; and (2) the data must appear before a double quote. Otherwise, it could botch up something like this:
<pre>
<?php
$str = '<H1=2D"some_value">The following is an emoticon: =o) The following is a formula: a=b*c';
$str = preg_replace('#(?<==)\w+#', '', $str);
echo $str;
?>
</pre>
nrg_alpha Posted August 28, 2008
Given your last code snippet, when I run it, I get (source code output):
<H1="some_value">The following is an emoticon: =) The following is a formula: a=*c
This still achieves what we want, no? (Meaning: remove any word characters after = but before ".) Naturally, the rest gets output on screen. But as far as tags are concerned, would this not still suffice? Is there another example, using the last preg pattern, that could yield unpredictable results? I do agree that the additional lookahead assertion would certainly 'strengthen' the 'conditions' required to match. Better safe than sorry, I suppose. As someone still learning regex, it has piqued my curiosity as to the 'why' of it (not sure if I'm explaining myself correctly).
effigy Posted August 28, 2008
It made unwanted modifications; observe:
Before: The following is an emoticon: =o) The following is a formula: a=b*c
After: The following is an emoticon: =) The following is a formula: a=*c
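For a side-by-side view of the point made above, here is a minimal sketch that runs the same input through both patterns, with and without the lookahead. The variable names $withoutLookahead and $withLookahead are made up for illustration; the input string is the one from the earlier post.

```php
<?php
$str = '<H1=2D"some_value">The following is an emoticon: =o) '
     . 'The following is a formula: a=b*c';

// Lookbehind only: any word characters right after '=' are removed,
// so the emoticon and the formula get damaged too.
$withoutLookahead = preg_replace('#(?<==)\w+#', '', $str);

// Lookbehind plus lookahead: the word characters must also be
// followed by a double quote, so only the tag attribute is touched.
$withLookahead = preg_replace('#(?<==)\w+(?=")#', '', $str);

echo $withoutLookahead, "\n";
// <H1="some_value">The following is an emoticon: =) The following is a formula: a=*c
echo $withLookahead, "\n";
// <H1="some_value">The following is an emoticon: =o) The following is a formula: a=b*c
```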
nrg_alpha Posted August 28, 2008
Touché! Without the tag, it becomes very apparent. It's all clear now.