MK27 Posted June 18, 2010

I have a line of text that ends like this:

    supports TEI-Lite, TEI XML, and TEI SGML documents.\",

I need to remove the backslash before the quote, so I assumed a preg_replace like this would work:

    preg_replace("/\\\",$/","\",",$lines[$i-1]);

One \ and one " to be replaced with ". It does not do the replacement correctly, even though it does match, which I realized by testing with:

    preg_replace("/\\\",$/","XX\",",$lines[$i-1]);

Which changes the line to:

    supports TEI-Lite, TEI XML, and TEI SGML documents.\XX",

Hmmm. So what does work is this:

    preg_replace("/\\\\\",$/","\",",$lines[$i-1]);

To me, this looks like two \ (not one) then ". The line does not contain that, but this is the preg_replace which gets the job done. What have I misunderstood about PHP's regular expression handling?

Thread: https://forums.phpfreaks.com/topic/205176-preg_replace-strangeness/
kenrbnsn Posted June 18, 2010

Why don't you just use a plain str_replace?

    <?php
    $str = 'supports TEI-Lite, TEI XML, and TEI SGML documents.\",';
    $newstr = str_replace('\\','',$str);
    echo $newstr;
    ?>

Ken
MK27 Posted June 18, 2010

kenrbnsn wrote: "Why don't you just use a plain str_replace?"

I don't want to replace all occurrences of \" with ", just the last one.

I'd still like to know why preg_replace requires that. It seems like a bona fide bug to me, which I am very surprised to find in a general function in a very widely used and seasoned language. Can anyone explain this (or is it really a bug)?
kratsg Posted June 18, 2010

This is just a thought: it could be the difference between single quotes and double quotes. Try the following pattern to see if that works:

    $pattern = '/\\\",$/';
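One way to check this suggestion is to print the pattern after PHP has parsed the string literal, then run it against the sample line. The following is only a minimal sketch: the sample line is taken from the first post, and the variable names `$line`, `$pattern` and `$fixed` are illustrative.

```php
<?php
// Sample line from the question: it ends with a backslash, a quote and a comma.
$line = 'supports TEI-Lite, TEI XML, and TEI SGML documents.\",';

// Single-quoted literal: only \\ and \' are escape sequences here,
// so the backslash in front of the double quote is kept unchanged.
$pattern = '/\\\",$/';

// Show the string that is actually handed to PCRE.
echo $pattern, "\n";

// PCRE reads \\ as one literal backslash, so the trailing
// backslash-quote-comma is replaced by quote-comma.
$fixed = preg_replace($pattern, '",', $line);
echo $fixed, "\n";
```

Printing the pattern first makes it visible that PCRE receives two backslashes followed by the quote, even though three were typed in the source.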
cags Posted June 18, 2010

It's not a bug. You first need to account for what PHP treats as a backslash escape sequence, and then for what PCRE treats as an escape sequence.

You are using a double quote inside a double-quoted string, so it needs to be escaped: that's one backslash. You then want to match a backslash in your input string. Say we place a single backslash in the pattern to do this. PHP will see it as escaping the backslash that is supposed to be escaping the double quote, so you need to escape it to prevent that happening. At this point we have 3 backslashes in the pattern.

Out of these 3, only one survives PHP's string parsing, meaning the regex pattern contains a single backslash. The PCRE engine will treat this backslash as the start of an escape sequence. To counter that we need to make sure 2 make it through to the PCRE engine, and the only way to do this is to add another 2 to the string. That's 5 backslashes.

As kratsg has pointed out, this can be alleviated somewhat by using a single-quoted string, since the double quote then doesn't need escaping. I think you will still need 4 though, not the 3 they suggested.
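The two layers described above can be seen by comparing the three-backslash and five-backslash patterns from the first post side by side. This is a rough sketch rather than a definitive test: the sample line comes from the question, and the variable names are made up for illustration.

```php
<?php
// Sample line from the question, ending in backslash, quote, comma.
$line = 'supports TEI-Lite, TEI XML, and TEI SGML documents.\",';

// Layer 1: in a double-quoted literal PHP collapses each \\ to one backslash
// and each \" to a bare quote before PCRE sees anything.
$three = "/\\\",$/";    // PCRE receives one backslash before the quote
$five  = "/\\\\\",$/";  // PCRE receives two backslashes before the quote

echo $three, "\n";
echo $five, "\n";

// Layer 2: PCRE treats backslash-quote as a plain quote, so the three-backslash
// pattern matches only quote-comma and leaves the input's backslash alone.
$withThree = preg_replace($three, 'XX",', $line);

// With two backslashes PCRE matches a literal backslash, so it is removed.
$withFive = preg_replace($five, '",', $line);

echo $withThree, "\n";
echo $withFive, "\n";
```

The first replacement reproduces the `\XX",` result reported in the question, which shows that the pattern matched without ever touching the backslash.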
MK27 Posted June 18, 2010

cags wrote: "It's not a bug, you firstly need to account for what PHP thinks is a backslash escape sequence, you then need to account for what PCRE considers an escape sequence. [...] I think you will still need 4 though not the 3 they suggested."

I think I follow you on this but (sorry to nitpick), it is still a bug/oversight:

    <?php
    $s = 'hello \ world';
    $s = preg_replace('/\\/','_', $s);
    print $s;
    ?>

Here I get:

    PHP Warning: preg_replace(): No ending delimiter '/' found in /media/sda6/root/php/test.php on line 3

I did find this in a tutorial: "If you are looking for a backslash, you need to escape that also. But, we also need to escape the control character too, which is itself a backslash, hence we need to escape twice like this \\\\"

In fact three \\\ does work. I'm coming from Perl, where that is not the case, although the delimiters, PCRE, etc. are the same. This story about how logically you must "escape the control character" is a bit bogus. It's due to an oversight in the design (again: not to gripe, just honesty). I guess knowing about the issue suffices as a "fix".
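The observation that both three and four backslashes work follows from how single-quoted literals are parsed. As a small sketch (variable names are illustrative, and the two-backslash pattern is only built, not executed, since it triggers the warning quoted above):

```php
<?php
$s = 'hello \ world';

// In single quotes, \\ collapses to one backslash; a backslash in front of
// any other character (apart from a single quote) is kept unchanged.
$two   = '/\\/';    // one backslash reaches PCRE, which escapes the closing delimiter
$three = '/\\\/';   // the first pair collapses and the third backslash is kept: two reach PCRE
$four  = '/\\\\/';  // both pairs collapse: two reach PCRE

// Three and four backslashes produce the identical pattern string.
var_dump($three === $four);

// Two backslashes in the pattern match one literal backslash in the input.
echo preg_replace($four, '_', $s), "\n";
```

So the tutorial's four backslashes and the three that "do work" end up as the same pattern by the time PCRE receives it.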
cags Posted June 18, 2010

Just because it doesn't work how you want it to does not mean there was an oversight or that there is a bug. The issue IS to do with escaping characters successfully, so that at the point the PCRE engine receives them they form a valid PCRE pattern. It has nothing to do with preg_replace; it's just the way PHP handles strings. Obviously I was slightly incorrect in my previous post, but hey, it gets confusing and it's been a long day. If you don't believe me, try echoing out your pattern before you pass it to preg_replace.

    $pattern = '/\/';
    print $pattern . '<br/>';
    $pattern = '/\\/';
    print $pattern . '<br/>';
    // etc...
MK27 Posted June 18, 2010

cags wrote: "Just because it doesn't work how you want it to, does not mean there was an oversight or that there is a bug. [...] It has nothing to do with preg_replace, it's just the way PHP handles strings."

I don't want to sound like some Perl jerk who's here to slam PHP, but the fact that this is a result of "the way PHP handles strings" still makes it seem like an oversight or consequence of the design. I can also see why there is no need to "fix" that, though.

cags wrote: "try echo'ing out your pattern before you pass it to preg_replace."

That is enlightening, thanks.