
Various Issues with backslashes and replacing strings


JacobSeated
Solved by kicken


I recently fixed a bug on my own website involving preg_replace. Each time I edited an article in the front-end, preg_replace was called on the HTML to replace certain HTML elements.

What I suspect is that preg_replace has a bug that causes it to remove certain characters from the replacement string, leading to corruption of the output.
Obviously only the HTML (haystack) should be modified, and the replacement string dropped in place of the needle, without modifying it in any way. This is not what happens if the replacement string contains backslashes.

I have tried to figure out what exactly is going on here, and have come up with a fix by using str_replace instead. But I wonder if there is a solution that would allow me to keep using preg_replace? I also wonder if there are other characters that might be removed during the replacement operation?

I know that you must escape backslashes when declaring variables, but the replacement string is obtained directly from a MySQL database, and I know the data is OK.

The fact that you need to escape literal backslashes in PHP scripts makes it harder to debug the problem. For example, if you just try my solution directly, without addslashes, you will be missing backslashes. I guess you either have to escape those, or load the data from a file. This is my current solution (I do not use addslashes in the live version):

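$html = '<div>REPLACEMENT_ID</div>';

$replacement_id = 'REPLACEMENT_ID';
// addslashes() stands in for the PHP-level escaping here; it is not used in the live version.
$replacement = addslashes('<pre>\\\\</pre>');

// Alternatives for comparison; the preg_replace version loses backslashes:
// $html = preg_replace("|{$replacement_id}|", $replacement, $html);
// $html = str_replace($replacement_id, $replacement, $html);

// Current fix: swap in the replacement at the first match, with no escape processing.
$pos = strpos($html, $replacement_id);
if ($pos !== false) {
    $html = substr_replace($html, $replacement, $pos, strlen($replacement_id));
}

print_r($html);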


If you comment out the substr_replace test, and instead uncomment the preg_replace one, then you will get an inaccurate number of backslashes, similar to the result I got when using data directly from my database.

Hope someone can help shed some light on this 😄
 

Edited by JacobSeated

  • Solution

This is just a matter of two separate levels of escape sequence processing that you need to wrap your mind around, which can be difficult at times.

When you set a string in PHP, PHP's own escape sequence processing is applied first. PCRE then has its own level of processing, which is applied to the value that was passed to the function. For example, if you want \0 to appear literally in a replacement rather than have it interpreted as a back reference, preg_replace has to receive the characters \\0 as the replacement. If you're defining the value in your PHP source as a string, then you need to escape those slashes again for PHP's sake, so you have $replacement = "\\\\0"
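To make the two levels concrete, here is a minimal sketch. The subject string and pattern are made up for illustration and do not come from the original code:

```php
<?php
// Hypothetical subject and pattern, chosen only for illustration.
$subject = 'abc';
$pattern = '/b/';

// PHP level: in a single-quoted literal, '\0' stays as the two characters \ and 0.
// PCRE level: preg_replace reads \0 as "the whole match", so "b" is put back.
$backref = preg_replace($pattern, '[\0]', $subject);

// PHP level: the double-quoted "\\\\0" becomes the three characters \ \ 0.
// PCRE level: \\ collapses to one literal backslash, followed by a plain 0.
$literal = preg_replace($pattern, "[\\\\0]", $subject);

echo $backref, PHP_EOL; // a[b]c
echo $literal, PHP_EOL; // a[\0]c
```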

If you get the value from a file or database you don't have to worry about the PHP level of escaping, but do still need to account for the PCRE level so you need your file to contain \\0 not just \0.
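If changing the stored data is not an option, the PCRE-level escaping could instead be applied in code right before the call. The following is only a sketch: escape_replacement() is a hypothetical helper (not a PHP built-in), and the hard-coded string stands in for a value loaded from the database. It also escapes $, because preg_replace treats $1 and ${1} in a replacement as references as well:

```php
<?php
/**
 * Hypothetical helper (not part of PHP): prefix every backslash and
 * dollar sign with a backslash so preg_replace() inserts the text verbatim.
 */
function escape_replacement(string $text): string
{
    return strtr($text, ['\\' => '\\\\', '$' => '\\$']);
}

// Stand-in for a database value; it holds 2 backslashes and a literal $1.
$fromDatabase = '<pre>\\\\</pre> costs $1';

$html = '<div>REPLACEMENT_ID</div>';
$pattern = '/' . preg_quote('REPLACEMENT_ID', '/') . '/';

// Unescaped: one backslash is lost and $1 is treated as an (empty) reference.
$raw = preg_replace($pattern, $fromDatabase, $html);

// Escaped: the stored value arrives in the output unchanged.
$escaped = preg_replace($pattern, escape_replacement($fromDatabase), $html);

echo $raw, PHP_EOL;     // <div><pre>\</pre> costs </div>
echo $escaped, PHP_EOL; // <div><pre>\\</pre> costs $1</div>
```

Another option is preg_replace_callback, since the string returned by the callback is inserted as-is, without this second round of processing.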

It's not clear to me exactly what output you're expecting in your code sample. The addslashes call effectively mitigates PHP's escaping, meaning $replacement is set to the literal value "<pre>\\\\</pre>". preg_replace will then see that and process its own escaping, which means the value it's working with is effectively "<pre>\\</pre>". That means your final replaced output would be "<div><pre>\\</pre></div>"

If you have "<pre>\\\\</pre>" stored in your database and are pulling that value from there then you should get the same result, just don't run it through addslashes() as you don't have to deal with the PHP level of escaping things.
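For reference, the backslash count at each stage of the sample from the first post can be checked with a short script. substr_count is used here only to make the counts visible; it is not part of the original code:

```php
<?php
$html = '<div>REPLACEMENT_ID</div>';

// PHP level: the literal '<pre>\\\\</pre>' holds 2 backslashes;
// addslashes() doubles them to 4.
$replacement = addslashes('<pre>\\\\</pre>');
$inReplacement = substr_count($replacement, '\\');

// PCRE level: preg_replace halves them again, leaving 2 in the output.
$viaPreg = preg_replace('|REPLACEMENT_ID|', $replacement, $html);
$inPregOutput = substr_count($viaPreg, '\\');

// str_replace does no escape processing, so all 4 survive.
$viaStr = str_replace('REPLACEMENT_ID', $replacement, $html);
$inStrOutput = substr_count($viaStr, '\\');

echo $inReplacement, PHP_EOL; // 4
echo $inPregOutput, PHP_EOL;  // 2
echo $inStrOutput, PHP_EOL;   // 4
```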

 
