
Preg replace fails when string is larger than 1089 chars


sKunKbad


I'm on Ubuntu 16.04 with PHP7, and I have not encountered this problem in other environments. The following script fails (white screen of death) unless I subtract a character from $string. What is going on?

<?php

$string = "# MAKE SURE TO LEAVE THE NEXT TWO LINES HERE. # BEGIN DENY LIST -- # END DENY LIST --  asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd fsdfsdfsdfsdf asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd fsdfsdfsdfsdf asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd fsdfsdfsdfsdf asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd asdasdasdasdasd fsdfsdfsdfsdf asdasdasdasdasd asdasdasdasdasd ass";

$insert = 'Whatever';

$pattern = '/(?<=# BEGIN DENY LIST --)(.|\n)*(?=# END DENY LIST --)/';

// Within the string, replace the denial list with the new one
$string = preg_replace( $pattern, $insert, $string );

echo $string;

I ended up coming up with a solution that uses explode and str_replace, but it relies on the DENY LIST comments being at the top of the file. I thought about using exec and sed, but I took the easy way out for now. Still curious as to what is up with preg_replace. It doesn't seem like it's very reliable if it can't handle a big string.


You're using the worst possible regex for the input, so it's only natural that your script blows up. Since you're using a greedy quantifier in the middle part, the entire input after “BEGIN DENY LIST” is consumed. Then the regex engine has to go all the way back to “END DENY LIST”, character by character, each time checking the lookahead. If you analyze the regex with a tool like Regex Buddy, you can actually see the excessive backtracking and the large number of required steps.

 

If the deny list is very small compared to the part after the “END DENY LIST”, try a nongreedy quantifier (like “*?”). Or simply use strpos() and strrpos(). Regular expressions aren't the solution to everything.
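The strpos()/strrpos() suggestion can be sketched roughly as follows. This is only an illustration: the helper name replace_deny_list() and the short sample string are assumptions, not code from the thread. It takes the first BEGIN marker and the last END marker, which mirrors what the greedy pattern would have matched, and leaves both markers in place.

```php
<?php
// Hypothetical helper (not from the thread): swap whatever sits between the
// two marker comments for $insert, without using a regex.
function replace_deny_list(string $string, string $insert): string
{
    $begin = '# BEGIN DENY LIST --';
    $end   = '# END DENY LIST --';

    $start = strpos($string, $begin);   // first BEGIN marker
    $stop  = strrpos($string, $end);    // last END marker, like the greedy regex

    // Leave the input untouched if a marker is missing or they are out of order.
    if ($start === false || $stop === false || $stop < $start + strlen($begin)) {
        return $string;
    }

    $start += strlen($begin);           // keep the BEGIN marker itself

    return substr($string, 0, $start) . $insert . substr($string, $stop);
}

// Assumed sample input, much shorter than the string in the question.
$string = "# BEGIN DENY LIST -- old entries # END DENY LIST -- rest of file";
echo replace_deny_list($string, 'Whatever');
```

Because this does two linear scans and two substr() calls, the cost depends only on the length of the string, not on how far apart the markers are.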

Edited by Jacques1

Everything before "END DENY LIST --" ends up getting tossed out and dynamically rebuilt, so I just used explode:

$arr = explode('END DENY LIST --', $string);
$string = $new_deny_list . $arr[1];

I have a copy of Regex Buddy that's probably almost a decade old, and a book on regex, so I should probably go find them. In the interest of trying to understand what you're suggesting, I just found a site that does online regex analysis, https://regex101.com/

 

Do I understand correctly that making the quantifier nongreedy simply means adding a question mark after my asterisk, like this:

(?<=# BEGIN DENY LIST --)(.|\n)*?(?=# END DENY LIST --)

In the interest of learning, what would your regex look like if you had to use regex?


Ungreedy will help but it will still crash on large enough inputs (that is, a long enough distance between the BEGIN and END). The problem is you're putting the quantifier on a capturing group, and the PCRE library will smash the stack trying to remember everything.

 

1. If you don't need to capture what's in there, don't use a capturing group.

2. Alternating .|\n is like using the /s flag but worse.

'/(?
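The pattern in the post above is cut off, so the following is only a guess at a pattern that follows the two numbered points. It is not the poster's own regex. It drops the capturing group, uses the /s flag so that "." also matches newlines, and keeps the lazy quantifier discussed earlier. The sample string and the limit of 1 replacement are assumptions made for the sketch.

```php
<?php
// Sketch only: a pattern built from the two numbered points above,
// not the (truncated) pattern from the post.
$string = "# MAKE SURE TO LEAVE THE NEXT TWO LINES HERE.\n"
        . "# BEGIN DENY LIST --\nold entry\n# END DENY LIST --\n"
        . str_repeat("asdasdasdasdasd ", 5000);   // long tail after the list

$insert = 'Whatever';

// No capturing group, lazy quantifier, and /s so that "." also matches "\n".
$pattern = '/(?<=# BEGIN DENY LIST --).*?(?=# END DENY LIST --)/s';

$result = preg_replace($pattern, $insert, $string, 1);

// preg_replace() returns null on failure; preg_last_error() gives the reason.
if ($result === null) {
    echo 'preg_replace failed, error code: ' . preg_last_error();
} else {
    echo substr($result, 0, 100);
}
```

Checking for a null return is worth keeping even with a better pattern, since a very long gap between the markers may still run into PCRE's configured limits (for example pcre.backtrack_limit), in which case preg_replace() returns null rather than a string.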
