idris Posted December 25, 2009 Share Posted December 25, 2009 Hi. It's me again. Question: is backtracking a good thing to do ? Do we always need to ignore backtracking ? Does it always reduce the performance ? Example: String: abcdefghi Regex #1 , .*i Regex #2 , .*?i First regex will go through a to i then backtrack one character and it's ok. Second regex will check if the next character is i every time it loops. So regex #1 needs 2 steps to be done while regex #2 needs 8. Or am I completely wrong ? So in this example, which one is better to use ? #1 or #2 ? Is backtracking completely BAD thing to do ? Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/ Share on other sites More sharing options...
Daniel0 Posted December 25, 2009 Share Posted December 25, 2009 Regex #1 does this: Nothing at the start matches "anything", can I still do that? Yes, the 'a' matches, can I still match anything, yeah the 'b' does. It continues to do that all the way until: can I still match anything, yes, the 'i' matches. Then it finally checks, can I match 'i'? Nope, so I'll go back one step. Now can I match 'i'? Yes. Done. Regex #2 does this: Can I match anything 0 or more times? Yes, the nothing at the start matches that. Now does 'i' match? No, okay then I'll try matching .*? again which 'a' does. Now does 'i' match? Not yet, but 'b' matches. It does all that until it has matched 'h'. At that point 'i' matches, the regex has ended and the engine tells you there is a match. Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/#findComment-984058 Share on other sites More sharing options...
idris Posted December 26, 2009 Author Share Posted December 26, 2009 first, thx for reply. So which one provides better performence ? It's always against the performance to use backtracking expressions ? Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/#findComment-984191 Share on other sites More sharing options...
cags Posted December 26, 2009 Share Posted December 26, 2009 If my understanding is correct the performance of the two would be very similar given the input string you gave. The thing to bare in mind here is that the two patterns are not different ways of writing the same thing as they will match different patterns, as such comparing the efficiency of the two patterns has no 'real' meaning outside the context of your simple example input. I would guess the first pattern will backtrack once as it will initially match the i then realise it needs to match an i and hence have to backtrack. Whereas the second example doesn't backtrack because the 'lazy' nature of the match. Backtracking is generally best avoided as much as possible as the more backtracking that is being performed the more steps the PCRE engine has to take and hence the longer it will take. Let's take your two patterns and use a different input script. $input = "abcdefghijklmnopqrtuvwxyz"; Since the first pattern is greedy, it will initially match all the way to z, it will then have to backtrack back to the i. If I'm honest I'm not 100% sure of the internals on this, but I imagine this would involve backtracking a letter at a time meaning you actual backtrack 17 times. The second pattern however will stop as soon as it matches the first i meaning it doesn't have to backtrack at all. Given a third input... $input = "abcdefghijklmnopqrtuvwxyzi"; The first pattern will match the whole string whereas the second pattern will only match 'abcdefghi'. This just goes to re-enforce the fact that comparing the efficiency of those two patterns only really have a meaning given the limited scope of a single input. Having said all this, I'm pretty green with regards to regular expressions, this is simply my interpretation of it I'm sure somebody else will chime in if I'm wrong. Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/#findComment-984198 Share on other sites More sharing options...
nrg_alpha Posted December 28, 2009 Share Posted December 28, 2009 You got it pretty much correct, cags. @idris The issue of backtracking and its effectiveness is highly circumstantial, depending on the pattern and the source string in question. In your initial example, using a greedy quantifier is prefereable, as the speed of backtracking once vs lazily checking every character prior to matching would be faster in this case I'd wager (this of course assumes that the required character is found only once, and at the end). In general, I am finding that using greedy quantifiers is like walking a slippery slope.. the issue I find more often than not is not speed, but rather accuracy issues. Given the 3rd input excample in cag's post would be an example of greedy not yielding to correct expectations if the desired outcome is abcdefghi and not abcdefghijklmnopqrtuvwxyzi. Like everything else, greedy quantifiers are tools. It's not correct to label it as evil, but when used incorrectly (and yes, it is used incorrectly more often than not), your concerns should be more centered around accuracy rather than speed. Yes, backtracking eats away at performance, but depending on how much backtracking will dictate whether it is desirable or not (assuming the end results are the same between using greedy vs lazy quantifiers). I always recommend this book to people.. so here I recommend it to you There is tons of info on the mechanics of how regex engines *think* and will definitely open the doors of understanding a hell of a lot more once you read that puppy! Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/#findComment-984665 Share on other sites More sharing options...
cags Posted December 28, 2009 Share Posted December 28, 2009 You got it pretty much correct, cags. Wonders will never cease... In yourinitial example, using a greedy quantifier is prefereable, as the speedof backtracking once vs lazily checking every character prior tomatching would be faster in this case I'd wager (this of course assumesthat the required character is found only once, and at the end). I theorised this myself, but wasn't sure enough to post it, I'm glad you've pretty much confirmed that though. Quote Link to comment https://forums.phpfreaks.com/topic/186327-backtracking-is-something-good-to-do/#findComment-984807 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.