Regex Help (PCRE)


idris

This works:

 

$pattern="#<title>(.*) - .*</title>.*<!-- body -->(.*)<!-- / body -->#si";

 

This doesn't work:

 

$pattern="#<title>(.*) -.*</title>.*<!-- body -->(.*)<!-- / body -->#si";

 

 

Can't the dot match a space? Or is there some other problem?


It's no use saying that something works or doesn't work without providing snippets for us to either a) tell you what's wrong, or b) run, see the problem and help you to figure it out.

 

In answer to your questions: yes, the dot can match a space, and yes, there probably is another problem.


Sorry, I can't edit my topics. It isn't body, it's message; you are right.

But there is only one pair of <!-- message --> <!-- / message --> tags. Why do I need to care about greediness?

Also, there is only one pair of <title></title> tags. Are you sure it's about greediness?


The issue, as far as I can see, is that your regular expression is particularly inefficient (like searching for any amount of any characters then being forced to backtrack through many thousands of characters!).

 

By default, PHP limits how much backtracking PCRE may do: the pcre.backtrack_limit setting (available since PHP 5.2.0) defaults to 100,000 (raised to 1,000,000 in PHP 5.3.7). Your regex (with the space missing) needs more backtracking steps than that, so the match fails. To confirm that this is the problem, call preg_last_error() right after the match; it returns an integer corresponding to one of the PREG_*_ERROR constants.
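
As a rough sketch of that check (not the original poster's page: $page is a made-up stand-in, and the JIT is switched off on the assumption that this keeps the reported error reproducible), something like this should show the difference between the two patterns:

```php
<?php
// Assumptions for illustration only: a synthetic page, the old default limit,
// and the PCRE JIT disabled so the interpreter's backtrack counter applies.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

// Hypothetical page: one title, many HTML comments (each contains " -"), one body.
$page = "<title>Topic - Forum</title>\n"
      . str_repeat("<!-- filler -->\n", 500)
      . "<!-- body -->Hello<!-- / body -->";

$withSpace    = '#<title>(.*) - .*</title>.*<!-- body -->(.*)<!-- / body -->#si';
$withoutSpace = '#<title>(.*) -.*</title>.*<!-- body -->(.*)<!-- / body -->#si';

// " - " only occurs inside the title, so little backtracking is wasted.
$okResult = preg_match($withSpace, $page, $okMatch);
$okError  = preg_last_error();

// " -" also occurs in every "-->" comment ending, so the engine retries
// ".*</title>" from each of them before it gets back to the title.
$badResult = preg_match($withoutSpace, $page, $badMatch);
$badError  = preg_last_error();

var_dump($okResult, $okError === PREG_NO_ERROR);
var_dump($badResult, $badError === PREG_BACKTRACK_LIMIT_ERROR);
```

Note that preg_match() returns false (not 0) when it gives up, so a plain "if (!preg_match(...))" cannot tell "no match" apart from "limit reached"; preg_last_error() can.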



Yes, you are right. It's PREG_BACKTRACK_LIMIT_ERROR.

 

But I didn't really understand. There aren't 100,000 characters in this page, are there? Can you explain a bit more? And how can I fix this pattern? Is it possible to fix?

 

Thanks.


Backtracking is a process that the regex engine will go through in an attempt to find a match, if it can. Consider the following string and regular expression:

 

  • String: abcdefghi
  • Regex: .*f

 

When that regex is executed, the .* will match all of the characters in the string. Then it will look for an f which cannot be matched (since we're at the end of the string!).  At this point the engine backtracks one character and tries allowing .* to match all but the last character in the string. Again it tries to match the letter f but fails. The engine will keep backtracking, one character at a time, until either a) an f is found or b) it has backtracked through the entire string.

 

If that's unclear, here's the basic process that happens:

 

  1. Match .* => abcdefghi
  2. Match f => Fails (at end of string), backtrack
  3. Match .* => abcdefgh
  4. Match f => Fails (found i), backtrack
  5. Match .* => abcdefg
  6. Match f => Fails (found h), backtrack
  7. Match .* => abcdef
  8. Match f => Fails (found g), backtrack
  9. Match .* => abcde
  10. Match f => Success! abcdef
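
The steps above can be reproduced with a small simulation. This is only a teaching aid: traceGreedyDotStar() is a hypothetical helper written for this post, and it imitates the give-back behaviour of a greedy .* followed by a single literal character rather than showing how PCRE is really implemented.

```php
<?php
/**
 * Hypothetical helper: lists the attempts a greedy ".*" followed by one
 * literal character goes through, starting at the beginning of $subject.
 */
function traceGreedyDotStar(string $subject, string $literal): array
{
    $steps = [];
    // Start with .* holding the whole string, then give back one character at a time.
    for ($taken = strlen($subject); $taken >= 0; $taken--) {
        $steps[] = 'Match .* => ' . substr($subject, 0, $taken);
        // The character the literal would have to match, or null at end of string.
        $next = $taken < strlen($subject) ? $subject[$taken] : null;
        if ($next === $literal) {
            $steps[] = "Match $literal => Success! " . substr($subject, 0, $taken + 1);
            return $steps;
        }
        $why = $next === null ? 'at end of string' : "found $next";
        $steps[] = "Match $literal => Fails ($why), backtrack";
    }
    return $steps; // backtracked through the entire string without a match
}

$steps = traceGreedyDotStar('abcdefghi', 'f');
echo implode("\n", $steps), "\n";
```

On a nine-character string that is a handful of attempts; on a page of many thousands of characters, with several .* in one pattern, the attempts multiply.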


OK, salathe has posted a nice example and I have a bad one... but I don't care, I spent the time typing it, so I am goddamn going to post it  :P

 

OK, let's look at a bad example.

I wish to match an L, then two or more O's, then a K,

so it finds LOOK, LOOOK, LOOOOK and so on:

LO+O+K

While that may look fine,

breaking it down shows the problem.

On the word LOOOK, this part of the pattern

LO+

finds LOOO. The next part of the pattern

O+

needs an O, but the first part was greedy and took them all! So the engine backtracks (gives back one character at a time from the last greedy expression). In this simple example that's just one O, but all this backtracking data has to be held in memory, so there's a limit (you can change it with pcre.backtrack_limit in php.ini).
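
For a single script the limit can also be inspected and changed at runtime. A minimal sketch; the value used here is an arbitrary illustration, not a recommendation:

```php
<?php
// pcre.backtrack_limit is an ordinary ini setting, so ini_get()/ini_set()
// work on it just as editing php.ini would, but only for the current script.
$before = ini_get('pcre.backtrack_limit');
echo "Current backtrack limit: $before\n";

// Arbitrary example value, chosen only to show the call.
ini_set('pcre.backtrack_limit', '5000000');
$after = ini_get('pcre.backtrack_limit');
echo "Backtrack limit for this script: $after\n";
```

Raising the limit only hides the symptom; a pattern that backtracks less is usually the better fix.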

 

HERE is another example

String (note the VOL near the middle)

LOLOLOLOLOLOLOLOLOLVOLOLOLOLOLOLOLOLOLO

regex

[LOV]*VOL

 

This will match the whole string, then backtrack until it gets to the middle, so about 20 backtrack steps.

Now this regex

[LOV]*?VOL

does the reverse: it first tries to match VOL straight away, and each time that fails it takes one more L, O or V and tries VOL again, stopping as soon as VOL matches.
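
A quick way to check this in PHP (just a sketch using the string and the two patterns from above):

```php
<?php
$subject = 'LOLOLOLOLOLOLOLOLOLVOLOLOLOLOLOLOLOLOLO';

// Greedy: grabs everything, then gives characters back until VOL fits.
preg_match('/[LOV]*VOL/', $subject, $greedy);

// Lazy: grows one character at a time until VOL fits.
preg_match('/[LOV]*?VOL/', $subject, $lazy);

// VOL occurs only once in this string, so both arrive at the same text;
// only the amount of work needed to get there differs.
var_dump($greedy[0] === $lazy[0]);
echo $greedy[0], "\n";
```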

 

Greedy vs non-greedy(lazy)

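So this pattern

.*21

applied to this string

0123456789101112131415161718192021222324252627282930

will match

0123456789101112131415161718192021

in about 52 steps, because it runs to the end and backs up to the last 21.

This one

.*?21

will only match

01234567891011121

in about 30 steps, because it stops at the first 21 it reaches (the 2 of 12 followed by the 1 of 13).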
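
The two results are easy to confirm (a small sketch; the step counts above are estimates and are not measured here):

```php
<?php
$subject = '0123456789101112131415161718192021222324252627282930';

// Greedy: the match ends at the last "21" in the string.
preg_match('/.*21/', $subject, $greedy);

// Lazy: the match ends at the first "21" in the string.
preg_match('/.*?21/', $subject, $lazy);

echo $greedy[0], "\n";
echo $lazy[0], "\n";
```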

 

Humm.. hope that helps!
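
Applied to the pattern from the first post, one possible rewrite is sketched below. It is only a suggestion under assumptions: $page is a made-up stand-in for the real HTML, the title is assumed to contain no "<", and the markers are taken as posted. The idea is to use [^<]* inside the title so it cannot run past </title>, and lazy .*? elsewhere so the engine stops at the first marker instead of backing up from the end of the page.

```php
<?php
// Same assumptions as any test setup: old default limit, JIT off for reproducibility.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

// Hypothetical page with many comments that each contain " -".
$page = "<title>Topic - Forum</title>\n"
      . str_repeat("<!-- filler -->\n", 500)
      . "<!-- body -->Hello<!-- / body -->";

// [^<]* stays inside the <title> element; .*? and (.*?) grow only as far as needed.
$pattern = '#<title>([^<]*) -[^<]*</title>.*?<!-- body -->(.*?)<!-- / body -->#si';

$result = preg_match($pattern, $page, $m);

var_dump($result, preg_last_error() === PREG_NO_ERROR);
echo $m[1], "\n"; // title part before " -"
echo $m[2], "\n"; // text between the body markers
```

If the page had several body markers, the lazy version would pick the first pair rather than the widest span, so it is not a drop-in replacement in every case.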

 

PS nice example salathe


[ot]

OK, salathe has posted a nice example and I have a bad one... but I don't care, I spent the time typing it, so I am goddamn going to post it  :P

Hahaha, I actually lol'd reading that. It echoes my sentiments on several occasions.

[/ot]


[ot]

I don't care, I spent the time typing it, so I am goddamn going to post it  :P

 

To be honest, this same situation occurs for me a couple of times every day. I'm happy to see that I'm not the only one who this happens to! It can be frustrating at times to have spent so much time only to not publish the post. But the frustration is only short lived and there's plenty more threads to visit.[/ot]


[ot]


kind of like going to your gf's house and seeing your best friend is already on top of her...

[/ot]


[ot]


Schadenfreude![/ot]
