idris Posted December 8, 2009
This works:
$pattern="#<title>(.*) - .*</title>.*<!-- body -->(.*)<!-- / body -->#si";
This doesn't work:
$pattern="#<title>(.*) -.*</title>.*<!-- body -->(.*)<!-- / body -->#si";
Can't the dot stand in for the space? Or is there some other problem?
salathe Posted December 8, 2009
It's no use saying that something works or doesn't work without providing snippets that let us either a) tell you what's wrong, or b) run them, see the problem and help you figure it out.
In answer to your questions: yes, the dot can stand in for a space, and yes, there probably is another problem.
idris (Author) Posted December 9, 2009
Here is the string:
$str=file_get_contents("http://www.skyscrapercity.com/showthread.php?t=1015089");
I'm trying to get the title and the message of this topic with one pattern. But rather than giving me a different regex, can you tell me what is wrong with mine? This is driving me crazy.
MadTechie Posted December 9, 2009
Well, it didn't work at all for me. The problem is probably greediness. I replaced body with message to get it working, but you need to be careful when using .* because it grabs as much as it can. Try the less greedy .*? instead.
idris (Author) Posted December 9, 2009
Sorry, I can't edit my posts. You are right, it isn't body, it's message. But there is only one pair of <!-- message --> <!-- / message --> tags, so why do I need to care about greediness? There is also only one pair of <title></title> tags. Are you sure it's about greediness?
MadTechie Posted December 9, 2009
As I was unable to re-create the problem (not having the correct regex), I said it's "probably" greediness, as that's the most obvious problem with that regex.
salathe Posted December 9, 2009
The issue, as far as I can see, is that your regular expression is particularly inefficient: it searches for any amount of any characters and is then forced to backtrack through many thousands of characters. By default (as of PHP 5.2.0) there is a backtrack limit of 100,000, set by pcre.backtrack_limit. Your regex (with the space missing) needs more backtracks than that, so it fails.
To confirm that this is the problem, use preg_last_error, which returns an integer corresponding to one of the PREG_*_ERROR constants.
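A minimal sketch of the check salathe describes, assuming PHP 7 or later. The helper name preg_error_name and the sample HTML are made up for illustration; preg_match, preg_last_error, and the PREG_* constants are PHP's own.

```php
<?php
// Hypothetical helper: turn a preg_last_error() code into its constant name.
function preg_error_name(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    ];
    return $names[$code] ?? "unknown error code $code";
}

// Made-up sample input; in the thread the page comes from file_get_contents().
$str = '<title>Some thread - Some forum</title><!-- message -->Hello<!-- / message -->';
$pattern = '#<title>(.*) - .*</title>.*<!-- message -->(.*)<!-- / message -->#si';

$result = preg_match($pattern, $str, $matches);

if ($result === false) {
    // preg_match() returns false when the engine gives up,
    // for example after hitting the backtrack limit.
    echo 'Regex error: ', preg_error_name(preg_last_error()), "\n";
} elseif ($result === 0) {
    echo "No match\n";
} else {
    echo 'Title: ', $matches[1], "\n";
    echo 'Message: ', $matches[2], "\n";
}
```

The point of the three-way branch is that a failed regex (false) and a regex that simply found nothing (0) look the same in a loose comparison, so the result is compared with === before preg_last_error is consulted.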
idris (Author) Posted December 9, 2009
Yes, you are right, it's PREG_BACKTRACK_LIMIT_ERROR. But I didn't really understand you. There aren't 100,000 characters on this page, are there? Can you explain a bit more? And how can I fix this pattern? Is it possible to fix? Thanks.
idris (Author) Posted December 9, 2009
Since I am a learner and not a master of regular expressions, I'd better ask: what is backtracking? It works when I use a lazy quantifier in the first subpattern. Please describe backtracking to me in human language.
salathe Posted December 9, 2009
Backtracking is a process that the regex engine goes through in an attempt to find a match, if it can. Consider the following string and regular expression:
String: abcdefghi
Regex: .*f
When that regex is executed, the .* will match all of the characters in the string. Then the engine looks for an f, which cannot be matched (since we're at the end of the string!). At this point the engine backtracks one character and tries allowing .* to match all but the last character in the string. Again it tries to match the letter f but fails. The engine will keep backtracking, one character at a time, until either a) an f is found or b) it has backtracked through the entire string.
If that's unclear, here's the basic process that happens:
1. Match .* => abcdefghi
2. Match f => Fails (at end of string), backtrack
3. Match .* => abcdefgh
4. Match f => Fails (found i), backtrack
5. Match .* => abcdefg
6. Match f => Fails (found h), backtrack
7. Match .* => abcdef
8. Match f => Fails (found g), backtrack
9. Match .* => abcde
10. Match f => Success! abcdef
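The walk-through above can be imitated in a few lines of PHP. This is only a sketch of the idea, not how PCRE is implemented: simulate_greedy_then_literal is a made-up helper that handles just ".*" followed by one literal character, always starting at offset 0.

```php
<?php
// Hypothetical helper, not PCRE itself: imitates how a backtracking engine
// handles ".*" followed by one literal character, starting at offset 0.
function simulate_greedy_then_literal(string $subject, string $literal): array
{
    $steps = [];
    $backtracks = 0;

    // Greedy: let ".*" hold the whole string first, then give back one character at a time.
    for ($len = strlen($subject); $len >= 0; $len--) {
        $steps[] = 'Match .* => ' . substr($subject, 0, $len);
        $next = $len < strlen($subject) ? $subject[$len] : null;

        if ($next === $literal) {
            $match = substr($subject, 0, $len + 1);
            $steps[] = "Match $literal => Success! $match";
            return ['match' => $match, 'backtracks' => $backtracks, 'steps' => $steps];
        }

        $found = $next === null ? 'at end of string' : "found $next";
        $steps[] = "Match $literal => Fails ($found), backtrack";
        $backtracks++;
    }

    // The literal never followed any prefix of the subject.
    return ['match' => null, 'backtracks' => $backtracks, 'steps' => $steps];
}

$run = simulate_greedy_then_literal('abcdefghi', 'f');
echo implode("\n", $run['steps']), "\n";
echo 'Backtracks: ', $run['backtracks'], "\n";
```

On a nine-character string this costs four backtracks. On a page of tens of thousands of characters, with several .* parts that each have to give characters back, the count multiplies quickly.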
MadTechie Posted December 9, 2009
OK, salathe has posted a nice example and I have a bad one, but I don't care: I spent the time typing it, so I am goddamn going to post it.
Let's look at a bad example. I wish to match L, then two or more O's, then K, so that it finds LOOK, LOOOK, LOOOOK, and so on:
LO+O+K
While that may look fine, breaking it down shows the problem. On the word LOOOK, the first part, LO+, finds LOOO. The next part, O+, needs an O, but the first part was greedy and took them all! So the engine backtracks (goes back one character at a time from the last greedy expression). In this simple example that's just one O, but the engine needs to hold all this backtrack data in memory, so there's a limit (you can change this limit in php.ini).
Here is another example.
String (note the VOL near the middle): LOLOLOLOLOLOLOLOLOLVOLOLOLOLOLOLOLOLOLO
Regex: [LOV]*VOL
This will match the whole string, then backtrack until it gets to the middle, so about 20 backtrack steps.
Now this regex:
[LOV]*?VOL
does the reverse: it takes the first L, O, or V, then checks for the string VOL, and repeats until it matches VOL, then stops matching.
Greedy vs non-greedy (lazy): in the string
0123456789101112131415161718192021222324252627282930
.*21 will match 0123456789101112131415161718192021 (about 52 steps), while .*?21 will only match 01234567891011121 (about 30 steps).
Hmm, hope that helps!
PS: nice example, salathe.
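The greedy and lazy results on the digit string are easy to confirm with preg_match. The last two lines are an addition, not part of MadTechie's post: one possible way to write "L, two or more O's, K" with a single quantifier, so that two O+ parts don't compete for the same characters.

```php
<?php
$digits = '0123456789101112131415161718192021222324252627282930';

// Greedy: ".*" grabs everything, then backs up to the LAST "21".
preg_match('/.*21/', $digits, $greedy);

// Lazy: ".*?" grows one character at a time and stops at the FIRST "21".
preg_match('/.*?21/', $digits, $lazy);

echo $greedy[0], "\n"; // 0123456789101112131415161718192021
echo $lazy[0], "\n";   // 01234567891011121

// Suggested alternative to LO+O+K: one bounded quantifier instead of two greedy ones.
$look = preg_match('/^LO{2,}K$/', 'LOOOK'); // 1: three O's is enough
$lok  = preg_match('/^LO{2,}K$/', 'LOK');   // 0: only one O
```

Note that greedy and lazy do not just differ in cost here: they return different matches, because the first "21" in the string sits where 12 runs into 13.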
cags Posted December 9, 2009
[ot] MadTechie wrote: "OK, salathe has posted a nice example and I have a bad one, but I don't care: I spent the time typing it, so I am goddamn going to post it."
Hahaha, I actually lol'd reading that. It echoes my sentiments on several occasions. [/ot]
.josh Posted December 9, 2009
You can *probably* get rid of your error by using .*? instead of .*
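Applied to the pattern from the first post (with message in place of body, as corrected earlier in the thread), that suggestion might look like the sketch below. The HTML is a made-up stand-in for the real page, and the title text is invented.

```php
<?php
// Made-up stand-in for the downloaded page; the repeated filler imitates a longer document.
$filler = str_repeat('<div class="post"> - filler - </div>', 100);
$str = '<html><head><title>Tall towers - Some forum</title></head><body>'
     . $filler
     . '<!-- message -->First post text<!-- / message -->'
     . $filler
     . '</body></html>';

// Every ".*" from the original pattern replaced by the lazy ".*?".
$pattern = '#<title>(.*?) - .*?</title>.*?<!-- message -->(.*?)<!-- / message -->#si';

$found = preg_match($pattern, $str, $m);

if ($found === 1) {
    echo 'Title: ', $m[1], "\n";
    echo 'Message: ', $m[2], "\n";
} elseif ($found === false) {
    echo 'Regex error code: ', preg_last_error(), "\n";
}
```

With lazy quantifiers each part expands forward only as far as it needs to, instead of running to the end of the page and working back. One behavioural difference to keep in mind: if the title itself contains " - " more than once, the lazy (.*?) stops at the first occurrence, whereas the greedy version captured up to the last one. Raising pcre.backtrack_limit (in php.ini or with ini_set) is the other way out, but it only moves the ceiling and leaves the pattern as inefficient as before.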
salathe Posted December 9, 2009
[ot] MadTechie wrote: "I don't care: I spent the time typing it, so I am goddamn going to post it."
To be honest, this same situation occurs for me a couple of times every day. I'm happy to see that I'm not the only one this happens to! It can be frustrating at times to have spent so much time only to not publish the post. But the frustration is short-lived and there are plenty more threads to visit. [/ot]
.josh Posted December 9, 2009
[ot] salathe wrote: "It can be frustrating at times to have spent so much time only to not publish the post."
Kind of like going to your gf's house and seeing your best friend is already on top of her... [/ot]
salathe Posted December 9, 2009
[ot] .josh wrote: "Kind of like going to your gf's house and seeing your best friend is already on top of her..."
Sure, but it's not every day that that happens. [/ot]
cags Posted December 9, 2009
[ot] salathe wrote: "It can be frustrating at times to have spent so much time only to not publish the post."
Schadenfreude! [/ot]
idris (Author) Posted December 10, 2009
Thanks a lot. I just learned something I didn't know.