kratsg Posted October 18, 2007

For example:

$pattern = "/(ab|ac)/"; // either "ab" or "ac"
$pattern = "/<img\s?src=([^>]*)>/"; // matches <img src=someurl.gif> or <imgsrc=someurl.gif>, with the backreference being someurl.gif

Is there a difference? (And tell me if I got that second pattern wrong o_o) It sounds right, although I keep getting stuck between lazy and greedy xD

EDIT: the lazy form, (.*?)>, is the one I still don't get o_o It sounds like it gets everything except what's after it, but wouldn't it work the same way as (.*[^>])?

effigy Posted October 18, 2007

A backreference refers to captured information, while "or" (commonly called "alternation") is just that: it lets you specify alternatives to be matched.

Greediness takes as much as possible, while laziness takes as little as possible. For example, + is equivalent to {1,}: a minimum of one with no upper limit. When greedy, it takes as many as it can before considering the patterns that follow; when lazy, it takes only one before considering them, and takes more only if the rest of the pattern fails to match.

P.S. /(ab|ac)/ is better written as /(a[bc])/.

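A minimal sketch of the greedy/lazy difference and of the /(ab|ac)/ versus /(a[bc])/ equivalence described above. The subject strings and variable names here are made up for illustration and are not from the thread:

```php
<?php
// Hypothetical subject with two ">" characters, so the two modes differ.
$subject = 'aaaa>bbb>';

// Greedy: .+ takes as much as possible, then backtracks to the last ">".
preg_match('/(.+)>/', $subject, $greedy);

// Lazy: .+? takes one character, then extends only until the first ">".
preg_match('/(.+?)>/', $subject, $lazy);

echo $greedy[1], "\n"; // aaaa>bbb
echo $lazy[1], "\n";   // aaaa

// Alternation and a character class capture the same text here.
preg_match('/(ab|ac)/', 'xxacxx', $alternation);
preg_match('/(a[bc])/', 'xxacxx', $charClass);

echo $alternation[1], "\n"; // ac
echo $charClass[1], "\n";   // ac
```

In both greedy and lazy cases the captured text is available as $matches[1], which is what a backreference such as \1 or $1 refers to.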
kratsg Posted October 19, 2007

OK, I can see the meaning of "alternation" as it's being applied. What about my "lazy" question: would both of these work the same way?

EDIT: the lazy form, (.*?)>, is the one I still don't get o_o It sounds like it gets everything except what's after it, but wouldn't it work the same way as (.*[^>])?

effigy Posted October 19, 2007

The ? in your second pattern (the one in \s?) is not indicating a lazy match, but an optional match. If you're trying to match data up to an ending tag, the better expression is /([^>]*)/.

Of the two expressions in your edit, the first, (.*?)>, will still work, but it's a little less informative and, I believe, less efficient. Since you know you don't want to match >, say so. The second, (.*[^>]), will gobble up everything (except a newline), then backtrack only as far as the last non-> character, so it can run straight past the closing >. Of course, all of this is optional.

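To see how the three capture styles under discussion behave, here is a small sketch. The sample line with two tags and the "<img src=" prefix on each pattern are assumptions added for illustration:

```php
<?php
// Hypothetical input: two img tags on one line.
$line = '<img src=a.gif> and <img src=b.gif>';

// Lazy: stops at the first ">".
preg_match('/<img src=(.*?)>/', $line, $lazy);

// Negated class: cannot cross a ">", so it also stops at the first one.
preg_match('/<img src=([^>]*)>/', $line, $negated);

// Greedy .* followed by [^>]: runs to the end of the line, then backs up
// just far enough for [^>] to match one non-">" character.
preg_match('/<img src=(.*[^>])/', $line, $greedy);

echo $lazy[1], "\n";    // a.gif
echo $negated[1], "\n"; // a.gif
echo $greedy[1], "\n";  // a.gif> and <img src=b.gif
```

So (.*?)> and ([^>]*)> agree on this input, while (.*[^>]) does not work the same way once a second > appears on the line.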
kratsg Posted October 19, 2007

I can see how laziness is a bit inefficient. For me, I prefer to use pre-existing methods in new ways, such as including everything except the ending tag, instead of doing the lazy thing. Do you know of any speed issues between the two different ways?

effigy Posted October 19, 2007

It depends on the data and what you're trying to match. In cases where either can be used, laziness is better if the stop character is likely to come sooner rather than later, and greediness is better if it is likely to come later. When in doubt, use greediness. If speed is that crucial to you, I recommend running benchmarks.

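One rough way to run such a benchmark is sketched below. The helper name timePattern, the iteration count, and the sample subject are all assumptions for illustration; real timings depend on your data, PHP version, and PCRE build, so treat the numbers as relative only:

```php
<?php
// Hypothetical helper: time repeated preg_match calls for one pattern.
function timePattern(string $pattern, string $subject, int $iterations = 20000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject);
    }
    return microtime(true) - $start;
}

// Made-up subject: a long attribute value, so the stop character comes late.
$subject = '<img src=' . str_repeat('x', 200) . '.gif>';

$patterns = [
    'lazy'    => '/<img src=(.*?)>/',
    'negated' => '/<img src=([^>]*)>/',
];

$timings = [];
foreach ($patterns as $label => $pattern) {
    $timings[$label] = timePattern($pattern, $subject);
    printf("%-8s %.4f s\n", $label, $timings[$label]);
}
```

Swapping in a subject where the > comes early should show how much the result depends on the data.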
kratsg Posted October 19, 2007

So, if I were going to break down HTML tags, it would be better to do something like this:

// img tag
$pattern = "/<img\s?src=(\"|')?([^'\">]*)(\"|')?>/";

where the img HTML could look like any of the following:

<img src=URL>
<img src="URL">
<img src='URL'>
<imgsrc=URL>
<imgsrc="URL">
<imgsrc='URL'>

Correct? (Even with the quotation marks?) I'm wondering, though: if I were to reference the URL, would it be $1 or $2? I'm thinking it's $2.

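On the $1 versus $2 question: capture groups are numbered by their opening parenthesis, left to right, so the optional quote group comes first and the URL lands in group 2. A quick check, using the pattern exactly as posted and the placeholder URL from the list above (the $samples and $urls names are only for this sketch):

```php
<?php
// Pattern as posted: group 1 = opening quote, group 2 = URL, group 3 = closing quote.
$pattern = "/<img\s?src=(\"|')?([^'\">]*)(\"|')?>/";

$samples = [
    '<img src=URL>',
    '<img src="URL">',
    "<img src='URL'>",
];

$urls = [];
foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        $urls[] = $m[2]; // same group that $2 names in a preg_replace replacement
    }
}

print_r($urls); // three entries, each "URL"
```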
effigy Posted October 19, 2007

Why the optional space? imgsrc is not valid. You'll want to replace it with something more flexible, since src may not be the first attribute in the tag.

<pre>
<?php
$html = <<<HTML
<img src="1.jpg">
<img style="border:none;" src=2.gif>
<img src='3.png' border="3">
HTML;

// Group 1: optional opening quote. Group 2: the URL.
// (?(1)...) is a conditional: if a quote was captured, match lazily up to
// the same quote (\1); otherwise take everything up to whitespace or ">".
preg_match_all('/<img[^>]*src=([\'"])?((?(1).+?|[^\s>]+))(?(1)\1)[^>]*>/', $html, $matches);

// Drop the full matches and the quote captures, leaving only the URLs.
array_shift($matches);
array_shift($matches);
print_r($matches);
?>
</pre>

kratsg Posted October 20, 2007

OK, that makes sense :-D