supermerc Posted May 16, 2009

Hey, I want to run a preg_match_all search over my document and extract two things, but I don't understand what we need to put in the pattern to make it extract them. I had one that worked:

$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";

But this only gets me the image link; there's something further along that I also need. For example, the complete string looks something like:

src="/images/potion_red.gif" style="border:0px;margin:0px;width:23px;height:24px;" ONMOUSEOVER="itempopup(event,'8986689')"

What I would want to extract from that is potion_red.gif AND 8986689. Please help me!
Ken2k7 Posted May 16, 2009

$pattern = '#src=[\'\"]?/images/(.+)?[\'\"]?\s.*?itempopup\(event,[\'\"]?(\d+)?[\'\"]?\)#';

?
supermerc Posted May 16, 2009

Is that going to extract them together? I need to link the image name with the ID.
nrg_alpha Posted May 16, 2009

$str = <<<EOF
src="/images/potion_red.gif" style="border:0px;margin:0px;width:23px;height:24px;" ONMOUSEOVER="itempopup(event,'8986689')"
EOF;
$pattern = '#src=[\'"][^\'"]+/(.+?\.(?:png|jpe?g|gif))[\'"].+?ONMOUSEOVER=[\'"]itempopup\(event,\'(\d+)\'\)[\'"]#i';
preg_match_all($pattern, $str, $matches);
echo $matches[1][0] . ' - ' . $matches[2][0];

OK, here's the logic. Start off with src= then either ' or ". Now here's the trick: we want to stay within the src quotes, otherwise we might end up matching a file name ending with png, gif or jpeg further down the string (if one exists) if we aren't careful. We do this by following [\'"] with [^\'"]+. So at this point, it will match:

/images/potion_red.gif

However, this is too much info. So after [^\'"]+, we specify that we want /, then, using a lazy quantifier, creep up, matching everything up to (and including) a required dot followed by either png, gif, or jpe?g, and finally the closing quote (be it ' or "). So while this causes some backtracking early, it ensures we stay within the quotes for accuracy.

Next, I make the assumption that ONMOUSEOVER is still on the same line as src, so I make it match anything (other than a newline, as we are not using the s modifier after the closing delimiter) and be lazy about it: .+? until we match ONMOUSEOVER. Unfortunately, we can't use the typical [^\'"]+ safeguard we had in place with src, because ONMOUSEOVER can contain both single and double quotes, so we just lazily match along to find the rest (capturing the digits we find).
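On the earlier question of keeping each image name linked with its ID: here is a minimal sketch, assuming the document holds several such tags, one per line. It reuses the pattern above unchanged and passes the PREG_SET_ORDER flag to preg_match_all, so each match comes back as one array holding both captures. The second tag (sword_iron.png, 1234567) and the variable names $html and $items are made up for illustration and are not from the original posts.

```php
<?php
// Hypothetical input: two tags, one per line. Only the first appears in the thread;
// the second is invented to show more than one match.
$html = <<<EOF
<img src="/images/potion_red.gif" style="border:0px;" ONMOUSEOVER="itempopup(event,'8986689')">
<img src="/images/sword_iron.png" style="border:0px;" ONMOUSEOVER="itempopup(event,'1234567')">
EOF;

// Same pattern as in the post above.
$pattern = '#src=[\'"][^\'"]+/(.+?\.(?:png|jpe?g|gif))[\'"].+?ONMOUSEOVER=[\'"]itempopup\(event,\'(\d+)\'\)[\'"]#i';

// PREG_SET_ORDER groups results per match: $m[0] is the full match,
// $m[1] the file name, $m[2] the ID.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Collect the pairs so each file name stays attached to its ID.
$items = [];
foreach ($matches as $m) {
    $items[] = ['file' => $m[1], 'id' => $m[2]];
}

foreach ($items as $item) {
    echo $item['id'] . ' => ' . $item['file'] . "\n";
}
```

Without the flag, the default PREG_PATTERN_ORDER still keeps things aligned by index ($matches[1][$i] goes with $matches[2][$i]), so either layout works; the set-order form just makes the pairing explicit when looping.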