wintallo Posted January 4, 2008

Hello,

Right now, I'm trying to write a piece of PHP that grabs data out of a string, based on a regular expression. This is the regex:

(width)(=")[0-9]{3,4}

Say I want to get the "425" out of the following bit of HTML (stored as a string in PHP) and store it in another variable.

<object width="425" height="350"> <param name="movie" value="http://www.youtube.com/v/SRzm3wm1Qu0"></param><param name="wmode" value="transparent"></param> <embed src="http://www.youtube.com/v/SRzm3wm1Qu0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed> </object>

The regex matches the width="425", but I don't know how to use it with a PHP function to actually get the number out of the string. I looked into ereg, which tests whether the pattern is in the string or not, and ereg_replace, which replaces it in the string. Neither of those functions does what I am looking for. I want to (in terms of the above example) get the "425" out of the block of HTML.

Thanks for the read! (and sorry if this shouldn't be in the regex forum)

https://forums.phpfreaks.com/topic/84404-solved-using-regular-expressions-to-grab-data-out-of-a-string/
teng84 Posted January 4, 2008

$string ='<object width="425" height="350"> <param name="movie" value="http://www.youtube.com/v/SRzm3wm1Qu0"></param><param name="wmode" value="transparent"></param> <embed src="http://www.youtube.com/v/SRzm3wm1Qu0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed> </object>';
preg_match_all('~width="(.*?)"~s',$string, $matches);
echo $matches[1][1];

should give 425
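For anyone unsure what ends up in $matches, here is a minimal sketch of how preg_match_all() fills it for the pattern above. The shortened $html string is an assumption made for brevity; it is not the full embed code from the question.

```php
<?php
// Shortened stand-in for the embed code from the question (assumption for brevity).
$html = '<object width="425" height="350"><embed width="425" height="350"></embed></object>';

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full matches
// and $matches[1] holds the contents of the first capture group.
$count = preg_match_all('~width="(.*?)"~s', $html, $matches);

echo $count, "\n";     // number of matches found
print_r($matches[0]);  // full matches, including the attribute name and quotes
print_r($matches[1]);  // captured values only

// preg_match() stops after the first match, which is enough for a single value.
if (preg_match('~width="([0-9]+)"~', $html, $m)) {
    echo $m[1], "\n";
}
```

Note that $matches[1][1] is the second captured value (from the embed tag); the first one, from the object tag, is $matches[1][0]. Both happen to be 425 in this example.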
dsaba Posted January 4, 2008

If you want to match the substring, here's a faster alternative that avoids unnecessary backtracking in the regex engine:

~width="([0-9]+)"~

Tested: http://nancywalshee03.freehostia.com/regextester/regex_tester.php?seeSaved=cocamcrd

You have two options for matching your '425'. The first is demonstrated in the above example, where you match it in a subgroup (subgroups are the parts of the pattern in parentheses). The other option is to match only the 425 with lookaheads and lookbehinds, like so:

~(?<=width=")[0-9]+(?=")~

Tested: http://nancywalshee03.freehostia.com/regextester/regex_tester.php?seeSaved=favns8nv

The advantage of the 2nd regex is that you can replace only what you want with preg_replace(), leaving everything else unchanged. preg_replace() replaces the full pattern match, which in the 2nd option is only what you want. With the first option, the content outside the subgroup is not dynamic; it is known: 'width="' and '"'. So you could still replace only the 425 and put back the known other parts of the match.

Of course, there are instances where it is much harder to create a regex pattern that matches only the subgroups you want to replace, or the circumstances are simply different. This is why I created a function that replaces only the subgroups within a haystack. See it here: http://tinyurl.com/yvkbak

Read about lookaheads, lookbehinds, and other regex methods: http://www.regular-expressions.info/refadv.html

I had to figure this out the hard way; I wish someone had told me this, like so. So continue spreading the knowledge!
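The two replacement approaches described in this post can be sketched as follows. The new width of 640 and the shortened $html string are arbitrary values chosen for illustration; they do not come from the thread.

```php
<?php
// Shortened stand-in for the embed code (assumption for brevity).
$html = '<object width="425" height="350"></object>';

// Option 1: capture the known fixed parts and put them back with group references.
// ${1} is written with braces so the digits that follow are not read as part
// of the group number.
$viaGroups = preg_replace('~(width=")([0-9]+)(")~', '${1}640$3', $html);

// Option 2: the lookbehind and lookahead assert the context without consuming it,
// so the full match is just the digits and only they are replaced.
$viaLookaround = preg_replace('~(?<=width=")[0-9]+(?=")~', '640', $html);

echo $viaGroups, "\n";
echo $viaLookaround, "\n";
```

Both calls should leave the height attribute untouched, because neither pattern can match without the literal text width=" in front of the digits.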
wintallo (Author) Posted January 4, 2008

Thanks a lot for your replies! I have a few questions though. I gave the regex:

(width)(=")[0-9]{3,4}

You gave me a regex that looks a lot different:

~(?<=width=")[0-9]+(?=")~

I honestly have no idea how to read the latter regex. If I wanted to write a regex that works with preg_match_all (like the one you gave me) and matches both the example I gave above, width="425", and width:425px; how would I do that? In both cases I want the $matches array to contain 425.
effigy Posted January 4, 2008

"I honestly have no idea how to read the latter regex."

NODE                     EXPLANATION
----------------------------------------------------------------------
(?<=                     look behind to see if there is:
----------------------------------------------------------------------
  width="                  'width="'
----------------------------------------------------------------------
)                        end of look-behind
----------------------------------------------------------------------
[0-9]+                   any character of: '0' to '9' (1 or more
                         times (matching the most amount possible))
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
)                        end of look-ahead

"matches both... width="425" and width:425px; how would I do that?"

~(?<=width[:=])\D?(\d+)~
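A minimal sketch of the last pattern applied to both forms. The two sample strings are shortened versions of the markup discussed in the thread and are assumptions for illustration.

```php
<?php
// Shortened sample inputs (assumptions for illustration).
$samples = [
    '<object width="425" height="350">',
    '<embed style="width:400px; height:326px;">',
];

$widths = [];
foreach ($samples as $sample) {
    // (?<=width[:=]) requires "width=" or "width:" immediately before the match,
    // \D? skips one optional non-digit such as the opening quote,
    // (\d+) captures the digits into group 1.
    if (preg_match('~(?<=width[:=])\D?(\d+)~', $sample, $m)) {
        $widths[] = $m[1];
    }
}

print_r($widths);
```

Read the number from group 1 rather than from the full match: for the attribute form, the full match $m[0] also contains the opening quote consumed by \D?.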
dsaba Posted January 4, 2008

I did not see in your earlier post that you wanted to match 'width:425px;' as well.

#matching just the 425

This one is very specific:
~((?<=width=")[0-9]+(?=")|(?<=width:)[0-9]+(?=px;))~

This one is more general:
~((?<=width=")|(?<=width:))[0-9]+((?=")|(?=px;))~

#matching all of it, with 425 in the 2nd subgroup ($2):
~width(="|:)([0-9]+)("|px;)~

If you want to better understand these patterns, look up and study these symbols and what they mean:
(?<=) lookbehind
(?=) lookahead
| the OR pipe, used in parentheses (matchthis|orthat)
[0-9] character classes
+ repetition symbol
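A short sketch of the capture-group variant, reading the width from the 2nd subgroup. The separator group is written here as (="|:), which is an assumption about the intended pattern, and the sample strings are the two forms from the question.

```php
<?php
// The two forms from the question.
$samples = [
    'width="425"',
    'width:400px;',
];

$widths = [];
foreach ($samples as $sample) {
    // Group 1: the separator (=" or :)
    // Group 2: the digits
    // Group 3: the terminator (" or px;)
    if (preg_match('~width(="|:)([0-9]+)("|px;)~', $sample, $m)) {
        $widths[] = $m[2];
    }
}

print_r($widths);
```

Unlike the lookaround versions, this pattern consumes the surrounding text, so $m[0] is the whole attribute or declaration and the number has to be taken from $m[2].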
wintallo (Author) Posted January 4, 2008

Thank you so much for all your help, guys!

For future viewers, this is the code I used to do what I was looking for:

$movie_code = '<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/SRzm3wm1Qu0"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/SRzm3wm1Qu0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>';
preg_match_all('~((?<=width=")[0-9]+(?=")|(?<=width:)[0-9]+(?=px;))~', $movie_code, $matches);
echo $matches[0][0]."<br />";

$movie_code = '<embed style="width:400px; height:326px;" id="VideoPlayback" type="application/x-shockwave-flash" src="http://video.google.com/googleplayer.swf?docId=3728266100951844857&hl=en" flashvars=""> </embed>';
preg_match_all('~((?<=width=")[0-9]+(?=")|(?<=width:)[0-9]+(?=px;))~', $movie_code, $matches);
echo $matches[0][0];

// The first "echo" outputted 425 and the second outputted 400.

Yay! That's just what I needed!

Keywords for the Google Spider: use regexp regex regular expressions to grab extract get pull HTML attributes parameters preg_match preg_match_all php
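If the same lookup is needed in several places, the final pattern could be wrapped in a small function. This is only a sketch: extract_width() is a hypothetical name that does not appear in the thread, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper (not from the thread): returns the first width found
 * in an embed snippet, or null when neither form is present.
 */
function extract_width(string $code): ?int
{
    // Same alternation as in the post above: attribute form or inline-style form.
    $pattern = '~(?<=width=")[0-9]+(?=")|(?<=width:)[0-9]+(?=px;)~';

    // preg_match() returns 1 on a match; $m[0] is then just the digits,
    // because the lookarounds do not consume the surrounding text.
    if (preg_match($pattern, $code, $m) === 1) {
        return (int) $m[0];
    }

    return null;
}

var_dump(extract_width('<object width="425" height="350"></object>'));
var_dump(extract_width('<embed style="width:400px; height:326px;"></embed>'));
var_dump(extract_width('<p>no width here</p>'));
```

Returning null instead of reading $matches[0][0] directly avoids an undefined-offset notice when the snippet contains no width at all.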