AHA7 Posted May 22, 2007

Hello,

I am struggling with a regex and I am starting to lose it. I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract all the src URLs from those tags in a given HTML document. I want to cover all the forms in which those tags may be formatted.

Here's an example (the src URLs are what should match):

<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg">
this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example
this is a Flash object
<embed type="application/x-shockwave-flash" src=" width="425" height="350"></embed>
this is another Flash object
<embed (there is a newline, a tab and a space character separating the rest of this tag from its opening <embed) type="application/x-shockwave-flash" src=" width="425" height="350"></embed>
Here is another image tag
<IMG (newline) (newline and tab) (newline) SRC="http://ex.com/img.jpg" HEIGHT="10">...
<body>
</html>

The regex in words:

MATCH THE FOLLOWING:
<img (or <IMG)
followed by any characters (including any number of spaces, tabs and newlines)
followed by src= (or SRC=)
which may be followed by a single or double quotation mark
followed by anything (this is the URL part, which will be the first set of matches stored in the multi-dimensional array generated by preg_match_all())
followed by an optional single or double quotation mark
followed by optional anything (including any number of spaces, tabs and newlines) until the first > (not greedy)

OR (|)

MATCH THE FOLLOWING:
the same scenario, but this time for the <embed> tag, with the URL after src= as the second set of matches.

I know that the regex would be only one line long or so, but writing all the above is much simpler, at least to me!

Link to comment: https://forums.phpfreaks.com/topic/52451-solved-help-with-regex/
effigy Posted May 22, 2007

Run and view source:

<pre>
<?php
$string = <<<STR
<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg">
this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example
this is a falsh object
<embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
this is another flash object
<embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
Here is another image tag
<IMG SRC="http://ex.com/img.jpg" HEIGHT="10">...
<body>
</html>
STR;

// Group 1 captures the tag name (img or embed), group 2 the quoted src value.
// [^>]* also matches newlines and tabs, so attributes may span several lines.
// With PREG_SET_ORDER, $matches[n][1] is the tag and $matches[n][2] the URL of the n-th match.
preg_match_all('/<(img|embed)[^>]*src=[\'"]([^\'"]+)/i', $string, $matches, PREG_SET_ORDER);
print_r($matches);
?>
</pre>
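The pattern in this reply expects the src value to be quoted and to follow src= directly. The question also asked for optional quotes, so here is a rough sketch of a more tolerant variant. It is only an illustration under a few assumptions: the helper name extract_src_urls() is made up for this example, whitespace is allowed around the = sign, and an unquoted or quoted URL is assumed to contain no spaces or quote characters. Requiring whitespace before src is meant to skip attributes such as data-src. It is not a full HTML parser and will not cope with every malformed tag.

```php
<?php
// Hypothetical helper, not part of the original thread.
// Returns a list of [tag, url] pairs for <img> and <embed> tags.
function extract_src_urls(string $html): array
{
    // <(img|embed)\b   tag name, case-insensitive because of the /i flag
    // [^>]*?           any other attributes, including newlines and tabs
    // \ssrc\s*=\s*     the src attribute, preceded by whitespace
    // ["\']?           an optional opening quote
    // ([^"\'\s>]+)     the URL: everything up to a quote, whitespace or >
    $pattern = '/<(img|embed)\b[^>]*?\ssrc\s*=\s*["\']?([^"\'\s>]+)/i';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = [strtolower($m[1]), $m[2]];
    }
    return $result;
}

// Small sample with quoted, unquoted and multi-line forms.
$html = "<img src=\"http://ex.com/a.jpg\">\n"
      . "<IMG\n\t SRC = http://ex.com/b.jpg HEIGHT=10>\n"
      . "<embed type='x' src='http://ex.com/c.swf' width=\"425\"></embed>";

print_r(extract_src_urls($html));
```

Each entry of the returned array holds the lower-cased tag name first and the URL second, so img and embed results can be separated afterwards if two sets are wanted.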
AHA7 Posted May 23, 2007

Thanks effigy!