AHA7 Posted May 22, 2007

Hello,

I am struggling with a regex and I am starting to lose it. I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract all the src URLs from those tags in a given HTML document. I want to cover all the forms those tags may be written in.

Here's an example (the src URLs are what should match):

<html>
<body>
<h1>Multimedia Page</h1>
< img src="http://ex.com/img.jpg">
this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example
this is a flash object
<embed type="application/x-shockwave-flash" src=" width="425" height="350"></embed>
this is another flash object
<embed (there is a newline, a tab and a space character separating the rest of this tag from its opening <embed) type="application/x-shockwave-flash" src=" width="425" height="350"></embed>
Here is another image tag
<IMG (newline) (newline and tab) (newline) SRC="http://ex.com/img.jpg" HEIGHT="10">...
<body>
</html>

The regex in words:

MATCH THE FOLLOWING:
<img (or <IMG)
followed by any characters (including any number of spaces, tabs and newlines)
followed by src= (or SRC=), which may be followed by a single or double quotation mark
followed by anything (this is the URL part, which will be the first set of matches stored in the multi-dimensional array generated by preg_match_all())
followed by an optional single or double quotation mark
followed by optional anything (including any number of spaces, tabs and newlines) until the first > (not greedy)

OR (|) MATCH THE FOLLOWING:
the same scenario, but this time for the <embed> tag, with the URL (the "anything" after src=) as the second set of matches.

I know that the regex would be only one line long or so, but writing all the above is much simpler, at least to me!
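The description above translates almost clause by clause into a pattern written with the x (extended) modifier, which allows whitespace and # comments inside the regex. This is only a sketch under some assumptions: the names $html, $pattern and $byTag and the sample URLs are made up for illustration, an unquoted URL is assumed to end at whitespace or ">", and a quoted URL is assumed to contain no spaces. The two sets of matches asked for (img URLs and embed URLs) are built by grouping on the captured tag name rather than by two alternated sub-patterns.

```php
<?php
// Hypothetical sample input; $html, $pattern and $byTag are illustrative names.
$html = <<<HTML
< img src="http://ex.com/a.jpg">
<img style='margin-top: 10px' src='http://ex.com/b.jpg' >
<embed
  type="application/x-shockwave-flash" src="http://ex.com/movie.swf" width="425"></embed>
<IMG

SRC=http://ex.com/c.jpg HEIGHT="10">
HTML;

$pattern = '~
    <\s*(img|embed)\b   # "<img" or "<embed"; \s* also tolerates "< img"
    [^>]*?              # anything inside the tag, newlines included
    \ssrc\s*=\s*        # the src attribute (the i flag also accepts SRC)
    (["\']?)            # optional opening quote, single or double
    ([^"\'\s>]+)        # the URL: runs up to a quote, whitespace or ">"
    \2                  # the same quote again (or nothing if unquoted)
    [^>]*>              # rest of the tag up to the first ">"
~ix';

// One row per matched tag: [1] is the tag name, [3] is the URL.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Split the URLs into one list per tag, as the question asks.
$byTag = ['img' => [], 'embed' => []];
foreach ($matches as $m) {
    $byTag[strtolower($m[1])][] = $m[3];
}

print_r($byTag);
```

Requiring whitespace before src (the \s in front of it) keeps attributes such as data-src from matching by accident.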
effigy Posted May 22, 2007

Run and view source:

<pre>
<?php
// Sample document from the question, used as test input.
$string = <<<STR
<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg">
this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example
this is a flash object
<embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
this is another flash object
<embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
Here is another image tag
<IMG SRC="http://ex.com/img.jpg" HEIGHT="10">...
<body>
</html>
STR;

// <(img|embed)   tag name, captured in group 1
// [^>]*          anything up to src that stays inside the tag (newlines included)
// src=[\'"]      the attribute and its opening quote
// ([^\'"]+)      the URL, captured in group 2
// /i             case-insensitive, so <IMG SRC=...> matches too
// PREG_SET_ORDER returns one entry per tag: [0] matched text, [1] tag name, [2] URL.
preg_match_all('/<(img|embed)[^>]*src=[\'"]([^\'"]+)/i', $string, $matches, PREG_SET_ORDER);
print_r($matches);
?>
</pre>
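Since the question mentions the multi-dimensional array that preg_match_all() produces, here is a small sketch of how the same pattern fills that array under the two ordering flags. The input string and the names $html, $sets, $columns and $urls are made up for illustration; only the pattern is taken from the post above.

```php
<?php
// Illustrative input; $html, $sets, $columns and $urls are hypothetical names.
$html = '<img src="http://ex.com/img.jpg"> <EMBED type="x" src=\'http://ex.com/movie.swf\'>';
$pattern = '/<(img|embed)[^>]*src=[\'"]([^\'"]+)/i';

// PREG_SET_ORDER: one row per matched tag,
// each row being [whole match, tag name, URL].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
$urls = array_column($sets, 2); // pull the URL out of every row

// PREG_PATTERN_ORDER (the default): one list per capture group,
// so $columns[1] holds the tag names and $columns[2] the URLs.
preg_match_all($pattern, $html, $columns);

print_r($urls);
```

PREG_SET_ORDER is convenient when each URL has to stay paired with its tag name; the default order is shorter when only the list of URLs is needed.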
AHA7 (Author) Posted May 23, 2007

Thanks effigy!