EthanV2 Posted November 2, 2008
Hi, just wondering if anyone could help me out with this. I need a piece of code, or a function, that searches a string for images and puts the list of images into an array. So far my Googling hasn't turned up anything, so any help would be much appreciated. I'll be using the function to scan an HTML page and generate a list of the images embedded on that page (using the <img> tag), but I just can't seem to get it to work.
Link to comment https://forums.phpfreaks.com/topic/131039-regex-help-build-a-list-of-images-on-a-page/
ddrudik Posted November 2, 2008
This is a starter function. You could extend it by checking the returned array for relative URLs and prepending the directory URL of the page, so that each image src is complete.
<pre>
<?php
function getimgs($page){
  // Fetch the page as a string, following any redirects
  $ch=curl_init();
  curl_setopt($ch, CURLOPT_URL, $page);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  // Group 2 captures a quoted src (up to the closing quote or a ?),
  // group 3 captures an unquoted src (up to whitespace, ? or >)
  preg_match_all('/<img\s[^>]*\bsrc=(?:(?:([\'"])((?:(?!\1|\?).)*))|([^\s?>]*))/is',curl_exec($ch),$images);
  curl_close($ch);
  // Drop the empty entries from each group, then combine both lists
  $images=array_merge(array_filter($images[2]),array_filter($images[3]));
  return $images;
}
echo print_r(getimgs('http://www.phpfreaks.com/forums/'),true);
?>
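The extension suggested above (completing relative URLs) could look roughly like the sketch below. It is not from the thread: absolutize_src() is a made-up helper name, www.example.com is a placeholder host, and only the common cases are handled. Paths containing ../ are not normalized, and data: URIs are treated as relative paths.

```php
<?php
// Hypothetical helper (not part of the original post): turn an image src
// into an absolute URL, relative to the page it was found on.
function absolutize_src($src, $pageUrl) {
    // Already absolute (scheme://...): leave untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
        return $src;
    }
    $p = parse_url($pageUrl);
    $root = $p['scheme'] . '://' . $p['host']
          . (isset($p['port']) ? ':' . $p['port'] : '');
    // Protocol-relative (//host/path): reuse the page's scheme.
    if (substr($src, 0, 2) === '//') {
        return $p['scheme'] . ':' . $src;
    }
    // Root-relative (/path): prepend scheme and host only.
    if (substr($src, 0, 1) === '/') {
        return $root . $src;
    }
    // Document-relative: prepend the directory part of the page path.
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $src;
}

$page = 'http://www.example.com/forums/index.php';
foreach (array('/img/a.gif', 'b.png', 'http://cdn.example.com/c.jpg') as $src) {
    echo absolutize_src($src, $page), "\n";
}
```

Applied to the array returned by getimgs(), this would be something like array_map() with the page URL bound in a small wrapper function.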
nrg_alpha Posted November 2, 2008
This solution assumes you only want the directories / image file names.
$str = <<<DATA
<table id="logintable">
<tr id="usernamerow">
<th rowspan="3" valign="top">Login</th>
<td>Username:</td>
<td><input type="text" class="text150" name="vb_login_username" /></td>
<td id="submit"><input type="image" src="/mainpage_images/buttons/submit.gif" alt="Submit" /></td>
</tr>
<tr id="passwordrow">
<td>Password:</td>
<td><input type="password" class="text150" name="vb_login_password" /></td>
<td id="join"><a href="/forums/register.php"><img src="/mainpage_images/buttons/join.gif" alt="Join" title="Join" border="0" /></a></td>
</tr>
<tr id="remembermerow">
<td align="right"><input type="checkbox" name="cookieuser" value="1" tabindex="3" id="cb_cookieuser_navbar" accesskey="c" checked="checked" /></td><td align="left"><label for="cb_cookieuser_navbar">Remember Me</label></td>
<td id="help"><a href="/forums/faq.php?"><img src="/mainpage_images/buttons/help.gif" alt="Help" title="Help" width="15" height="15" border="0" /></a></td>
</tr>
</table>
DATA;
preg_match_all('#<img[^"]+"([^"]+)"#', $str, $matches);
echo '<pre>';
print_r($matches[1]);
echo '</pre>';
Output:
Array
(
    [0] => /mainpage_images/buttons/join.gif
    [1] => /mainpage_images/buttons/help.gif
)
If you want only the filenames, and nothing else, you can use this pattern instead:
preg_match_all('#<img.+?([^/]+\.[^"]+)"#', $str, $matches);
Output:
Array
(
    [0] => join.gif
    [1] => help.gif
)
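The first pattern above captures the first double-quoted value after <img, so it relies on src being the first attribute and on double quotes. Where that can't be guaranteed, one alternative (a different technique from the regex in the post) is PHP's bundled DOM extension. A minimal sketch, assuming the DOM extension is enabled; list_img_src() is a made-up name:

```php
<?php
// Sketch: collect <img> src values with DOMDocument instead of a regex.
// list_img_src() is a hypothetical helper, not from the thread.
function list_img_src($html) {
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $srcs[] = $src;
        }
    }
    return $srcs;
}

// src is not the first attribute here, and one tag uses single quotes.
$html = '<p><img alt="Join" src="/mainpage_images/buttons/join.gif">'
      . '<input type="image" src="/mainpage_images/buttons/submit.gif">'
      . "<img src='/mainpage_images/buttons/help.gif' alt='Help'></p>";
print_r(list_img_src($html));
```

As with the regex, the <input type="image"> button is skipped, because only img elements are visited.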
ddrudik Posted November 2, 2008
My code in my original post was munged by the forum server during posting (backslashes removed before alphabetic characters, and a ' removed from within the [ ] character set and from before the pattern; quite unintended changes that make me concerned about posting source code here in future).
Here's the proper code posting: http://pastebin.com/f42a371cc
Here's the pattern in action, matching against img tags without quotes, with single quotes, and with double quotes: http://www.myregextester.com/?r=020906eb
The array_filter and array_merge calls in my code merge the results for quoted and non-quoted images into one array, in case both kinds appear on the same page. You can see the results of the preg_match_all call before merging in the example link above.
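The merging step can be tried without fetching a page. The sketch below is an illustration, not the pastebin code: it assumes the pattern is the one from the first post with the (?: group syntax intact, and the sample markup is invented.

```php
<?php
// Assumed pattern: group 1 = quote character, group 2 = quoted src,
// group 3 = unquoted src. Anything from a ? onwards is left out.
$pattern = '/<img\s[^>]*\bsrc=(?:(?:([\'"])((?:(?!\1|\?).)*))|([^\s?>]*))/is';

// Invented sample: double-quoted, unquoted and single-quoted src values.
$html = '<img src="a.gif"> <img src=c.jpg> ' . "<img src='b.png?x=1'>";

preg_match_all($pattern, $html, $images);

// Before merging, each group has one slot per match, with '' wherever
// that group did not take part in the match:
//   $images[2] is array('a.gif', '', 'b.png')
//   $images[3] is array('', 'c.jpg', '')
$merged = array_merge(array_filter($images[2]), array_filter($images[3]));
print_r($merged); // a.gif, b.png, c.jpg
```

Two side effects are worth knowing: the merged list puts all quoted results before the unquoted ones rather than keeping page order, and array_filter() without a callback also drops a src that is exactly "0".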
EthanV2 Posted November 2, 2008
That's perfect, thanks. This is a huge help for me.
ddrudik Posted November 2, 2008
While testing with various sites I noticed that some, such as msn.com, had img src URLs that referenced an image followed by a # sign and then more (unwanted) data. I modified the expression to drop the # sign and anything after it, as it already does with the ?: http://pastebin.com/f1935e1d6
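The same trimming can also be done after extraction instead of inside the pattern. The sketch below is that alternative, not the pastebin code; strip_query_and_fragment() is a made-up name and the sample paths are invented.

```php
<?php
// Hypothetical helper: cut an image URL at the first "?" or "#",
// whichever comes first, and discard the rest.
function strip_query_and_fragment($src) {
    return preg_replace('/[?#].*$/s', '', $src);
}

$srcs = array('/img/logo.gif#top', '/img/a.png?v=3', '/img/b.jpg');
print_r(array_map('strip_query_and_fragment', $srcs));
// /img/logo.gif, /img/a.png, /img/b.jpg
```

Images that are served by a script and identified only by their query string would lose that information, so this suits listing image files rather than re-downloading them.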