scarhand Posted November 19, 2008

I have some code I've been working on. It's supposed to grab all images from a posted URL that are larger than 80x80 and put them into an array, but it's not working properly. It works for most web sites, but not all of them. Here's the code:

    $url = $_POST['url'];

    // Make sure the base URL ends with a slash
    if (substr($url, -1) != '/')
        $ddurl = "$url/";
    else
        $ddurl = $url;

    $ddcontent = file_get_contents($ddurl);

    preg_match_all('/<img.*src="(.*?)".*?>/i', $ddcontent, $ddmatches);

    foreach (array_unique($ddmatches[1]) as $ddimg)
    {
        // Turn relative image paths into full URLs
        if ($ddimg[0] == '/')
            $ddimg = substr($ddimg, 1);

        if (!preg_match("/^http/i", $ddimg))
            $ddimg = "$ddurl$ddimg";

        $ddmysock = getimagesize($ddimg);
        $ddwidth = $ddmysock[0];
        $ddheight = $ddmysock[1];

        if ($ddwidth > 80 && $ddheight > 80)
        {
            // Scale anything larger than 150px down to fit 150x150
            if ($ddwidth > 150 || $ddheight > 150)
            {
                if ($ddwidth > $ddheight)
                    $ddpercentage = (150 / $ddwidth);
                else
                    $ddpercentage = (150 / $ddheight);

                $ddwidth = round($ddwidth * $ddpercentage);
                $ddheight = round($ddheight * $ddpercentage);
            }

            $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";
            $ddext = substr($ddimg, -4);

            if ($ddext == '.gif' || $ddext == '.jpg' || $ddext == '.png')
                $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
        }
    }
    ?>
btherl Posted November 20, 2008

Can you give an example of a site it doesn't work for? That's the simplest approach. Then we have something concrete to work on.

Situations I imagine might cause problems are:

1. Multiple img tags on one line
2. img tag split over 2 lines
3. base href set for the HTML page (modifying where image paths are relative to)

Edit: The most reliable way is to use an HTML parser.
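As a rough illustration of the HTML parser suggestion, here is a minimal sketch using PHP's bundled DOMDocument class. The function name `extract_images` and the shape of its return value are made up for this example and are not from the thread. It assumes the DOM extension is enabled, and it only collects the raw `src` values plus any `<base href>`; resolving them to full URLs is a separate step.

```php
<?php
// Hypothetical helper (not from the thread): returns the page's <base href>,
// if any, and the unique src values of its <img> elements.
function extract_images(string $html): array
{
    $result = ['base' => null, 'srcs' => []];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // A <base href> changes what relative image paths are relative to.
    $base = $doc->getElementsByTagName('base')->item(0);
    if ($base !== null && $base->getAttribute('href') !== '') {
        $result['base'] = $base->getAttribute('href');
    }

    // The parser does not care how tags are laid out across lines
    // or which quote style the attributes use.
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $result['srcs'][] = $src;
        }
    }
    $result['srcs'] = array_values(array_unique($result['srcs']));
    return $result;
}
```

Because the parser works on the whole document, all three situations in the list (several tags on one line, a tag split over lines, a base href) are handled without changing a regular expression.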
cooldude832 Posted November 20, 2008

Did you try using strip_tags() but excluding the img tag from the strip, so that only img tags are left? They're very easy to find then.
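A minimal sketch of that idea, assuming the page source is already in a string. The function name `img_srcs_via_strip_tags` is hypothetical, and the regular expression is just one possible way to read the `src` values once only img tags remain. Note that strip_tags() is not a real HTML parser, so badly broken markup may still confuse it.

```php
<?php
// Hypothetical helper (not from the thread): strip every tag except <img>,
// then read the src value of each remaining tag.
function img_srcs_via_strip_tags(string $html): array
{
    // The second argument lists the tags strip_tags() should keep.
    $onlyImgs = strip_tags($html, '<img>');

    // (?| ... ) is a branch-reset group, so the src value always lands in
    // capture group 1 whether it was double-quoted, single-quoted or unquoted.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $onlyImgs, $matches);

    // Drop empty values and duplicates, then reindex from zero.
    $srcs = array_filter($matches[1], function ($src) {
        return $src !== '';
    });
    return array_values(array_unique($srcs));
}
```

Unlike the `.*` in the original pattern, `[^>]*?` cannot run past the end of a tag, so several img tags on one line are matched separately.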
scarhand Posted November 20, 2008

Example of a URL that works: http://www.wkdservers.co.uk
Example of a URL that does not work: http://www.wkdservers.co.uk/game-servers.php

I've been working on this for days. I need to get each image's entire URL (i.e. http://www.website.com/images/image.gif) for the page entered in the text box. I have taken into consideration that many sites do not use entire URLs when sourcing their images, but I'm really at a loss as to how to get this working properly.
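One likely cause, judging from the first script: it appends a slash to the posted URL, so for the failing page the base becomes http://www.wkdservers.co.uk/game-servers.php/ and relative image paths get glued onto that. Relative paths are normally resolved against the directory of the page, not the page file itself. The sketch below shows that step only; `page_base_dir` is a hypothetical name, and it ignores `<base href>`, query strings and `..` segments.

```php
<?php
// Hypothetical helper (not from the thread): the "directory" URL that relative
// image paths are resolved against when the page sets no <base href>.
function page_base_dir(string $pageUrl): string
{
    $p = parse_url($pageUrl);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }

    // A URL without a path (http://host) is treated as the site root.
    $path = $p['path'] ?? '/';

    // Keep everything up to and including the last slash of the path.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $port = isset($p['port']) ? ':' . $p['port'] : '';
    return $p['scheme'] . '://' . $p['host'] . $port . $dir;
}
```

With this, `images/logo.gif` on the game-servers.php page would be prefixed with http://www.wkdservers.co.uk/ rather than with the page's own file name.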
scarhand Posted November 21, 2008

OK, I updated the script. Here's what it looks like now:

    function rel2abs($absolute, $relative)
    {
        // Already an absolute URL: return it unchanged
        $p = parse_url($relative);
        if (!empty($p["scheme"]))
            return $relative;

        extract(parse_url($absolute));
        $path = dirname($path);

        if ($relative[0] == '/')
        {
            $cparts = array_filter(explode("/", $relative));
        }
        else
        {
            $aparts = array_filter(explode("/", $path));
            $rparts = array_filter(explode("/", $relative));
            $cparts = array_merge($aparts, $rparts);

            // Collapse "." and ".." segments
            foreach ($cparts as $i => $part)
            {
                if ($part == '.')
                {
                    $cparts[$i] = null;
                }
                if ($part == '..')
                {
                    $cparts[$i - 1] = null;
                    $cparts[$i] = null;
                }
            }

            $cparts = array_filter($cparts);
        }

        // Rebuild the URL from its parts
        $path = implode("/", $cparts);
        $url = "";

        if (!empty($scheme))
        {
            $url = "$scheme://";
        }
        if (!empty($user))
        {
            $url .= "$user";
            if (!empty($pass))
            {
                $url .= ":$pass";
            }
            $url .= "@";
        }
        if (!empty($host))
        {
            $url .= "$host/";
        }

        $url .= $path;
        return $url;
    }

    if (!getimagesize($url))
    {
        // The URL is a page: collect the images it references
        $ddcontent = file_get_contents($url);

        preg_match_all('/<img.*src="(.*?)".*?>/i', $ddcontent, $ddmatches);

        foreach (array_unique($ddmatches[1]) as $ddimg)
        {
            $ddimg = rel2abs($url, $ddimg);

            $ddmysock = getimagesize($ddimg);
            $ddwidth = $ddmysock[0];
            $ddheight = $ddmysock[1];

            if ($ddwidth > 80 && $ddheight > 80)
            {
                if ($ddwidth > 150 || $ddheight > 150)
                {
                    if ($ddwidth > $ddheight)
                        $ddpercentage = (150 / $ddwidth);
                    else
                        $ddpercentage = (150 / $ddheight);

                    $ddwidth = round($ddwidth * $ddpercentage);
                    $ddheight = round($ddheight * $ddpercentage);
                }

                $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";
                $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
            }
        }
    }
    else
    {
        // The URL is itself an image
        $ddmysock = getimagesize($url);
        $ddwidth = $ddmysock[0];
        $ddheight = $ddmysock[1];

        if ($ddwidth > 80 && $ddheight > 80)
        {
            if ($ddwidth > 150 || $ddheight > 150)
            {
                if ($ddwidth > $ddheight)
                    $ddpercentage = (150 / $ddwidth);
                else
                    $ddpercentage = (150 / $ddheight);

                $ddwidth = round($ddwidth * $ddpercentage);
                $ddheight = round($ddheight * $ddpercentage);
            }

            $ddimgtag = "<img src=\"$url\" width=\"$ddwidth\" height=\"$ddheight\">";
            $newimages[] = array('url' => $url, 'img_tag' => $ddimgtag);
        }
    }

It is failing to get images from:

http://ca.yahoo.com/
http://in.yahoo.com/
http://yahoo.com/
http://www.cnn.com/2008/LIVING/11/19/eharmony.same.sex.matches/
http://forbes.com/
btherl Posted November 23, 2008

Where does it fail? Try printing out all the URLs it matches and comparing that to the original page source. If it's finding the URLs but misinterpreting them, you can print out what your script thinks the URLs should be and see which ones are wrong. You might want to print out the widths and heights too, just to confirm there's not something going wrong there (for example, the website might check the referrer, and that might not be set correctly).
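A small sketch of that kind of debug output. The helper name `image_debug_report` and its callable parameters are assumptions made for this example, chosen so the resolver and the size lookup can be swapped out. In the script above they would presumably be `rel2abs()` and `getimagesize()`.

```php
<?php
// Hypothetical debugging helper (not from the thread): one line per image with
// the raw src, the URL the script resolved it to, and the reported dimensions.
function image_debug_report(array $srcs, callable $resolve, callable $sizeOf): array
{
    $lines = [];
    foreach ($srcs as $src) {
        $abs = $resolve($src);
        // $sizeOf is expected to behave like getimagesize():
        // an array starting with [width, height] on success, false on failure.
        $size = $sizeOf($abs);
        $dims = is_array($size) ? "{$size[0]}x{$size[1]}" : 'FAILED';
        $lines[] = sprintf('%s -> %s (%s)', $src, $abs, $dims);
    }
    return $lines;
}

// Possible wiring against the script above (performs real HTTP requests):
//
// echo implode("\n", image_debug_report(
//     array_unique($ddmatches[1]),
//     function ($src) use ($url) { return rel2abs($url, $src); },
//     function ($abs) { return @getimagesize($abs); }
// )), "\n";
```

Lines marked FAILED point at either a wrongly resolved URL or a server that refuses the request, which narrows down where to look next.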