Please help - showing all images from a URL


scarhand

I have some code I've been working on.

It's supposed to grab all images larger than 80x80 from a posted URL and put them into an array.

It works for most web sites, but not all of them.

Here's the code:


<?php
  $url = $_POST['url'];

  if (substr($url, -1) != '/')
    $ddurl = "$url/";
  else
    $ddurl = $url;
  
  $ddcontent = file_get_contents($ddurl);
  
  preg_match_all('/<img.*src="(.*?)".*?>/i', $ddcontent, $ddmatches);
  
  foreach (array_unique($ddmatches[1]) as $ddimg)
  {
    if ($ddimg[0] == '/')
      $ddimg = substr($ddimg, 1);
  
    if (!preg_match("/^http/i", $ddimg))
      $ddimg = "$ddurl$ddimg";

    $ddmysock = getimagesize($ddimg);
    $ddwidth = $ddmysock[0];
    $ddheight = $ddmysock[1];
    
    if ($ddwidth > 80 && $ddheight > 80)
    {
      if ($ddwidth > 150 || $ddheight > 150)
      {
        if ($ddwidth > $ddheight)
          $ddpercentage = (150 / $ddwidth); 
        else
          $ddpercentage = (150 / $ddheight);
          
        $ddwidth = round($ddwidth * $ddpercentage); 
        $ddheight = round($ddheight * $ddpercentage); 
      }

      $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";
      
      $ddext = substr($ddimg, -4);
      
      if ($ddext == '.gif' || $ddext == '.jpg' || $ddext == '.png')
        $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
    }
  }

?>

Can you give an example of a site it doesn't work for?  That's the simplest approach.  Then we have something concrete to work on.

 

Situations I imagine might cause problems are

 

1. Multiple img tags on one line

2. img tag split over 2 lines

3. base href set for html page (modifying where image paths are relative to)
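
The first two situations can be reproduced without touching the network. Below is a minimal sketch, assuming a made-up HTML snippet (`$html`) and a stricter pattern chosen for illustration; only the first pattern comes from the first script in this thread.

```php
<?php
// Hypothetical sample markup: two tags on one line, then a tag split over two lines.
$html = '<p><img src="a.gif"><img src="b.gif"></p>' . "\n"
      . '<img alt="logo"' . "\n" . '     src="c.gif">';

// Pattern from the first script: the greedy ".*" after "<img" runs to the LAST
// src on the line, and "." does not match a newline.
preg_match_all('/<img.*src="(.*?)".*?>/i', $html, $greedy);

// Stricter pattern (an assumption, not the poster's): [^>]* cannot run past the
// end of the tag, and it does match newlines inside the tag.
preg_match_all('/<img[^>]*?\ssrc="([^"]*)"[^>]*>/i', $html, $strict);

print_r($greedy[1]); // only b.gif is captured
print_r($strict[1]); // a.gif, b.gif and c.gif are captured
```

With the first pattern, `a.gif` is swallowed by the greedy match and `c.gif` is never seen because its tag spans two lines.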

 

Edit: The most reliable way is to use an HTML parser rather than a regular expression.
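
As a rough illustration of the parser approach, here is a sketch using PHP's bundled DOM extension (`DOMDocument`). The function name `dd_extract_images` and the sample page are made up for this example, and it assumes the DOM extension is enabled. It also picks up a `<base href>` if one is present, which covers the third situation in the list.

```php
<?php
// Return the raw src of every <img>, plus the <base href> if the page sets one.
// dd_extract_images() is a hypothetical helper, not part of the original script.
function dd_extract_images($html)
{
    $result = array('base' => null, 'srcs' => array());
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser errors instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // A <base href> changes what relative image paths are relative to.
    foreach ($doc->getElementsByTagName('base') as $tag) {
        if ($tag->hasAttribute('href')) {
            $result['base'] = $tag->getAttribute('href');
            break;
        }
    }

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $result['srcs'][] = $src;
        }
    }
    $result['srcs'] = array_values(array_unique($result['srcs']));

    return $result;
}

// Hypothetical page used only for illustration.
$page = "<html><head><base href='http://example.com/static/'></head><body>"
      . "<img src='a.gif'><IMG\n alt=x SRC=\"img/b.png\"><img src='a.gif'></body></html>";
$found = dd_extract_images($page);
print_r($found);
```

The parser copes with single quotes, upper-case tags, and tags split across lines, none of which the regular expression handles. Relative paths still have to be resolved afterwards, against `base` when it is set and against the page URL otherwise.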

 

example of URL that works: http://www.wkdservers.co.uk

example of URL that does not work: http://www.wkdservers.co.uk/game-servers.php
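
Judging only from the first script, one likely reason the second URL fails is the step that appends a slash: it treats the posted URL as a directory, which holds for a bare host but not for a page such as game-servers.php. The sketch below replays that string handling for a single src value. The helper name `dd_original_join` and the path `images/logo.gif` are made up for illustration.

```php
<?php
// Replays the URL-building steps of the first script for one src value.
// dd_original_join() is a hypothetical wrapper around the original statements.
function dd_original_join($url, $ddimg)
{
    // Append a trailing slash when the posted URL lacks one.
    $ddurl = (substr($url, -1) != '/') ? "$url/" : $url;

    // Drop a leading slash from the image path.
    if ($ddimg[0] == '/')
        $ddimg = substr($ddimg, 1);

    // Prefix anything that does not already start with "http".
    if (!preg_match("/^http/i", $ddimg))
        $ddimg = "$ddurl$ddimg";

    return $ddimg;
}

echo dd_original_join('http://www.wkdservers.co.uk', 'images/logo.gif'), "\n";
// http://www.wkdservers.co.uk/images/logo.gif

echo dd_original_join('http://www.wkdservers.co.uk/game-servers.php', 'images/logo.gif'), "\n";
// http://www.wkdservers.co.uk/game-servers.php/images/logo.gif
```

The second result points inside game-servers.php as if it were a folder, so getimagesize() has nothing to load there.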

 

I've been working on this for days.

I need to get each image's full URL (i.e. http://www.website.com/images/image.gif) for the page entered in the text box.

I have taken into account that many sites do not use full URLs when sourcing their images, but I'm really at a loss as to how to get this working properly.

OK, I updated the script. Here's what it looks like now:

 

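  // Resolve $relative (an img src) against the page URL $absolute.
  function rel2abs($absolute, $relative) 
  {
    // Already absolute (has a scheme): return it unchanged.
    if (parse_url($relative, PHP_URL_SCHEME)) return $relative;

    // Without a usable base URL there is nothing to resolve against.
    $base = parse_url($absolute);
    if (!is_array($base) || !isset($base['scheme'], $base['host'])) return $relative;

    // Protocol-relative src ("//host/img.gif"): only the scheme is missing.
    if (substr($relative, 0, 2) == '//') return $base['scheme'] . ':' . $relative;

    // Directory of the page: everything up to and including the last slash.
    // (dirname() dropped the last folder when the page URL ended in "/".)
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    // Root-relative paths replace the whole path, others are appended to $dir.
    $full = (substr($relative, 0, 1) == '/') ? $relative : $dir . $relative;

    // Collapse empty, "." and ".." segments.
    $cparts = array();
    foreach (explode('/', $full) as $part) {
        if ($part === '' || $part === '.') continue;
        if ($part === '..') { array_pop($cparts); continue; }
        $cparts[] = $part;
    }

    // Reassemble scheme, optional credentials, host, optional port and path.
    $url = $base['scheme'] . '://';
    if (isset($base['user'])) {
        $url .= $base['user'];
        if (isset($base['pass'])) $url .= ':' . $base['pass'];
        $url .= '@';
    }
    $url .= $base['host'];
    if (isset($base['port'])) $url .= ':' . $base['port'];
    return $url . '/' . implode('/', $cparts);
  }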



  $newimages = array();

  if (@getimagesize($url))
  {
    // The posted URL is itself an image.
    $ddcandidates = array($url);
  }
  else
  {
    // The posted URL is a page: collect the src of every <img> tag.
    $ddcontent = file_get_contents($url);

    // [^>]* keeps each match inside one tag, so several tags on one line or a
    // tag split over two lines are matched separately; either quote style works.
    preg_match_all('/<img[^>]*?\ssrc=(["\'])(.*?)\1[^>]*>/is', $ddcontent, $ddmatches);

    $ddcandidates = array();
    foreach (array_unique($ddmatches[2]) as $ddimg)
      $ddcandidates[] = rel2abs($url, $ddimg);
  }

  foreach ($ddcandidates as $ddimg)
  {
    // getimagesize() returns false when the image cannot be fetched or read.
    $ddmysock = @getimagesize($ddimg);
    if (!$ddmysock)
      continue;

    $ddwidth = $ddmysock[0];
    $ddheight = $ddmysock[1];

    if ($ddwidth > 80 && $ddheight > 80)
    {
      if ($ddwidth > 150 || $ddheight > 150)
      {
        // Scale by the longer side (the earlier version compared against the
        // misspelled $hddeight, so this branch never chose the width).
        if ($ddwidth > $ddheight)
          $ddpercentage = (150 / $ddwidth);
        else
          $ddpercentage = (150 / $ddheight);

        $ddwidth = round($ddwidth * $ddpercentage);
        $ddheight = round($ddheight * $ddpercentage);
      }

      $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";

      $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
    }
  }

 

It is failing to get images from:

http://ca.yahoo.com/

http://in.yahoo.com/

http://yahoo.com/

http://www.cnn.com/2008/LIVING/11/19/eharmony.same.sex.matches/

http://forbes.com/

Where does it fail?  Try printing out all the urls it matches and comparing that to the original page source.

 

If it's finding the urls but misinterpreting, you can print out what your script thinks the urls should be and see which ones are wrong.
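
One way to do that comparison is to print each raw src next to the URL the script builds from it. This is only a sketch: `dd_debug_lines` is a made-up helper, the resolver is passed in as a callable (in the real script that would be `'rel2abs'`), and the naive resolver and sample markup below are stand-ins so the example runs on its own.

```php
<?php
// Build one report line per matched src: the raw value and the resolved URL.
// $pattern is the regular expression under test; $resolve is any callable
// taking (page URL, raw src). Both names are hypothetical.
function dd_debug_lines($pageUrl, $html, $pattern, $resolve)
{
    preg_match_all($pattern, $html, $m);
    $lines = array();
    foreach (array_unique($m[1]) as $raw) {
        $lines[] = sprintf('%-20s => %s', $raw, call_user_func($resolve, $pageUrl, $raw));
    }
    return $lines;
}

// Stand-in resolver that simply glues the src onto the page URL.
$naive = function ($page, $src) {
    return rtrim($page, '/') . '/' . ltrim($src, '/');
};

// Hypothetical markup; in practice this would be the fetched page source.
$html = '<img src="/i/logo.gif">' . "\n" . '<img src="pics/a.jpg">';
$pattern = '/<img.*src="(.*?)".*?>/i'; // pattern from the first script

$lines = dd_debug_lines('http://example.com/news/', $html, $pattern, $naive);
echo implode("\n", $lines), "\n";
```

Here the first line shows `/i/logo.gif` resolved to `http://example.com/news/i/logo.gif`, while a browser would request `http://example.com/i/logo.gif`. That kind of mismatch is what to look for in the output.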

 

You might want to print out the widths and heights too just to confirm there's not something going wrong there (for example, the website might check the referrer and that might not be set correctly).
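
If the referrer does turn out to matter, a sketch along these lines could be used to test it. getimagesize() takes no stream context argument, so this version downloads the bytes with file_get_contents() and an explicit context, then measures them with getimagesizefromstring(). The helper names, the User-Agent string, and the timeout are assumptions for illustration, and whether any of the listed sites actually check the referrer is not established in this thread.

```php
<?php
// Build a stream context that sends a Referer and a User-Agent with the request.
// dd_image_context() and dd_image_size() are hypothetical helpers.
function dd_image_context($referer)
{
    return stream_context_create(array(
        'http' => array(
            'header'     => "Referer: $referer\r\n",
            'user_agent' => 'Mozilla/5.0 (compatible; image-check)',
            'timeout'    => 10,
        ),
    ));
}

// Fetch the image bytes with that context and report the dimensions.
// Returns array(width, height), or false when the fetch or the decode fails.
function dd_image_size($imgUrl, $referer)
{
    $bytes = @file_get_contents($imgUrl, false, dd_image_context($referer));
    if ($bytes === false) {
        return false;
    }

    $info = @getimagesizefromstring($bytes);
    return $info ? array($info[0], $info[1]) : false;
}

// Example call (needs network access, so it is left commented out):
// var_dump(dd_image_size('http://www.example.com/images/image.gif', 'http://www.example.com/'));
```

Printing the result for each image, with and without the Referer header, should show whether the sizes are coming back as expected.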

Archived

This topic is now archived and is closed to further replies.
