please help - showing all images from a URL


scarhand

I have some code I've been working on.

It's supposed to grab all images larger than 80x80 from a posted URL and put them into an array.

It works for most web sites, but not all of them.

Here's the code:


  <?php
  $url = $_POST['url'];

  if (substr($url, -1) != '/')
    $ddurl = "$url/";
  else
    $ddurl = $url;
  
  $ddcontent = file_get_contents($ddurl);
  
  preg_match_all('/<img.*src="(.*?)".*?>/i', $ddcontent, $ddmatches);
  
  foreach (array_unique($ddmatches[1]) as $ddimg)
  {
    if ($ddimg[0] == '/')
      $ddimg = substr($ddimg, 1);
  
    if (!preg_match("/^http/i", $ddimg))
      $ddimg = "$ddurl$ddimg";

    $ddmysock = getimagesize($ddimg);
    $ddwidth = $ddmysock[0];
    $ddheight = $ddmysock[1];
    
    if ($ddwidth > 80 && $ddheight > 80)
    {
      if ($ddwidth > 150 || $ddheight > 150)
      {
        if ($ddwidth > $ddheight)
          $ddpercentage = (150 / $ddwidth); 
        else
          $ddpercentage = (150 / $ddheight);
          
        $ddwidth = round($ddwidth * $ddpercentage); 
        $ddheight = round($ddheight * $ddpercentage); 
      }

      $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";
      
      $ddext = substr($ddimg, -4);
      
      if ($ddext == '.gif' || $ddext == '.jpg' || $ddext == '.png')
        $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
    }
  }

?>

Can you give an example of a site it doesn't work for?  That's the simplest approach.  Then we have something concrete to work on.

 

Situations I imagine might cause problems are:

1. Multiple img tags on one line

2. img tag split over 2 lines

3. base href set for the HTML page (which changes what image paths are relative to)

 

Edit:  The most reliable way is to use an HTML parser
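
For what it's worth, here is a minimal sketch of the HTML parser route using PHP's bundled DOM extension (assuming ext-dom is enabled and PHP 7 or later). The function name extractImageSources and the sample markup are made up for illustration; they are not from the original script. It should cope with several img tags on one line, a tag split across lines, and single-quoted src values, and it also reports any base href so relative paths can be resolved against it.

```php
<?php
// Hypothetical helper (not from the thread): collect <img> src values with DOMDocument.
function extractImageSources(string $html): array
{
    if (trim($html) === '') {
        return ['base' => null, 'sources' => []];
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // A <base href> changes what relative image paths resolve against.
    $base = null;
    foreach ($doc->getElementsByTagName('base') as $node) {
        if ($node->hasAttribute('href')) {
            $base = $node->getAttribute('href');
            break;
        }
    }

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $sources[] = $src;
        }
    }

    // Drop duplicates but keep document order.
    return ['base' => $base, 'sources' => array_values(array_unique($sources))];
}

// Sample markup (invented): two tags on one line, one tag split over two lines, one duplicate.
$html = "<html><head><base href=\"http://example.com/assets/\"></head><body>"
      . "<img src=\"a.gif\"><img src='b.png'>\n"
      . "<img alt=\"split\"\n     src=\"c.jpg\">"
      . "<img src=\"a.gif\"></body></html>";

$result = extractImageSources($html);
print_r($result);
```

The returned src values are still exactly as written in the page, so they need resolving against the base href (or the page URL when there is none) before being passed to getimagesize().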

 

Example of a URL that works: http://www.wkdservers.co.uk

Example of a URL that does not work: http://www.wkdservers.co.uk/game-servers.php

I've been working on this for days.

I need to get each image's entire URL (e.g. http://www.website.com/images/image.gif) for the page entered in the text box.

I have taken into consideration that many sites do not use entire URLs when sourcing their images, but I'm really at a loss as to how to get this working properly.

OK, I updated the script. Here's what it looks like now:

 

  function rel2abs($absolute, $relative) 
  {
    $p = parse_url($relative);
    if(!empty($p["scheme"]))return $relative; // already an absolute URL
    
    extract(parse_url($absolute));
    
    $path = dirname($path); 

    if($relative[0] == '/') {
        $cparts = array_filter(explode("/", $relative));
    }
    else {
        $aparts = array_filter(explode("/", $path));
        $rparts = array_filter(explode("/", $relative));
        $cparts = array_merge($aparts, $rparts);
        foreach($cparts as $i => $part) {
            if($part == '.') {
                $cparts[$i] = null;
            }
            if($part == '..') {
                $cparts[$i - 1] = null;
                $cparts[$i] = null;
            }
        }
        $cparts = array_filter($cparts);
    }
    $path = implode("/", $cparts);
    $url = "";
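    // extract() only defines the parts present in the URL, so test with !empty()
    if(!empty($scheme)) {
        $url = "$scheme://";
    }
    if(!empty($user)) {
        $url .= "$user";
        if(!empty($pass)) {
            $url .= ":$pass";
        }
        $url .= "@";
    }
    if(!empty($host)) {
        $url .= "$host/";
    }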
    $url .= $path;
    return $url;
  }



  if (!getimagesize($url))
  {
    $ddcontent = file_get_contents($url);
    
    preg_match_all('/<img.*src="(.*?)".*?>/i', $ddcontent, $ddmatches);
    
    foreach (array_unique($ddmatches[1]) as $ddimg)
    {
      $ddimg = rel2abs($url, $ddimg);

      $ddmysock = getimagesize($ddimg);
      $ddwidth = $ddmysock[0];
      $ddheight = $ddmysock[1];
      
      if ($ddwidth > 80 && $ddheight > 80)
      {
        if ($ddwidth > 150 || $ddheight > 150)
        {
          if ($ddwidth > $ddheight)
            $ddpercentage = (150 / $ddwidth); 
          else
            $ddpercentage = (150 / $ddheight);
            
          $ddwidth = round($ddwidth * $ddpercentage); 
          $ddheight = round($ddheight * $ddpercentage); 
        }

        $ddimgtag = "<img src=\"$ddimg\" width=\"$ddwidth\" height=\"$ddheight\">";
        
        $newimages[] = array('url' => $ddimg, 'img_tag' => $ddimgtag);
      }
    }
  }
  else
  {
    $ddmysock = getimagesize($url);
    $ddwidth = $ddmysock[0];
    $ddheight = $ddmysock[1];
    
    if ($ddwidth > 80 && $ddheight > 80)
    {
      if ($ddwidth > 150 || $ddheight > 150)
      {
        if ($ddwidth > $ddheight)
          $ddpercentage = (150 / $ddwidth); 
        else
          $ddpercentage = (150 / $ddheight);
          
        $ddwidth = round($ddwidth * $ddpercentage); 
        $ddheight = round($ddheight * $ddpercentage); 
      }

      $ddimgtag = "<img src=\"$url\" width=\"$ddwidth\" height=\"$ddheight\">";
      
      $newimages[] = array('url' => $url, 'img_tag' => $ddimgtag);
    }
  }
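
The resize arithmetic is repeated in each branch of the script. One possible way to factor it out is sketched below. fitWithin is a hypothetical name, and the 150 default simply mirrors the limit used in the script. It scales by the longer side so neither dimension ends up over the limit.

```php
<?php
// Hypothetical helper: scale (width, height) to fit inside a $max x $max box,
// keeping the aspect ratio. Images already inside the box are returned unchanged.
function fitWithin(int $width, int $height, int $max = 150): array
{
    if ($width <= $max && $height <= $max) {
        return [$width, $height];
    }

    // Scale by whichever side is longer so both sides end up <= $max.
    $ratio = $max / max($width, $height);

    return [(int) round($width * $ratio), (int) round($height * $ratio)];
}

print_r(fitWithin(600, 300)); // wide image, scaled by width
print_r(fitWithin(90, 400));  // tall image, scaled by height
print_r(fitWithin(100, 120)); // already small enough, left alone
```

Each branch could then reduce to something like list($ddwidth, $ddheight) = fitWithin($ddmysock[0], $ddmysock[1]); after the 80x80 check.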

 

It is failing to get images from:

http://ca.yahoo.com/

http://in.yahoo.com/

http://yahoo.com/

http://www.cnn.com/2008/LIVING/11/19/eharmony.same.sex.matches/

http://forbes.com/
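
Two things in rel2abs look like they could go wrong, though whether they explain the sites listed above is only a guess. First, dirname() drops the last directory when the page URL ends in a slash, so relative paths on a URL like the CNN one resolve one level too high. Second, a protocol-relative src such as //host/img.png is treated as a path on the page's own host. Below is a rough sketch of a resolver that handles both, loosely following RFC 3986 section 5. resolveUrl is a hypothetical name, the image paths in the examples are invented, and userinfo (user:pass@) is deliberately ignored.

```php
<?php
// Hypothetical resolver: a simplified take on RFC 3986 reference resolution.
function resolveUrl(string $base, string $relative): string
{
    $relative = trim($relative);

    // Already absolute (http:, https:, data:, ...): return untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $relative)) {
        return $relative;
    }

    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $authority = ($b['host'] ?? '') . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative reference: borrow only the scheme from the base.
    if (strncmp($relative, '//', 2) === 0) {
        return $scheme . ':' . $relative;
    }

    $basePath = $b['path'] ?? '/';
    if ($basePath === '' || $basePath[0] !== '/') {
        $basePath = '/' . $basePath;
    }

    // Separate the path from any ?query or #fragment on the reference.
    $cut = strcspn($relative, '?#');
    $relPath = (string) substr($relative, 0, $cut);
    $suffix = (string) substr($relative, $cut);

    if ($relPath === '') {
        $path = $basePath;
    } elseif ($relPath[0] === '/') {
        $path = $relPath; // root-relative
    } else {
        // Keep the base path up to and including its last "/", then append.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }

    // Collapse "." and ".." segments.
    $segments = explode('/', $path);
    $last = end($segments);
    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($out);
            continue;
        }
        $out[] = $segment;
    }
    if ($last === '.' || $last === '..') {
        $out[] = ''; // keep the trailing slash
    }
    $path = '/' . ltrim(implode('/', $out), '/');

    return "$scheme://$authority$path$suffix";
}

// Invented image paths, resolved against URLs mentioned in the thread.
echo resolveUrl('http://www.wkdservers.co.uk/game-servers.php', 'images/logo.gif'), "\n";
echo resolveUrl('http://www.cnn.com/2008/LIVING/11/19/eharmony.same.sex.matches/', 'art/pic.jpg'), "\n";
echo resolveUrl('http://yahoo.com/', '//l.example.net/a.png'), "\n";
```

If the page sets a base href, pass that as $base instead of the page URL.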

Where does it fail?  Try printing out all the urls it matches and comparing that to the original page source.

 

If it's finding the urls but misinterpreting, you can print out what your script thinks the urls should be and see which ones are wrong.

 

You might want to print out the widths and heights too just to confirm there's not something going wrong there (for example, the website might check the referrer and that might not be set correctly).
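
A small sketch of that kind of debugging output, assuming PHP 7.4 or later. Both function names are hypothetical. describeCandidates just formats whatever a measuring callback returns, so it can be tried without touching the network. measureRemoteImage shows one way to send explicit Referer and User-Agent headers by fetching the bytes with file_get_contents() and a stream context, then sizing them with getimagesizefromstring(). The User-Agent string is an arbitrary placeholder.

```php
<?php
// Hypothetical debugging helper: one line per candidate URL with its measured size.
function describeCandidates(array $urls, callable $measure): array
{
    $lines = [];
    foreach ($urls as $url) {
        $size = $measure($url);
        $lines[] = $size === null
            ? "$url => could not read image"
            : sprintf('%s => %dx%d', $url, $size[0], $size[1]);
    }
    return $lines;
}

// Hypothetical fetcher: download the image with explicit headers, then measure it locally.
// Returns [width, height], or null when the download or the image parsing fails.
function measureRemoteImage(string $url, string $referer): ?array
{
    $context = stream_context_create(['http' => [
        'header'     => "Referer: $referer\r\n",
        'user_agent' => 'Mozilla/5.0 (compatible; image-check)', // placeholder value
        'timeout'    => 10,
    ]]);

    $bytes = @file_get_contents($url, false, $context);
    if ($bytes === false) {
        return null;
    }

    $info = @getimagesizefromstring($bytes);
    return $info === false ? null : [$info[0], $info[1]];
}

// Offline demonstration with a stub in place of the real fetcher.
$stub = fn(string $u): ?array => $u === 'http://example.com/a.gif' ? [200, 100] : null;
$report = describeCandidates(
    ['http://example.com/a.gif', 'http://example.com/missing.gif'],
    $stub
);
echo implode("\n", $report), "\n";
```

In the real script the callback would be something like fn($u) => measureRemoteImage($u, $url), with $url being the page that was posted.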
