Scraping Images from Google


rednax

Hello,

I've been using this script to scrape images from Google. It was working beautifully the other night, but now it suddenly pulls only one image.

http://blogoscoped.com/archive/2007-03-19-n36.html

<?php
header("Content-type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Get Images</title>
</head>
<body>

<?php

$results = getGoogleImages('horses');
foreach ($results as $result) {
    echo '<p><a href="' . htmlentities($result['url']) . '">' .
            '<img src="' . htmlentities($result['thumbnail']) . '" alt="" ' .
            'oncontextmenu="this.src=\'' . htmlentities($result['image']) . '\';return false;" ' .
            'style="border: 1px solid black" /></a><br />' .
            '<em>' . htmlentities($result['description']) . '</em>' .
            '</p>';
}

?>

</body>
</html><?php

function getGoogleImages($q, $doSafeSearch = false)
{
    $results = array();

    $safe = ($doSafeSearch) ? 'on' : 'off';
    $url = 'http://images.google.com/images?safe=' . $safe .
            '&q=' . urlencode($q);
    $result = file_get_contents($url);

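    // Locate the block of dyn.Img("...") calls in the returned page.
    $from = 'dyn.Img("';
    $startPos = strpos($result, $from);
    $endPos = strpos($result, ');dyn.updateStatus');
    if ($startPos === false || $endPos === false) {
        return $results; // markers not found; the page markup has probably changed
    }
    $startPos += strlen($from);
    // substr() takes a length as its third argument, not an end position.
    $functions = substr($result, $startPos, $endPos - $startPos);
    $functions = explode('");dyn.Img("', $functions);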

    foreach ($functions as $f) {
        $i = count($results);
        list($results[$i]['url'], $v1, $hash,
                $results[$i]['image'],
                $results[$i]['width'], $results[$i]['height'],
                $results[$i]['description'],
                $v2, $v3, $more, $extension, $domain) = explode('","', $f);
        list($results[$i]['url'], $params) = explode('&h', $results[$i]['url']);

        $prefix = 'http://tbn0.google.com/images?q=tbn:';
        $results[$i]['thumbnail'] = $prefix . $hash . ':' . $results[$i]['image'];
        $results[$i]['description'] = strip_tags($results[$i]['description']);
    }

    return $results;
}

?>

Did Google Images change their code or what? I'm not particularly good with these string functions, so I'm not sure.

Thanks.


Yes, it's very likely that Google made a minor change to its HTML, which broke your script. That happens all the time with scraping programs.

Programs don't just break on their own; something they depend on has to change. If you haven't touched the script, a small change in Google's markup is the most likely cause.
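
One way to make this kind of breakage obvious is to check that the markers are still present before slicing the page. The sketch below is only an illustration: `extractBetween()` is a hypothetical helper, not part of the original script, the `$page` string is made-up sample input, and the closing marker includes a leading `"` (a small departure from the script) so the last entry has no stray quote. Note that `explode()` returns a one-element array when its delimiter is not found, which would be consistent with suddenly getting a single result.

```php
<?php
// Hypothetical helper (not in the original script): returns the text
// between two markers, or null when either marker is missing.
function extractBetween($haystack, $from, $to)
{
    $start = strpos($haystack, $from);
    if ($start === false) {
        return null; // opening marker gone: the page layout probably changed
    }
    $start += strlen($from);

    $end = strpos($haystack, $to, $start);
    if ($end === false) {
        return null; // closing marker gone
    }

    // substr() takes a length as its third argument, not an end offset.
    return substr($haystack, $start, $end - $start);
}

// Made-up sample input shaped like the markers the script searches for.
$page = 'x;dyn.Img("a","b");dyn.Img("c","d");dyn.updateStatus();';

$block = extractBetween($page, 'dyn.Img("', '");dyn.updateStatus');
if ($block === null) {
    echo "Markers not found; the page markup has probably changed.\n";
} else {
    // Each array element holds the arguments of one dyn.Img(...) call.
    print_r(explode('");dyn.Img("', $block));
}
```

If the message about missing markers appears, view the source of the results page and look for whatever replaced the `dyn.Img(` calls, then update the marker strings to match.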

 
