rednax Posted April 20, 2008

Hello,

I've been using this script (from http://blogoscoped.com/archive/2007-03-19-n36.html) to scrape images from Google. It was working beautifully the other night, but now it suddenly pulls only one image.

<?php header("Content-type: text/html; charset=utf-8"); ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Get Images</title>
</head>
<body>
<?php
$results = getGoogleImages('horses');
foreach ($results as $result) {
    // Each thumbnail links to its source page; right-clicking swaps in the full-size image.
    echo '<p><a href="' . htmlentities($result['url']) . '">' .
        '<img src="' . htmlentities($result['thumbnail']) . '" alt="" ' .
        'oncontextmenu="this.src=\'' . htmlentities($result['image']) . '\';return false;" ' .
        'style="border: 1px solid black" /></a><br />' .
        '<em>' . htmlentities($result['description']) . '</em>' .
        '</p>';
}
?>
</body>
</html><?php
function getGoogleImages($q, $doSafeSearch = false) {
    $results = array();
    $safe = ($doSafeSearch) ? 'on' : 'off';
    $url = 'http://images.google.com/images?safe=' . $safe . '&q=' . urlencode($q);
    $result = file_get_contents($url);

    // The result data sits in a run of dyn.Img("...","...",...) calls
    // that ends just before dyn.updateStatus.
    $from = 'dyn.Img("';
    $startPos = strpos($result, $from);
    $endPos = strpos($result, ');dyn.updateStatus');
    // substr() takes a length, not an end position.
    $start = $startPos + strlen($from);
    $functions = substr($result, $start, $endPos - $start);

    // Split into one string per dyn.Img(...) call.
    $functions = explode('");dyn.Img("', $functions);
    foreach ($functions as $f) {
        $i = count($results);
        // Each call's arguments are quoted strings separated by ",".
        list($results[$i]['url'], $v1, $hash, $results[$i]['image'],
             $results[$i]['width'], $results[$i]['height'],
             $results[$i]['description'], $v2, $v3, $more, $extension, $domain) = explode('","', $f);
        // Keep only the part of the page URL before "&h".
        list($results[$i]['url'], $params) = explode('&h', $results[$i]['url']);
        $prefix = 'http://tbn0.google.com/images?q=tbn:';
        $results[$i]['thumbnail'] = $prefix . $hash . ':' . $results[$i]['image'];
        $results[$i]['description'] = strip_tags($results[$i]['description']);
    }
    return $results;
}
?>

Did Google Images change their code, or what? I'm not particularly good with these string functions, so I'm not sure. Thanks.
dptr1988 Posted April 20, 2008

Yes, it's very likely that Google made a minor change to its HTML, which broke your script. That happens all the time with scrapers, because they depend on the exact markup of a page that the site can change at any time. Also, programs don't just 'break' on their own; something else has to change first. That's one more reason to think it was a minor change in Google's HTML.
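One way to make a scraper like this fail loudly, instead of quietly returning a single garbage result after the markup changes, is to check that both markers are still present before slicing. This is a minimal sketch: `extractBetween` is a hypothetical helper name, and the sample `$page` string is made up for illustration; only the two marker strings come from the script above.

```php
<?php
// Hypothetical helper: return the text between $start and $end, or null
// if either marker is missing (e.g. because the page's markup changed).
function extractBetween(string $html, string $start, string $end): ?string
{
    $startPos = strpos($html, $start);
    if ($startPos === false) {
        return null; // start marker no longer present
    }
    $contentStart = $startPos + strlen($start);
    // Look for the end marker only after the start marker.
    $endPos = strpos($html, $end, $contentStart);
    if ($endPos === false) {
        return null; // end marker no longer present
    }
    // substr() takes a length, so subtract the start offset.
    return substr($html, $contentStart, $endPos - $contentStart);
}

// Made-up sample input, not real Google output.
$page = 'x dyn.Img("a","b");dyn.updateStatus(1) y';
$inner = extractBetween($page, 'dyn.Img("', ');dyn.updateStatus');
if ($inner === null) {
    echo "Markers not found; the page layout has probably changed.\n";
} else {
    echo $inner, "\n"; // a","b"
}
```

Returning `null` rather than an empty string lets the caller tell "the page changed" apart from "the page had no results".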
rednax (Author) Posted April 21, 2008

Okay, so I guess my next question is: can someone help me see what changed and why the script is broken? Their JavaScript source is a bit confusing to me when I try to match it up with the string functions.
rednax (Author) Posted April 21, 2008

Alright, I figured it out. Google added an extra element to their array, which broke the script. The fix was to change the explode() line to:

$functions = explode(');dyn.Img("', $functions);

In other words, I removed the extraneous " at the start of the delimiter (it used to be '");dyn.Img("').
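A small demonstration of why that one-character change matters. The thread doesn't say what the new element was, so this sketch ASSUMES it was an unquoted value (shown here as `[]`) appended to the end of each `dyn.Img(...)` call; the sample string is hypothetical and only mimics the shape of `$functions` in the script.

```php
<?php
// Hypothetical sample of $functions, ASSUMING each dyn.Img(...) call now
// ends with an unquoted extra element ([]) instead of a quoted string.
$functions = 'u1","x",[]);dyn.Img("u2","y",[]';

// Old delimiter expects a quote right before ");". Because the last argument
// is no longer quoted, it never matches and explode() returns the whole
// string as a single element, i.e. only one image.
$old = explode('");dyn.Img("', $functions);

// New delimiter from the fix: no leading quote, so it matches between calls.
$new = explode(');dyn.Img("', $functions);

echo count($old), "\n"; // 1
echo count($new), "\n"; // 2
```

Under this assumption the extra element ends up attached to the last field of each chunk after the `","` split, which is harmless here because the script only uses the earlier fields by position.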
This topic is now archived and is closed to further replies.