natasha_thomas Posted May 15, 2010
Friends, does anyone know of a script that will scrape the Google Related Search results? Thanks all!
Link to comment https://forums.phpfreaks.com/topic/201849-google-related-search-script/
xcoderx Posted May 15, 2010
Something like this?

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>My Google AJAX Search API Application</title>
    <script src="http://www.google.com/jsapi?key=ABQIAAAAclF0CaSAO_QquNTAYGngFBTtvdACkO1BIkzUzGmCFOiqlW10pRQQGlGnuPiXbCvjoRrG9Yth5Gv2-w" type="text/javascript"></script>
    <script language="Javascript" type="text/javascript">
    //<![CDATA[
    google.load("search", "1");

    function OnLoad() {
      // Create a search control
      var searchControl = new google.search.SearchControl();

      // Add in a full set of searchers
      var localSearch = new google.search.LocalSearch();
      searchControl.addSearcher(localSearch);
      searchControl.addSearcher(new google.search.WebSearch());
      searchControl.addSearcher(new google.search.VideoSearch());
      searchControl.addSearcher(new google.search.BlogSearch());

      // Set the Local Search center point
      localSearch.setCenterPoint("Texas, US");

      // Tell the searcher to draw itself and tell it where to attach
      searchControl.draw(document.getElementById("searchcontrol"));

      // Execute an initial search
      searchControl.execute("Google");
    }
    google.setOnLoadCallback(OnLoad);
    //]]>
    </script>
    </head>
    <body>
    <div id="searchcontrol">Loading...</div>
    </body>
    </html>
natasha_thomas Posted May 15, 2010 Author
I am looking for something else. Have a look at this: http://www.google.com/#hl=en&q=paintball&aq=f&aqi=g10&aql=&oq=&gs_rfai=&fp=c78e48b898b2787e

At the bottom of the page you will see:

Searches related to paintball
    paintball store
    paintball guns
    paintball gear
    paintball fields
    paintball discounters
    paintball game
    paintball online game
    spyder paintball

I want a script that gets me those keywords. Can you help with this? Or does any such script exist?

Natty
Daniel0 Posted May 15, 2010
This should do it (until Google changes their HTML output):

    <?php
    function getRelatedTerms($term)
    {
        $url = sprintf('http://www.google.com/search?q=%s', urlencode($term));
        $userAgent = 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3';

        // Fetch the result page as a string, identifying as a regular browser.
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_USERAGENT => $userAgent,
        ));
        $googleRes = curl_exec($ch);
        curl_close($ch);

        // Capture the q parameter of every link that carries oi=revisions_inline.
        preg_match_all('#href\="([^"]+q=([^&]+)[^"]+oi=revisions_inline[^"]+)"#miu', $googleRes, $matches);

        return array_map('urldecode', $matches[2]);
    }

    print_r(getRelatedTerms('paintball'));
natasha_thomas Posted May 15, 2010 Author
Beautiful!!! Output is:

    Array
    (
        [0] => wedding rings sets
        [1] => engagement rings
        [2] => design your own wedding rings
        [3] => jewelry stores
        [4] => wedding bands
        [5] => tacori
        [6] => mens wedding rings
        [7] => wedding dresses
    )

Is there any way I can use this array in a foreach statement? Something like:

    foreach ($array as $value) { echo $value; }

I want to make each search term a hyperlink. What needs to change in the code to make it run like this?
Daniel0 Posted May 15, 2010
Well, print_r just prints out an array so you can read it. You can do it like this:

    $related = getRelatedTerms('paintball');
    foreach ($related as $term) {
        // do something
    }
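A minimal sketch of what the "do something" step could look like for the hyperlink request. The helper name renderRelatedLinks and the choice to link each term back to a Google search URL are assumptions for illustration, not something posted in the thread; the sample array stands in for the result of getRelatedTerms().

```php
<?php
// Hypothetical helper (not from the thread): turns related terms into HTML links.
function renderRelatedLinks(array $terms)
{
    $html = '';
    foreach ($terms as $term) {
        // urlencode() makes the term safe inside the query string;
        // htmlspecialchars() makes both values safe inside the HTML output.
        $href = 'http://www.google.com/search?q=' . urlencode($term);
        $html .= sprintf(
            '<a href="%s">%s</a><br>' . "\n",
            htmlspecialchars($href, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($term, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Sample data standing in for getRelatedTerms('paintball').
echo renderRelatedLinks(array('paintball store', 'paintball guns'));
```

Escaping matters here because the terms come from a remote page and should not be trusted as HTML.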
natasha_thomas Posted May 15, 2010 Author
Sugar & Sweet!!!

Now I am worried about how frequently Google changes its HTML output. If I implement this code on my production sites and Google changes the output format, I will see a lot of error messages, which means a lot of maintenance.

Is there any way to keep the code from showing error messages, even when the HTML output changes?

Thanks for everything, Daniel. You guys are real smart coders.
Daniel0 Posted May 15, 2010
Well, if Google changes its output then you just won't get any matches. You shouldn't get any error messages. Still, I tried making it as general as possible. Essentially, all it does is look for links that have oi=revisions_inline in them and then extract the q (the search query) part. If they for some reason stop doing that, then it won't work.
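To see that behaviour without hitting Google, the same pattern can be run against a string. This is only a sketch: the function name extractRelatedTerms and the sample markup are made up for illustration, and real Google output will look different.

```php
<?php
// Same pattern as in getRelatedTerms(), applied to a string instead of a live page.
// The function name is hypothetical.
function extractRelatedTerms($html)
{
    preg_match_all('#href\="([^"]+q=([^&]+)[^"]+oi=revisions_inline[^"]+)"#miu', $html, $matches);
    return array_map('urldecode', $matches[2]);
}

// Made-up markup for illustration only.
$sample = '<a href="/search?hl=en&q=paintball+guns&revid=1&oi=revisions_inline&ct=broad">paintball guns</a>'
        . '<a href="/search?hl=en&q=paintball+store&revid=1&oi=revisions_inline&ct=broad">paintball store</a>'
        . '<a href="/search?hl=en&q=unrelated&oi=other&ct=x">unrelated</a>';

// Only the two links carrying oi=revisions_inline contribute a term.
print_r(extractRelatedTerms($sample));

// Markup without such links yields an empty array, not an error.
print_r(extractRelatedTerms('<p>layout changed</p>'));
```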
ignace Posted May 15, 2010
You can also use XPath:

    <?php
    ini_set('display_errors', 0);

    $url = 'http://www.google.com/search?q=paintball';

    $dom = new DOMDocument();
    if ($dom->loadHTMLFile($url)) {
        $xpath = new DOMXPath($dom);
        $entries = $xpath->query('//div[@class="brs_col"]/p/a');
        foreach ($entries as $entry) {
            echo $entry->nodeValue, "<br>\n";
        }
    }

But like Daniel already said: "Well, if Google changes its output then you just won't get any matches."
Daniel0 Posted May 15, 2010
The problem with using DOM is that Google doesn't provide valid HTML, so DOMDocument will complain when it parses it. Also, your script doesn't output anything.
ignace Posted May 15, 2010
"Is there any way to keep the code from showing error messages, even when the HTML output changes?"

Your production server should be configured like this already. Just make sure that your input is what you expect, like:

    preg_match_all('#href\="([^"]+q=([^&]+)[^"]+oi=revisions_inline[^"]+)"#miu', $googleRes, $matches);

    return !empty($matches) && 3 <= sizeof($matches)
        ? array_map('urldecode', $matches[2])
        : array();
ignace Posted May 15, 2010
"so DOMDocument will complain when it parses it."

I tried $dom->strictErrorChecking = false and $dom->validateOnParse = false but it kept giving warnings.

"Also, your script doesn't output anything."

Weird, I get:

    paintball store
    paintball guns
    paintball gear
    paintball fields
    paintball discounters
    paintball game
    paintball online game
    spyder paintball
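One option nobody in the thread tried is libxml_use_internal_errors(), which tells libxml to collect parse errors internally instead of raising PHP warnings. The sketch below is an assumption about how that could be applied here: the helper name extractLinkTexts and the sample markup are hypothetical, and the brs_col class is simply the one from the XPath query above.

```php
<?php
// Hypothetical helper: runs an XPath query over possibly invalid HTML
// without letting libxml parse errors surface as PHP warnings.
function extractLinkTexts($html, $query)
{
    // Collect libxml errors internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return array();
    }

    $xpath = new DOMXPath($dom);
    $texts = array();
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Made-up markup with an unknown tag that would normally trigger a warning.
$sample = '<div class="brs_col"><p><a href="#">paintball store</a></p>'
        . '<p><a href="#">paintball guns</a></p></div><bogus>broken</bogus>';

print_r(extractLinkTexts($sample, '//div[@class="brs_col"]/p/a'));
```

As with the regex version, a layout change would simply produce an empty array.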
Daniel0 Posted May 15, 2010
The additional checks are redundant because $matches[2] will always exist and be an array even if it doesn't find anything. See:

    $googleRes = 'foo';
    preg_match_all('#href\="([^"]+q=([^&]+)[^"]+oi=revisions_inline[^"]+)"#miu', $googleRes, $matches);
    var_dump($matches);

Output:

    array(3) {
      [0]=>
      array(0) {
      }
      [1]=>
      array(0) {
      }
      [2]=>
      array(0) {
      }
    }
ignace Posted May 15, 2010
@Natasha, note that your server may not support cURL.
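A quick way to check this is to ask PHP whether the extension is loaded. The sketch below is illustrative only: pickHttpTransport is a hypothetical name, and treating file_get_contents() with allow_url_fopen as a fallback is an assumption, not something suggested in the thread.

```php
<?php
// Hypothetical helper: reports which HTTP transport is available on this server.
function pickHttpTransport()
{
    // cURL is an optional extension and may be missing on some hosts.
    if (extension_loaded('curl')) {
        return 'curl';
    }
    // file_get_contents() can fetch URLs only when allow_url_fopen is enabled.
    if (ini_get('allow_url_fopen')) {
        return 'stream';
    }
    return null;
}

var_dump(pickHttpTransport());
```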
natasha_thomas Posted May 15, 2010 Author
Ignace, I have cURL on my server.

Another thing: as I understand it, adding the check you posted will keep any error message from showing when Google's HTML output changes. Right?
Daniel0 Posted May 15, 2010
That's not necessary. See my previous post.
natasha_thomas Posted May 15, 2010 Author
Aha, I get it now. Many thanks, Daniel and Ignace!!!