stabnsprint
Posted July 10, 2009

Hi there,

I'm relatively new to PHP and was wondering if you guys could help me out. I'm trying to write some PHP code that runs a Google search for given keywords and returns all of the links on the search result page. Right now, I'm using cURL to query the site and then DOM and XPath to parse the HTML and give me the links. Here is the code:

    <?php

    class scraper_google extends scraper_base
    {
        public $dom;
        public $hrefs;

        public function init($keywords)
        {
            $this->keywords = $keywords;

            $this->target_url = 'http://www.google.com/#hl=en&q='
                .$keywords[0].'&aq=f&oq=&aqi=g10&fp=ADrf44LAAa8';
            echo $this->target_url;
            $this->search_engine = 'www.google.com';
            $this->userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
        }

        public function parse_results()
        {
            // make the cURL request to $target_url
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $this->userAgent);
            curl_setopt($ch, CURLOPT_URL, $this->target_url);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            $html = curl_exec($ch);
            if (!$html)
            {
                echo "<br />cURL error number:" . curl_errno($ch);
                echo "<br />cURL error:" . curl_error($ch);
                exit;
            }

            // parse the html into a DOMDocument
            $dom = new DOMDocument();
            @$dom->loadHTML($html);

            // grab all the links on the page
            $xpath = new DOMXPath($dom);
            $this->hrefs = $xpath->evaluate("/html//a");
        }

        public function display_results()
        {
            for ($i = 0; $i < $this->hrefs->length; $i++)
            {
                $href = $this->hrefs->item($i);
                $url = $href->getAttribute('href');
                echo "<br />Link stored: $url";
            }
        }
    }

    ?>

And this is the script that uses it:

    <?php
    require_once('__root.inc.php');

    $scraper = new scraper_google();
    $scraper->keywords[0] = "keyword";
    $scraper->init($scraper->keywords);
    $scraper->parse_results();
    $scraper->display_results();
    ?>

Feel free to try it out yourself. The problem I'm having is that it gets to the page but is only able to read the header of the result page (the Google bar up top, along with the image, video, and blog search links). I'm guessing this is because Google loads the search results via AJAX after the page loads, so my question is: is there any way to access and parse the page after the search results are displayed?

Thank you.
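
For anyone trying to reproduce this, here is a minimal sketch that separates the two halves of the problem. It does not depend on scraper_base (which isn't shown above), and the helper name extract_links and the sample HTML are made up for illustration. The first part runs the same DOMDocument/XPath step against a static HTML string, to check the parsing independently of the request. The second part uses parse_url() on the target URL from the post to show that the search terms sit after the '#', i.e. in the fragment. As far as I know, HTTP clients do not send the fragment to the server, so it may be worth checking whether the keywords reach Google at all.

```php
<?php
// Hypothetical helper (not part of the original class): returns the href
// of every <a> element that has one, given an HTML string.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings that real-world, malformed HTML tends to trigger.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Made-up sample page, standing in for a fetched response.
$sample = '<html><body><a href="http://example.com/a">A</a>'
        . '<p><a href="http://example.com/b">B</a></p></body></html>';
$links = extract_links($sample);
foreach ($links as $url) {
    echo "Link stored: $url\n";
}

// The target URL built in init(), with "keyword" filled in.
$target = 'http://www.google.com/#hl=en&q=keyword&aq=f&oq=&aqi=g10&fp=ADrf44LAAa8';
$parts = parse_url($target);

// Show which part of the URL the search terms ended up in.
echo 'query: ', $parts['query'] ?? '(none)', "\n";
echo 'fragment: ', $parts['fragment'] ?? '(none)', "\n";
```

If the parsing half works on the sample but the fetched page still only contains the header, that points at what is being requested rather than at the DOM/XPath code.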