dilbertone
Posted March 30, 2012

Hello dear buddies,

Is it possible to scrape the Google search results page using PHP to pull out the total number of search results found? If so, how would I go about doing this?

Well, I found the great script at http://google-rank-checker.squabbel.com/ and I'd like to know how to obtain the www.seo-proxies.com API password. Note: is this a Google API code?

/*
License: open source for private and commercial use.
This code is free to use and modify as long as this comment stays untouched on top.
URL of original source: http://google-rank-checker.squabbel.com
Author of original source: justone@squabbel.com
This tool should be completely legal, but in any case you may not sue or seek compensation from the original author for any damages or legal issues the use may cause.
By using this source code you agree NOT to increase the request rates beyond the IP management function limitations; this would only harm our common cause.
*/
error_reporting(E_ALL);

// ************************* Configuration variables *************************
// Your seo-proxies api credentials
$pwd = "2b24aff3c1266-----your-api-key---"; // Your www.seo-proxies.com API password
$uid = YOUR_USER_ID;                        // Your www.seo-proxies.com API userid

Love to hear from you.
Greetings, dilbert
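For the "total number of results" part of the question specifically: a minimal sketch of the parsing step, assuming the results page contains text like "About 1,234,000 results" (Google's markup changes often, so the regex and the hardcoded sample HTML below are illustrative assumptions, not something the rank-checker script guarantees):

```php
<?php
// Hypothetical sketch: pull the total-results figure out of a Google results
// page. $sampleHtml stands in for HTML you would fetch with cURL; the real
// markup varies by region and changes over time.
function extractResultCount(string $html): ?int {
    // Match "About 1,234,000 results" (thousands separators allowed)
    if (preg_match('/About\s+([\d,.]+)\s+results/i', $html, $m)) {
        return (int) str_replace([',', '.'], '', $m[1]);
    }
    return null; // pattern not found: markup changed or page was a captcha
}

$sampleHtml = '<div id="resultStats">About 1,234,000 results (0.42 seconds)</div>';
echo extractResultCount($sampleHtml); // prints 1234000
```

Returning null instead of 0 when the pattern is missing lets the caller tell "zero results" apart from "the page did not look like a results page at all", which matters for the abort-on-detection logic discussed below in this thread.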
QuickOldCar
Posted March 30, 2012

Personally I think it's wrong to run this type of code. RED FLAG!

"This tool should be completely legal but in any case you may not sue or seek compensation from the original Author for any damages or legal issues the use may cause."

Just look at what they write about scraping Google's results:

Hints for scraping Google and avoiding detection

- First you need a reliable proxy source so you can change your IP address. The proxies have to be highly anonymous, they should be fast, and there should have been no previous abuse against Google through them. I can personally recommend the built-in private proxy solution at www.seo-proxies.com, but you can try another proxy solution as long as it delivers quality IPs without an abusive history.
- For continued scraping activity you should use between 50 and 150 proxies, depending on the average result set of each search query. Some projects might require even more. If you wish to start with a lower number of IPs during development, you should still get at least 5 IPs, better 10 or 20. You will need them!
- Never continue scraping once Google has detected you! You need automated detection and abort routines like those in the free Google Rank Checker.
- Make sure you clear cookies after each IP change, or disable them completely; with libcurl, cookies are ignored by default.
- Do not change the number of search results from 10 to a higher number if you wish to receive accurate ranks from Google.
- Do not use threads (multiple scraping processes at the same time) if it is not really required. That just makes things more complicated.
- If you receive a virus/captcha warning, it's time to stop immediately. A captcha means your scraping has been detected! Add more proxies; if you already use 100 or more, you might have to use another source of IPs (see my recommendation for private proxies above; it is unlikely you can find a better source).
- If you do your job right you can scrape Google 24 hours a day without being detected. For reliable scraping you need to avoid any sort of black- and graylisting: do not scrape more than 20 requests per hour per IP address.
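The "no more than 20 requests per hour per IP" rule from those hints could be enforced with a small sliding-window counter. This is a sketch under my own assumptions (in-memory only, illustrative class and method names; a real tool would persist the history to a file or database between runs):

```php
<?php
// Hypothetical per-proxy rate limiter: allow at most $limit requests per
// $window seconds for each IP, matching the 20-per-hour guideline above.
class ProxyThrottle {
    /** @var array<string, int[]> ip => request timestamps */
    private array $history = [];

    public function __construct(private int $limit = 20, private int $window = 3600) {}

    public function canRequest(string $ip, ?int $now = null): bool {
        $now ??= time();
        // Drop timestamps that have fallen out of the sliding window
        $this->history[$ip] = array_values(array_filter(
            $this->history[$ip] ?? [],
            fn(int $t): bool => $t > $now - $this->window
        ));
        return count($this->history[$ip]) < $this->limit;
    }

    public function record(string $ip, ?int $now = null): void {
        $this->history[$ip][] = $now ?? time();
    }
}
```

A scraper would loop over its proxy pool, skip any IP where canRequest() returns false, and call record() after each successful fetch; if no IP is available, it sleeps rather than exceeding the limit.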