j5uh Posted June 10, 2008

I found this AWESOME yellowpages scraper online for free instead of paying someone to scrape the pages... http://www.scrapingpages.com/

I've tested the code here:

<?php
ini_set('memory_limit', '99999M');

// Build one URL per results page, for pages 1 through $lastnum - 1.
function createUrl($url, $lastnum) {
    $find = "?";
    $trim = rtrim($url, 'a..z,A..Z,=,_,&');   // not used below
    $remove_to = strpbrk($trim, '?');         // not used below
    $myArray = array();
    $number = 1;
    $counter = 0;
    while ($lastnum != $number) {
        // Turn "?" into "?page=N&" so the page number leads the query string.
        $over = "?page=" . $number . "&";
        $replace = str_replace($find, $over, $url);
        $myArray[$counter] = $replace;
        $number++;
        $counter++;
    }
    return $myArray;
}

$url = "http://www.yellowpages.com/TX/Internet-Marketing-Advertising?search_mode=all&search_terms=seo";
$lastnum = 1 + 1;
$url = createUrl($url, $lastnum);

// Download every page and return the raw HTML of each one.
function createList($url) {
    $myArray = array();
    $counter = 0;
    foreach ($url as $value) {
        $html = file_get_contents($value);
        $myArray[$counter] = $html;
        $counter++;
    }
    return $myArray;
}

$list = createList($url);

foreach ($list as $value) {
    echo "<span style='width:8px; background:blue'> </span>";
    // Each listing sits in a <div class="description"> block.
    preg_match_all("/<div class=\"description\">([^`]*?)<\/div>/", $value, $matches);
    foreach ($matches[0] as $match) {
        preg_match("/<h2>([^`]*?)<\/h2>/", $match, $temp);
        preg_match("/<p>([^`]*?)<\/p>/", $match, $desc);
        preg_match("/<ul>([^`]*?)<\/ul>/", $match, $num);
        $title = strip_tags(trim($temp[1]));
        $description = strip_tags(trim($desc[1]));
        $phone = strip_tags(trim($num[1]));
        print "<b>$title</b> <br>$description<br> $phone<br> <br>";
    }
}
?>

Works great, but how do I get it to search more than 50 pages? I want to scrape all the Houston businesses, but it times out at 50 or so pages. Is there a way to modify this code to search maybe 50 pages at a time or something? Like scrape pages 1-50, then 51-100, etc.
jonsjava Posted June 10, 2008

There is nothing anywhere in the code that stops it at 50 pages. I'm betting the 50-page limit is on the yellowpages side.
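For what it's worth, the loop in createUrl() always starts at page 1, so working in ranges (1-50, then 51-100) needs a start page as well as an end page. Here is a minimal sketch of that change, assuming the target site reads a `page` query parameter the way the posted script expects. `buildPageUrls` and the example.com address are made-up names for illustration and are not part of the posted script.

```php
<?php
// Hypothetical helper (not in the posted script): build the URLs for
// pages $firstPage through $lastPage, inclusive.
function buildPageUrls($url, $firstPage, $lastPage) {
    $urls = array();
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        // Same trick as createUrl(): turn "?" into "?page=N&".
        $urls[] = str_replace("?", "?page=" . $page . "&", $url);
    }
    return $urls;
}

// Placeholder URL, assumed for the example only.
$base = "http://www.example.com/listings?search_terms=seo";
$batch = buildPageUrls($base, 51, 53);
print_r($batch);
```

Calling it once per range (1 to 50, 51 to 100, and so on) gives one batch of URLs per run, which can then be passed to createList() in place of the full list.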
discomatt Posted June 10, 2008

Or the script times out. And are you aware this is against the yellowpages TOS?

HOW YOU MAY USE OUR MATERIALS:

We use a diverse range of information, text, photographs, designs, graphics, images, sound and video recordings, animation and other materials and effects on the YELLOWPAGES.COM Web site. We provide the information, content or advertisements (which we collectively call the "Materials") on the YELLOWPAGES.COM site FOR YOUR PERSONAL, NON-COMMERCIAL USE ONLY. Accordingly, You may view, use, copy, and distribute the Materials found on YELLOWPAGES.COM Web sites for internal, noncommercial, informational purposes only. You are prohibited from data mining, scraping, crawling, or using any process or processes that send automated queries to the YELLOWPAGES.COM Web site. You may not use the YELLOWPAGES.COM Web sites to compile a collection of listings, including a competing listing product or service. You may not use the Site or any Materials for any unsolicited commercial e-mail. Except as authorized in this paragraph, you are not being granted a license under any copyright, trademark, patent or other intellectual property right in the Materials or the products, services, processes or technology described therein. All such rights are retained by YELLOWPAGES.COM, its subsidiaries, parent companies, and/or any third party owner of such rights.
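On the time-out possibility: PHP's `max_execution_time` setting defaults to 30 seconds outside the command line, and `set_time_limit()` restarts that timer from zero each time it is called. Whether that limit is what actually stops the posted script is an assumption. Under that assumption, here is a rough sketch of working through a list in chunks and resetting the timer before each chunk. `processInBatches` and the `$handler` callback are hypothetical names, and the handler below only counts what it is given instead of fetching anything.

```php
<?php
// Hypothetical helper: split $items into chunks of $batchSize and hand
// each chunk to $handler, restarting PHP's execution timer per chunk.
function processInBatches(array $items, $batchSize, $handler, $secondsPerBatch = 30) {
    $results = array();
    foreach (array_chunk($items, $batchSize) as $chunk) {
        // set_time_limit() restarts the timeout counter from zero.
        set_time_limit($secondsPerBatch);
        $results[] = call_user_func($handler, $chunk);
    }
    return $results;
}

// Stand-in handler for the example: report how many pages are in each chunk.
$countPages = function ($chunk) {
    return count($chunk);
};

$pages = range(1, 120);
$sizes = processInBatches($pages, 50, $countPages);
print_r($sizes);
```

With 120 page numbers and a batch size of 50, the handler is called three times, for chunks of 50, 50, and 20 pages.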
j5uh Posted June 10, 2008 (Author)

Ooh, I did not know this. But there is actual software being sold that does the scraping. How are they getting away with that?
discomatt Posted June 10, 2008

Sold outside of US jurisdictions would be my guess. People sell pirated software, music and videos all the time... doesn't make it legal.
j5uh Posted June 10, 2008 (Author)

Well, if this script is illegal then I am sorry. Moderators, please remove it. If not, someone help me out.
jonsjava Posted June 10, 2008

Violating a TOS is not the same as violating the law. With that said, it's still a bad idea. You could get banned from the site, or they could browbeat your ISP into booting you (it happened to me once).
j5uh Posted June 10, 2008 (Author)

I see. Well, if this is against the phpfreaks forums TOS, please delete this thread. I don't want to cause any trouble.
jonsjava Posted June 10, 2008

Lol, I wasn't talking about this site, I was talking about yellowpages.