
Yellowpages Scraper


j5uh


I found this AWESOME yellowpages scraper online for free instead of paying someone to scrape the pages... http://www.scrapingpages.com/

 

I've tested the code here:


<?php
// Raise the memory limit: every fetched page is held in memory at once.
ini_set('memory_limit', '99999M');

// Build one URL per result page (1 .. $lastnum - 1) by inserting "page=N&" after the "?".
function createUrl($url, $lastnum)
{
    $urls = array();
    for ($number = 1; $number < $lastnum; $number++) {
        $urls[] = str_replace("?", "?page=" . $number . "&", $url);
    }
    return $urls;
}

$url = "http://www.yellowpages.com/TX/Internet-Marketing-Advertising?search_mode=all&search_terms=seo";
$lastnum = 1 + 1; // number of pages to fetch, plus one
$url = createUrl($url, $lastnum);

// Download every URL and return the HTML of each page.
function createList($url)
{
    $pages = array();
    foreach ($url as $value) {
        $pages[] = file_get_contents($value);
    }
    return $pages;
}
$list = createList($url);

foreach ($list as $value) {
    echo "<span style='width:8px; background:blue'> </span>";
    // Each listing sits in a <div class="description"> block.
    preg_match_all("/<div class=\"description\">([^`]*?)<\/div>/", $value, $matches);
    foreach ($matches[0] as $match) {
        preg_match("/<h2>([^`]*?)<\/h2>/", $match, $temp);
        preg_match("/<p>([^`]*?)<\/p>/", $match, $desc);
        preg_match("/<ul>([^`]*?)<\/ul>/", $match, $num);

        // Fall back to an empty string when a listing lacks a field.
        $title = strip_tags(trim(isset($temp[1]) ? $temp[1] : ''));
        $description = strip_tags(trim(isset($desc[1]) ? $desc[1] : ''));
        $phone = strip_tags(trim(isset($num[1]) ? $num[1] : ''));

        print "<b>$title</b>
<br>$description<br>
$phone<br>
<br>";
    }
}
?>

 

Works great, but how do I get it to search more than 50 pages? I want to scrape all the Houston businesses, but it times out at 50 or so pages. Is there a way to modify this code to search maybe 50 pages at a time? Like scrape pages 1-50, then 51-100, and so on.
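A rough sketch of the "50 pages at a time" idea, assuming the same "?page=N&" URL scheme that createUrl() builds. The helpers pageBatches() and batchUrls() are hypothetical names, not part of the script above, and the example only computes page ranges and URLs without fetching anything:

```php
<?php
// Hypothetical helper: split the page numbers $first..$last into
// consecutive batches of at most $size pages each.
function pageBatches($first, $last, $size)
{
    $batches = array();
    if ($size < 1) {
        return $batches; // guard against an endless loop
    }
    for ($start = $first; $start <= $last; $start += $size) {
        $batches[] = array($start, min($start + $size - 1, $last));
    }
    return $batches;
}

// Hypothetical helper: build the URLs for pages $start..$end,
// inserting "page=N&" after the "?" as createUrl() does.
function batchUrls($url, $start, $end)
{
    $urls = array();
    for ($page = $start; $page <= $end; $page++) {
        $urls[] = str_replace("?", "?page=" . $page . "&", $url);
    }
    return $urls;
}

// Example: 120 pages in batches of 50 gives 1-50, 51-100, 101-120.
foreach (pageBatches(1, 120, 50) as $batch) {
    echo $batch[0] . "-" . $batch[1] . "\n";
}
```

Each batch could then be handled in its own request or run, so that no single run has to fetch every page before PHP's execution time limit is reached. Calling set_time_limit() at the top of the script is another option, if the host allows it.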


Or the script times out.

 

And are you aware this is against the yellowpages TOS?

 

HOW YOU MAY USE OUR MATERIALS: We use a diverse range of information, text, photographs, designs, graphics, images, sound and video recordings, animation and other materials and effects on the YELLOWPAGES.COM Web site.

 

We provide the information, content or advertisements (which we collectively call the "Materials") on the YELLOWPAGES.COM site FOR YOUR PERSONAL, NON-COMMERCIAL USE ONLY.

 

Accordingly, You may view, use, copy, and distribute the Materials found on YELLOWPAGES.COM Web sites for internal, noncommercial, informational purposes only. You are prohibited from data mining, scraping, crawling, or using any process or processes that send automated queries to the YELLOWPAGES.COM Web site. You may not use the YELLOWPAGES.COM Web sites to compile a collection of listings, including a competing listing product or service. You may not use the Site or any Materials for any unsolicited commercial e-mail. Except as authorized in this paragraph, you are not being granted a license under any copyright, trademark, patent or other intellectual property right in the Materials or the products, services, processes or technology described therein. All such rights are retained by YELLOWPAGES.COM, its subsidiaries, parent companies, and/or any third party owner of such rights.



Ooh, I did not know this. But there is actual software being sold that does the scraping. How are they getting away with that?


Violating a TOS is not the same as violating the law. That said, it's still a bad idea: you could get banned from the site, or they could browbeat your ISP into booting you (it happened to me once).

 

I see. Well, if this is against the phpfreaks forums TOS, please delete this thread. I don't want to cause any trouble.

