j5uh
Posts posted by j5uh
-
Well, if this script is illegal then I am sorry. Moderators, please remove it.
If not, could someone help me out?
-
Or the script times out.
And are you aware this is against the yellowpages TOS?
HOW YOU MAY USE OUR MATERIALS: We use a diverse range of information, text, photographs, designs, graphics, images, sound and video recordings, animation and other materials and effects on the YELLOWPAGES.COM Web site. We provide the information, content or advertisements (which we collectively call the "Materials") on the YELLOWPAGES.COM site FOR YOUR PERSONAL, NON-COMMERCIAL USE ONLY.
Accordingly, You may view, use, copy, and distribute the Materials found on YELLOWPAGES.COM Web sites for internal, noncommercial, informational purposes only. You are prohibited from data mining, scraping, crawling, or using any process or processes that send automated queries to the YELLOWPAGES.COM Web site. You may not use the YELLOWPAGES.COM Web sites to compile a collection of listings, including a competing listing product or service. You may not use the Site or any Materials for any unsolicited commercial e-mail. Except as authorized in this paragraph, you are not being granted a license under any copyright, trademark, patent or other intellectual property right in the Materials or the products, services, processes or technology described therein. All such rights are retained by YELLOWPAGES.COM, its subsidiaries, parent companies, and/or any third party owner of such rights.
Ooh, I did not know this. But there is actual software being sold that does the scraping. How are they getting away with that?
-
I found this AWESOME yellowpages scraper online for free instead of paying someone to scrape the pages... http://www.scrapingpages.com/
I've tested the code here:
<? ini_set('memory_limit', '99999M'); function createUrl($url,$lastnum) { $find = "?"; $trim = rtrim ($url,'a..z,A..Z,=,_,&'); $remove_to = strpbrk($trim, '?'); $number = 1; $counter= 0; while ($lastnum != $number) { $over = "?page=".$number."&"; $replace = str_replace($find,$over,$url); $myArray[$counter] = $replace; $number++; $counter++; } return $myArray; } $url = "http://www.yellowpages.com/TX/Internet-Marketing-Advertising?search_mode=all&search_terms=seo"; $lastnum = 1 +1; $url = createUrl($url,$lastnum); function createList ($url ) { $counter=0; foreach ($url as $value) { $html=file_get_contents ($value); $myArray[$counter] = $html; $counter++; } return $myArray; } $list = createList($url); foreach ($list as $value){ echo "<span style='width:8px; background:blue'> </span>"; preg_match_all ("/<div class=\"description\">([^`]*?)<\/div>/", $value, $matches); foreach ($matches[0] as $match) { preg_match ("/<h2>([^`]*?)<\/h2>/", $match, $temp); preg_match ("/<p>([^`]*?)<\/p>/" , $match, $desc); preg_match ("/<ul>([^`]*?)<\/ul>/" , $match, $num); $title = $temp['1']; $title = strip_tags(trim($title)); $description = $desc['1']; $description = strip_tags(trim($description)); $phone = $num['1']; $phone = strip_tags(trim($phone)); print "<b>$title</b> <br>$description<br> $phone<br> <br>"; } } ?>
Works great, but how do I get it to search more than 50 pages? I want to scrape all the Houston businesses, but it times out at 50 or so pages. Is there a way to modify this code to search maybe 50 pages at a time? Like scrape pages 1-50, then 51-100, and so on.
-
Honestly, I wish it was easier to integrate PayPal into a form... but I have no experience with APIs...
-
so could I do this?
<?php
$referal = $_SERVER['HTTP_REFERER'];
if (preg_match('~^https?://(.*?\.)?paypal.com/.*?$~D', $referal)) {
?>
<html> <body>paid content here</body> </html>
<?php
} else {
?>
<html> <body>you must pay first... </body> </html>
<?php
}
?>
-
good job!
are you addicted yet?
HIGHLY addicted. I've even bought a book and i'm reading about php everyday now.
-
Be aware that HTTP_REFERER can be modified by the user. But generally it would work (if a few users getting "unauthorized" access is OK). If you want to match someone coming from paypal.com, with or without possible sub domains and/or pages aside from the front page, you can use preg_match():
<?php
$referal = $_SERVER['HTTP_REFERER'];
if (preg_match('~^https?://(.*?\.)?paypal.com/.*?$~D', $referal)) {
    //they come from paypal.com
} else {
    //they don't
}
?>
I don't think the other script posted will work, since the URLs are short of a trailing slash and the "https" scheme. But I guess you were supposed to fill in the exact URLs yourself.
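To see what the pattern accepts and rejects, here is a small sketch. The helper name `came_from_paypal` is made up for illustration, and the pattern is a slightly tightened variant of the one above (the dot is escaped and the subdomain part may not contain a slash), so treat it as an assumption rather than the original poster's code. The referer is still user-controlled, so this remains a convenience check only.

```php
<?php
// Hypothetical helper: true when the referer looks like a paypal.com URL.
// Variant of the thread's pattern: "\." matches a literal dot, and
// "[^/]*" keeps the optional subdomain from spilling into the path.
function came_from_paypal($referer) {
    return preg_match('~^https?://([^/]*\.)?paypal\.com/~', $referer) === 1;
}

var_dump(came_from_paypal("https://www.paypal.com/cgi-bin/webscr"));
var_dump(came_from_paypal("http://paypal.com"));
var_dump(came_from_paypal("http://evil.example/www.paypal.com/"));
```

The second call shows why the exact-match array approach is fragile: a referer without the trailing slash (or with a path) simply does not match.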
So this script here is better with preg_match?
so if someone made a payment on paypal, they would be forwarded to this page and it should allow them to access it right?
I have no problem with just a few people sneaking by... I will review the list every couple weeks to make sure people have paid...
-
so would i code it this way...
<?php
$allowed_referer = array("http://paypal.com", "http://phpfreaks.com"); // add the allowed sites in this array
$referal = $_SERVER['HTTP_REFERER'];
if (in_array($referal, $allowed_referer)) {
    // let them hit this page
    // all my html code goes here
} else {
    // do I just put a forward script here?
}
?>
-
I have a page where people can sign up for a subscription service. Right now anyone can reach this page by typing the URL in.
What I was wondering is whether I can use .htaccess to only show that page if you're coming from a PayPal website... I know there should be a way to do this... :-X
-
I've finally figured it out. Here's the final code to share with the world.
#!/usr/bin/php
<?php
$db_host = "xxx";
$db_user = "xxx";
$db_pwd = "xxx";
$db_name = "xxx";
$db_table = "users";
$db_emailfield = "email";
mysql_connect($db_host, $db_user, $db_pwd);
mysql_select_db($db_name);
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
    $email .= fread($fd, 1024);
}
fclose($fd);
function get_string_between($string, $start, $end){
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}
$email = get_string_between($email, "<div class=3DSection1>", "</div>");
// handle email
$lines = explode("\n", $email);
// empty vars
$from = "";
$subject = "your subject here";
$headers = "";
$message = "";
$splittingheaders = true;
for ($i=0; $i < count($lines); $i++) {
    if ($splittingheaders) {
        // this is a header
        $headers .= $lines[$i]."\n";
        // look out for special headers
        if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
            $subject = $matches[1];
        }
        if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
            $from = $matches[1];
        }
    } else {
        // not a header, but message
        $message .= $lines[$i]."\n";
    }
    if (trim($lines[$i])=="") {
        // empty line, header section has ended
        $splittingheaders = false;
    }
}
$sql = "SELECT `$db_emailfield` FROM `$db_table`;";
$result = mysql_query($sql);
while($row = mysql_fetch_assoc($result)){
    $emails = $row['email'];
    $headers = 'MIME-Version: 1.0' . "\n";
    $headers .= 'Content-type: text/html; charset=UTF-8' . "\n";
    $headers .= "From: your address.com";
    $ForwardTo = $emails;
    mail ($ForwardTo,$subject,$message,$headers);
}
?>
-
OK, one more issue. I've now added some code to pull the email addresses from the db, but I'm getting this error:
Fatal error: Cannot redeclare get_string_between() (previously declared in /home/newhost/public_html/asd/asd/mailer3.php:28) in /home/newhost/public_html/asd/asd/mailer3.php on line 28
Here's the code I am using:
#!/usr/bin/php
<?php
$db_host = "asd";
$db_user = "asd";
$db_pwd = "asd";
$db_name = "asd";
$db_table = "users";
$db_emailfield = "email";
mysql_connect($db_host, $db_user, $db_pwd);
mysql_select_db($db_name);
$sql = "SELECT `$db_emailfield` FROM `$db_table`;";
$result = mysql_query($sql);
while($row = mysql_fetch_assoc($result)){
$emails = $row['email'];
$ChangeTo = 'asd';
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
$email .= fread($fd, 1024);
}
fclose($fd);
function get_string_between($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$email = get_string_between($email, "<div class=3DSection1>", "</div>");
// handle email
$lines = explode("\n", $email);
// empty vars
$from = "";
$subject = "Expert Advisors";
$headers = "";
$message = "";
$splittingheaders = true;
for ($i=0; $i < count($lines); $i++) {
if ($splittingheaders) {
// this is a header
$headers .= $lines[$i]."\n";
// look out for special headers
if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
$subject = $matches[1];
}
if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
$from = $matches[1];
}
} else {
// not a header, but message
$message .= $lines[$i]."\n";
}
if (trim($lines[$i])=="") {
// empty line, header section has ended
$splittingheaders = false;
}
}
$headers = 'MIME-Version: 1.0' . "\n";
$headers .= 'Content-type: text/html; charset=UTF-8' . "\n";
$headers .= "From: asd";
$ForwardTo = $emails;
mail ($ForwardTo,$subject,$message,$headers);
}
?>
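A likely cause of the "Cannot redeclare" error: in the script above, `function get_string_between` sits inside the `while` loop, so PHP tries to declare it again on the second row. A minimal sketch of the fix is to declare the function once and parse the message once, before looping over recipients. The `$email` string and the `$recipients` array below are made-up stand-ins for the piped message and the database rows, and `$outbox` stands in for the `mail()` calls so the sketch runs anywhere.

```php
<?php
// Declared once at the top level, before any loop runs.
function get_string_between($string, $start, $end) {
    $string = " " . $string;
    $ini = strpos($string, $start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string, $end, $ini) - $ini;
    return substr($string, $ini, $len);
}

// Hypothetical stand-ins for the stdin message and the rows from the users table.
$email = "<div class=3DSection1>hello</div>";
$recipients = array("a@example.com", "b@example.com");

// Parse the message once, outside the loop.
$body = get_string_between($email, "<div class=3DSection1>", "</div>");

$outbox = array();
foreach ($recipients as $to) {
    // The real script would call mail($to, $subject, $body, $headers) here.
    $outbox[] = array($to, $body);
}

print_r($outbox);
```

Reading stdin inside the loop has a related problem: the stream is empty after the first pass, so only the first recipient would get any content.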
-
99% there!!!
I typed this in the email: nowei dowe dqwmopqwm
But it shows up like this when I get it: nowei dowe = dqwmopqwm
With an equal sign... how do I stop those from appearing?
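The stray "=" is most likely a quoted-printable soft line break: the HTML part of the message is sent with `Content-Transfer-Encoding: quoted-printable`, where a line ending in "=" continues on the next line and "=3D" stands for a literal "=". A minimal sketch, assuming the body really is quoted-printable, is to run it through PHP's built-in `quoted_printable_decode()` before forwarding. The sample strings are made up for illustration.

```php
<?php
// A soft line break ("=" followed by a newline) as it arrives in the raw body.
$encoded = "nowei dowe =\r\ndqwmopqwm";
$decoded = quoted_printable_decode($encoded);
echo $decoded, "\n";

// "=3D" is the encoded form of "=", as seen in the Outlook HTML.
$attr = quoted_printable_decode("<div class=3DSection1>");
echo $attr, "\n";
```

This would also explain why the earlier script had to search for `<div class=3DSection1>` rather than `<div class=Section1>`.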
-
OK, sweet... now I've modified it even more and this is what I have:
#!/usr/bin/php
<?php
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
    $email .= fread($fd, 1024);
}
fclose($fd);
function get_string_between($string, $start, $end){
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}
// handle email
$lines = explode("\n", $email);
// empty vars
$from = "";
$subject = "";
$headers = "";
$message = "";
$splittingheaders = true;
for ($i=0; $i < count($lines); $i++) {
    if ($splittingheaders) {
        // this is a header
        $headers .= $lines[$i]."\n";
        // look out for special headers
        if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
            $subject = $matches[1];
        }
        if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
            $from = $matches[1];
        }
    } else {
        // not a header, but message
        $message .= $lines[$i]."\n";
    }
    if (trim($lines[$i])=="") {
        // empty line, header section has ended
        $splittingheaders = false;
    }
}
$headers = 'MIME-Version: 1.0' . "\n";
$headers .= 'Content-type: text/html; charset=UTF-8' . "\n";
$headers .= "From: xxx";
$ForwardTo = 'xxx';
mail ($ForwardTo,$subject,$message,$headers);
?>
And I'm getting this:
This is a multipart message in MIME format.

------=_NextPart_000_010F_01C8C663.3932C670
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Testing 1234

------=_NextPart_000_010F_01C8C663.3932C670
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Testing 1234

------=_NextPart_000_010F_01C8C663.3932C670--
Which means I've stripped away all the rest using these lines:
$headers = 'MIME-Version: 1.0' . "\n";
$headers .= 'Content-type: text/html; charset=UTF-8' . "\n";
But how can I get rid of that other mess?
-
OK, never mind, it's not working... lol
-
OK... so the result I'm getting is better, but there's still some junk that gets emailed. Here it is:
From asd@a.com Wed Jun 04 16:19:47 2008 Received: from 123123123.dsl.hs123123obal.net ([123123]:1421 helo=prexxix) by gator465.hostgator.com with esmtpa (Exim 4.68) (envelope-from <asd@aa.com>) id 1K40OY-0001MR-T5 for oskdpsk@aol.com; Wed, 04 Jun 2008 16:19:47 -0500 From: "John" <12323@ao.com> To: <asdsd@aol.com> Subject: test Date: Wed, 4 Jun 2008 16:19:54 -0500 Message-ID: <123434##.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_00E7_01C8C65E.D3062580" X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: AcjGiLnI28Lp3X5tSLuLhc6u9F0ABA== Content-Language: en-us x-cr-hashedpuzzle: AB0P A5cy CTO+ CX7e CxSy DvBH ECBa HdzE Htgh Ic06 JKPY Jjka Jk2A KtuP LsxO L6XP;1;cwBpAGcAbgBhAGwAQABmAGkAbgBhAG4AYwBpAGEAbAAtAHIAbwBiAG8AdABpAGMAcwAuAGMAbwBtAA==;Sosha1_v1;7;{3A797A8D-B98B-4371-A084-67C4021C6B09};agAuAHMAdQBoAEAAZgBpAG4AYQBuAGMAaQBhAGwALQByAG8AYgBvAHQAaQBjAHMALgBjAG8AbQA=;Wed, 04 Jun 2008 21:19:51 GMT;dABlAHMAdAA= x-cr-puzzleid: {3A797A8D-B98B-4371-A084-67C4021C6B09} This is a multipart message in MIME format. 
------=_NextPart_000_00E7_01C8C65E.D3062580 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Asidjasoijd asodj asod j ------=_NextPart_000_00E7_01C8C65E.D3062580 - Show quoted text - Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable <html xmlns:v=3D"urn:schemas-microsoft-com:vml" = xmlns:o=3D"urn:schemas-microsoft-com:office:office" = xmlns:w=3D"urn:schemas-microsoft-com:office:word" = xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" = xmlns=3D"http://www.w3.org/TR/REC-html40"> <head> <META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; = charset=3Dus-ascii"> <meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)"> <style> <!-- /* Font Definitions */ @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:11.0pt; font-family:"Calibri","sans-serif";} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} span.EmailStyle17 {mso-style-type:personal-compose; font-family:"Calibri","sans-serif"; color:windowtext;} .MsoChpDefault {mso-style-type:export-only;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.0in 1.0in 1.0in;} div.Section1 {page:Section1;} --> </style> <!--[if gte mso 9]><xml> <o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext=3D"edit"> <o:idmap v:ext=3D"edit" data=3D"1" /> </o:shapelayout></xml><![endif]--> </head> <body lang=3DEN-US link=3Dblue vlink=3Dpurple> <div class=3DSection1> <p class=3DMsoNormal>Asidjasoijd asodj asod j<o:p></o:p></p> </div> </body> </html> ------=_NextPart_000_00E7_01C8C65E.D3062580--
-
Yes, the numbers were what I sent through... I will look into what you sent and report back my results =) So close I can smell victory!!!
-
This is what I get
From xxx@asd.com Wed Jun 04 14:44:08 2008
Received: from adsl-02020202002.dsl.hstntx.sbcglobal.net ([00.00.00.00]:1234 helo=prexxix)
by gator465.hostgator.com with esmtpa (Exim 4.68)
(envelope-from <123@cs.com>)
id 1K3yu0-0005O6-Hn
for 123@cs.com; Wed, 04 Jun 2008 14:44:08 -0500
From: "john" <123@cs.com>
To: <aasl@cs.com>
Subject: 12323213123
Date: Wed, 4 Jun 2008 14:44:15 -0500
Message-ID: <asdioj@sokd.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_00E2_01C8C651.766A7CC0"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AcjGe11qMOjrcJG4R7amtFB1Cvy/Uw==
Content-Language: en-us
x-cr-hashedpuzzle: yYE= AdPi Axu3 BAAg ECQt EaW5 EbYl Ey7J E3EF FHiq GPV0 HF4a IH/L I8i8 JXAS KX8S;1;cwBpAGcAbgBhAGwAQABmAGkAbgBhAG4AYwBpAGEAbAAtAHIAbwBiAG8AdABpAGMAcwAuAGMAbwBtAA==;Sosha1_v1;7;{FFA2519B-E285-46B0-92BB-9425F2DC2D68};agAuAHMAdQBoAEAAZgBpAG4AYQBuAGMAaQBhAGwALQByAG8AYgBvAHQAaQBjAHMALgBjAG8AbQA=;Wed, 04 Jun 2008 19:44:13 GMT;MQAyADMAMgAzADIAMQAzADEAMgAzAA==
x-cr-puzzleid: {FFA2519B-E285-46B0-92BB-9425F2DC2D68}
This is a multipart message in MIME format.
------=_NextPart_000_00E2_01C8C651.766A7CC0
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
1231231232
------=_NextPart_000_00E2_01C8C651.766A7CC0
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" =
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=3DEN-US link=3Dblue vlink=3Dpurple>
<div class=3DSection1>
<p class=3DMsoNormal>1231231232<o:p></o:p></p>
</div>
</body>
</html>
------=_NextPart_000_00E2_01C8C651.766A7CC0--
-
OK, so now I have stripped it down and followed the directions at http://www.evolt.org/article/Incoming_Mail_and_PHP/18/27914/index.html
Here is the code that I am using now:
#!/usr/bin/php
<?php
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
    $email .= fread($fd, 1024);
}
fclose($fd);
// handle email
$lines = explode("\n", $email);
// empty vars
$from = "";
$subject = "";
$headers = "";
$message = "";
$splittingheaders = true;
for ($i=0; $i < count($lines); $i++) {
    if ($splittingheaders) {
        // this is a header
        $headers .= $lines[$i]."\n";
        // look out for special headers
        if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
            $subject = $matches[1];
        }
        if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
            $from = $matches[1];
        }
    } else {
        // not a header, but message
        $message .= $lines[$i]."\n";
    }
    if (trim($lines[$i])=="") {
        // empty line, header section has ended
        $splittingheaders = false;
    }
}
$ForwardTo = 'aaaa@gmail.com';
mail ($ForwardTo,$subject,$message,$headers);
?>
The email piping WORKS! But I'm getting a bunch of MS Word HTML mess along with the email. Is there a way to filter all that out and just have the body of the email?
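One way to get only the body is to use the MIME boundary announced in the headers and keep just the `text/plain` part, instead of forwarding everything after the first blank line. The sketch below is an illustration under assumptions, not a full MIME parser: the function name `extract_plain_text` is hypothetical, it handles a single level of multipart, and the sample message is a shortened, made-up version of the Outlook output shown in this thread.

```php
<?php
// Hypothetical helper: returns the text/plain part of a raw multipart message,
// or the whole body when the headers declare no boundary.
function extract_plain_text($raw) {
    // Normalise line endings so the blank-line splits below are reliable.
    $raw = str_replace("\r\n", "\n", $raw);
    $pieces = explode("\n\n", $raw, 2);
    $headers = $pieces[0];
    $body = isset($pieces[1]) ? $pieces[1] : "";

    // Look for boundary="..." in the top-level headers.
    if (!preg_match('/boundary="?([^";\n]+)"?/i', $headers, $m)) {
        return trim($body);
    }

    // Each part starts after a line made of "--" plus the boundary.
    foreach (explode("--" . $m[1], $body) as $part) {
        $sections = explode("\n\n", ltrim($part, "\n"), 2);
        if (count($sections) < 2) continue;
        list($partHeaders, $content) = $sections;
        if (stripos($partHeaders, "Content-Type: text/plain") === false) continue;
        if (stripos($partHeaders, "quoted-printable") !== false) {
            $content = quoted_printable_decode($content);
        }
        return trim($content);
    }
    return "";
}

// Made-up sample shaped like the messages in this thread.
$raw = "From: a@example.com\n"
     . "Content-Type: multipart/alternative;\n"
     . "\tboundary=\"XYZ\"\n"
     . "\n"
     . "This is a multipart message in MIME format.\n"
     . "\n"
     . "--XYZ\n"
     . "Content-Type: text/plain;\n"
     . "\tcharset=\"us-ascii\"\n"
     . "Content-Transfer-Encoding: 7bit\n"
     . "\n"
     . "Testing 1234\n"
     . "\n"
     . "--XYZ\n"
     . "Content-Type: text/html;\n"
     . "\tcharset=\"us-ascii\"\n"
     . "\n"
     . "<p>Testing 1234</p>\n"
     . "--XYZ--\n";

echo extract_plain_text($raw), "\n";
```

In the piping script, the result would replace `$message`, and the outgoing `Content-type` header would then be `text/plain` rather than `text/html`.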
-
I will offer $15 by paypal whoever can solve this issue for me....
-
I'm getting the same results here... no luck! = /
-
I've tried that too... have any other ideas?
-
Any other ideas about what may be wrong with the script? I'm soooo close to finishing this project up!
-
OK, I've tested $message = "testing msg"; and it sends the msg through.
The following part of the script:
$splittingheaders = true;
for ($i=0;$i<count($lines);$i++) {
    if ($splittingheaders) {
        if (preg_match("/^From: (.*)/",$lines[$i],$matches)) {
            if (strpos($lines[$i],"<")) {
                // The name is before the email
                $data = explode ("<",$lines[$i]);
                $from = substr(trim($data[1]),0,-1);
            } else {
                $from = $matches[1];
            }
        }
        if (preg_match("/^Subject: (.*)/",$lines[$i],$matches)) {
            $subject = $matches[1];
        }
    } else {
        $message .= $lines[$i]."\n";
    }
    if (trim($lines[$i]=="")) {
        $splittingheaders = false;
    }
}
$message = <<< EOF
$message
EOF;
$headers = "Content-type: text/html\n";
$headers .= "From: $from\n";
$headers .= "Return-Path: $from\n";
//$headers .= "To: $to\n";
I don't understand at all... ???
-
So then shouldn't $message = $email?
I've tried that too....
Yellowpages Scraper
in PHP Coding Help
Posted
I see. Well, if this is against the phpfreaks forums TOS, please delete this thread. I don't want to cause any trouble.