darkvengance
Posted November 10, 2009

I am building a spider that will crawl various white-pages sites (e.g. anywho.com, switchboard.com, whitepages.com) and collect the information on the people found there and store it in a database. So far I've only made this little prototype. After trying to run it I hit a bunch of problems. I fixed a lot of them, but there are some with the regular expressions that I can't figure out. Here are the errors:

Warning: preg_match_all() [function.preg-match-all]: Compilation failed: missing ) at offset 57 in /home/public_html/spider/inc/anywho.class.php on line 51
Warning: preg_match_all() [function.preg-match-all]: Delimiter must not be alphanumeric or backslash in /home/public_html/spider/inc/anywho.class.php on line 72
Warning: preg_match_all() [function.preg-match-all]: No ending delimiter '^' found in /home/public_html/spider/inc/anywho.class.php on line 73
Warning: preg_match() [function.preg-match]: No ending delimiter '^' found in /home/public_html/spider/inc/anywho.class.php on line 76
Warning: preg_replace() [function.preg-replace]: No ending delimiter '.' found in /home/public_html/spider/inc/anywho.class.php on line 92
Warning: preg_replace() [function.preg-replace]: No ending delimiter '^' found in /home/public_html/spider/inc/anywho.class.php on line 93
Warning: preg_replace() [function.preg-replace]: No ending delimiter '.' found in /home/public_html/spider/inc/anywho.class.php on line 94
Warning: preg_replace() [function.preg-replace]: No ending delimiter '^' found in /home/public_html/spider/inc/anywho.class.php on line 95
Warning: preg_replace() [function.preg-replace]: No ending delimiter '*' found in /home/public_html/spider/inc/anywho.class.php on line 96

Along with these, it isn't printing out the info like it is supposed to on line 56 of anywho.class.php.

Since these are two files and a little bigger than the normal "snippet", I posted them both on a pin board. The links are below.

Spider Class: http://www.coderprofile.com/networks/code-pin-board/258/spiderclassphp
Anywho Class: http://www.coderprofile.com/networks/code-pin-board/257/anywhospiderclassphp

And here is the source of the form page:

<?php
require("spider.class.php");
require("anywho.class.php");
$spider=new spider("Lorem Ipsum","Lorem Ipsum","Lorem Ipsum","localhost",15);
$any=new anywho;
if(isset($_POST['submit'])){
    $state=$_POST['state'];
    $last=$_POST['last'];
    $first = (isset($_POST['first'])) ? $_POST['first'] : null;
    $street = (isset($_POST['street'])) ? $_POST['street'] : null;
    // The form has no city field yet, so this stays null
    $city = (isset($_POST['city'])) ? $_POST['city'] : null;
    $zip = (isset($_POST['zip'])) ? $_POST['zip'] : null;
    $any->initialize($last,$state,$first,$street,$city,$zip);
    $any->any_crawl($any->url,0,1);
}
?>
<form action="index.php" method="post">
Last Name: <input type="text" name="last">*<br>
First Name: <input type="text" name="first"><br>
Street: <input type="text" name="street"><br>
Zip: <input type="text" name="zip"><br>
State: <select name="state" style="height:17px; font-size:9px;">
<option value="">Select a State</option>
<option value="AL" selected="selected" >Alabama</option>
...........................
...........................
<option value="WY">Wyoming</option>
</select>*<br><br>
<input type="submit" value="Crawl" name="submit">
</form>

I'm really sorry about the messy code and poor documentation. Also, I really appreciate any and all replies!

dreamwest
Posted November 10, 2009

What do you want the spider to get from this URL: http://whitepages.anywho.com/results.php?ReportType=34
E.g. links, meta tags, etc.?

dreamwest
Posted November 10, 2009

Also, file_get_contents() is a poor choice if you don't know which pages you're going after. If the page redirects, you may get nothing back. cURL is better for this because it gives you control over redirects and errors. If you definitely know the page is http://site.com then file_get_contents() is OK, but if it redirects http://site.com -> http://www.site.com then you can be stuck.
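
For readers who want to try the cURL route mentioned above, here is a minimal sketch. The function name fetch_page and its parameters are made up for illustration and are not part of the spider classes in this thread. It assumes the cURL extension is loaded and a current PHP version (the type hints will not run on the PHP 5 of 2009).

```php
<?php
// Hypothetical helper (not from the thread): fetch a page with cURL and
// follow redirects such as http://site.com -> http://www.site.com.
function fetch_page(string $url, int $maxRedirects = 5, int $timeout = 15): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up on redirect loops
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);

    // curl_exec() returns false on failure (DNS error, timeout, too many redirects, ...)
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}
```

A call such as fetch_page('http://whitepages.anywho.com/results.php?ReportType=34') would then return the page body as a string, or null if the request failed, so the caller can tell a failed fetch apart from an empty page.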

darkvengance (Author)
Posted November 10, 2009

Oh, no. When the form is completed it finishes the URL, e.g. http://whitepages.anywho.com/results.php?ReportType=34&qi=0&qk=10&qn=A&qs=AK
This is going to spider the people (names, phone numbers, addresses, etc.) from the pages that follow.

Yes, I understand, but I know for a fact that it is going to be anywho.com, as I have designed it to follow links only to other anywho.com pages.

darkvengance (Author)
Posted November 10, 2009

*bump* Aww, come on people, where's the love? Haha... but seriously, any help on this problem would be greatly appreciated. And don't worry, it was at the top of the 3rd page when I bumped it.

dreamwest
Posted November 10, 2009

Well, forgetting about the class, this will get the info you want. You're not getting multiple items per page, so you won't need preg_match_all():

$u = 'http://whitepages.anywho.com/results.php?ReportType=34&qi=0&qk=10&qn=A&qs=AK';
$g = @file_get_contents($u); // remove @ to show errors
if ($g === false) {
    die('Could not fetch the page');
}
// Keep what sits between the first singleName span and the closing divs
$a = explode('<span class="singleName">', $g);
if (!isset($a[1])) {
    die('No listing found');
}
$a = explode('</div></div>', $a[1]);
$a = $a[0];
echo $a;
echo '<hr>Done!';

Just str_replace() the divs so you can put it into the database.
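
As a rough illustration of that cleanup step, the sketch below turns an extracted HTML fragment into plain text before it goes into the database. Note that it swaps the suggested str_replace() for a regex that strips any tag, since the real divs probably carry attributes. The function name clean_fragment and the sample markup are assumptions, not taken from the thread or from anywho.com.

```php
<?php
// Hypothetical helper (not from the thread): reduce an HTML fragment to plain text.
function clean_fragment(string $html): string
{
    // Replace every tag with a space so neighbouring fields do not run together.
    $text = preg_replace('/<[^>]*>/', ' ', $html);

    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Invented sample fragment, only for demonstration.
echo clean_fragment('<div class="name">Doe, John</div><div>123 Main St</div>'), "\n";
```

Escaping for the database itself (prepared statements or similar) is a separate step and is not covered here.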

darkvengance (Author)
Posted November 11, 2009

...but I am getting multiple items per page...

darkvengance (Author)
Posted November 12, 2009

*bump* This time it made it to the middle of the fourth page... so am I really on my own on this one?

dreamwest
Posted November 12, 2009

> *bump* This time it made it to the middle of the fourth page... so am I really on my own on this one?

Probably because of the type of content you're after: people's private info. I'm not going to ask what you intend to do with it.

preg_match_all() will definitely work with a foreach loop.
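
A sketch of what preg_match_all() with a foreach loop might look like follows. It also touches on the warnings at the top of the thread: "Delimiter must not be alphanumeric or backslash" and "No ending delimiter '^' found" are what PHP reports when a pattern is passed without enclosing delimiters, so the pattern here is wrapped in ~...~. Only the singleName class comes from the thread. The function name extract_names and the sample markup are assumptions, and the real pages may be structured differently.

```php
<?php
// Hypothetical helper (not from the thread): collect every singleName span on a page.
function extract_names(string $html): array
{
    // '~' is the delimiter, so the '/' in '</span>' needs no escaping.
    // The trailing 's' modifier lets '.' match newlines inside a span.
    $pattern = '~<span class="singleName">(.*?)</span>~s';

    $names = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        // With PREG_SET_ORDER each $match is one hit; $match[1] is the captured group.
        foreach ($matches as $match) {
            $names[] = trim(strip_tags($match[1]));
        }
    }
    return $names;
}

// Invented sample markup, only for demonstration.
$sample = '<span class="singleName">Doe, John</span><p>x</p>'
        . '<span class="singleName"> <b>Roe, Jane</b> </span>';
print_r(extract_names($sample));
```

The same delimiter rule applies to preg_match() and preg_replace(), which is where the remaining "No ending delimiter" warnings come from.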

darkvengance (Author)
Posted November 12, 2009

> Probably because of the type of content you're after: people's private info. I'm not going to ask what you intend to do with it.

Well, technically it's not private info, considering that if you were to just look in a phone book you could easily find the same info (name, phone number, and address)... but I guess you are right. Thank you anyway. I guess I'll work on it some more on my own, and if I can't figure it out I'll just hire someone to fix it for me.