JSHINER Posted October 4, 2007

I reposted this from "Regex within PHP" because I feel this is a PHP problem, not a regex one. You can view the previous posts here: http://www.phpfreaks.com/forums/index.php/topic,161945.0.html

What I am trying to do is start at a predefined page, find all the links on that page, run the spider on each of the pages that were found, and return the results from those spidered pages.

Here is the code I have been working with:

<?php
$seed = "http://www.site.com/sub/index.htm";
$data = file_get_contents($seed);

// First pass: collect the links found on the seed page.
if (preg_match_all("/http:\/\/www.site.com\/sub\/[^\"\s']+/", $data, $links)) {
    for ($i = 0; $i < count($links[0]); $i++) {
        // Second pass: fetch each linked page and list the links found on it.
        $data_b = file_get_contents($links[0][$i]);
        if (preg_match_all("/\http:[^\"\s']+/", $data_b, $links_b)) {
            header("Content-type: text/plain");
            for ($i_b = 0; $i_b < count($links_b[0]); $i_b++) {
                echo $links_b[0][$i_b] . "\n";
            }
        }
    }
}
?>

Any help would be much appreciated!

PS: Sorry if moving this myself was a problem, but the Regex forum does not get the traffic the PHP ones get, and I know my preg_match_all calls work on single pages, so I figure the problem must be in the PHP logic.

trq Posted October 4, 2007

Quoting JSHINER: "so I figure the problem must be in the PHP logic."

And the problem is what?

JSHINER (Author) Posted October 4, 2007

I get no results when I know there are some. Can you see any problem with that code?

BlueSkyIS Posted October 4, 2007

Maybe echo $links[0][$i] and see what it looks like.

JSHINER (Author) Posted October 4, 2007

OK, just about there. I have it all figured out, except that if a link appears twice on the first page, I only need it spidered once. How can I do that?

Here is the current code:

<?php
$seed = "http://www.site.com/page.html";
$data = file_get_contents($seed);

// First pass: collect the links found on the seed page.
if (preg_match_all("/\http:[^\"\s']+/", $data, $links)) {
    for ($i = 0; $i < count($links[0]); $i++) {
        // Second pass: fetch each linked page and list the links found on it.
        $data_b = file_get_contents('http://www.site.com/sub/' . $links[0][$i]);
        if (preg_match_all("/\http:[^\"\s']+/", $data_b, $links_b)) {
            @header("Content-type: text/plain");
            for ($i_b = 0; $i_b < count($links_b[0]); $i_b++) {
                echo $links_b[0][$i_b] . "\n";
            }
        }
    }
}
?>
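
One possible way to spider each link only once is to remove duplicates from the match list before looping over it. The sketch below is a suggestion rather than anything posted in the thread: the helper name extract_unique_links() and the sample HTML are made up for illustration, and the pattern is a simplified variant of the one above (written without the leading backslash, which is an assumption about what was intended). It relies on PHP's built-in array_unique() and array_values().

```php
<?php
// Hypothetical helper (not from the thread): return each matched URL once.
function extract_unique_links(string $html): array
{
    // Simplified variant of the thread's pattern: "http:" followed by
    // anything up to a double quote, single quote, or whitespace.
    if (!preg_match_all('/http:[^"\s\']+/', $html, $matches)) {
        return [];
    }

    // array_unique() keeps the first occurrence of each URL.
    // array_values() reindexes the result so a counting for loop still works.
    return array_values(array_unique($matches[0]));
}

// Made-up sample page in which the same link appears twice.
$sample = '<a href="http://www.site.com/a.htm">A</a> '
        . '<a href="http://www.site.com/b.htm">B</a> '
        . '<a href="http://www.site.com/a.htm">A again</a>';

$links = extract_unique_links($sample);

foreach ($links as $link) {
    echo $link, "\n";
}
```

In the posted code, the same effect could probably be had by building a de-duplicated copy of $links[0] in this way and looping over that copy instead. Note that this only removes repeats within one page; avoiding repeats across several spidered pages would need a separate list of URLs already visited.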