[SOLVED] Having a problem with my spider!


JSHINER

<?php
for ($n=0;$n<90;$n++)
{
   
$seed = "http://www.site.com/page.php?p=$n";
$data = file_get_contents($seed);

if (preg_match_all("/\http:[^\"\s']+/", $data, $links)) 

{

header("Content-type: text/plain");

     for ($i=0;$i<count($links[0]);$i++) 
     {

     echo $links[0][$i]. "\n";

     }
}
}
?>

 

This collects all the links on a page, and on the other pages selected by page number (1-89). However, I am having a few problems:

 

1) It does not seem to go through all 89 pages

2) It duplicates some email links. How can I limit it to one result per link so there are no duplicates?


Try this:

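<?php
header("Content-type: text/plain"); // only 1 header please!
$email = array();

for ($n = 0; $n < 90; $n++)
{
    $seed = "http://www.site.com/page.php?p=$n";
    $data = file_get_contents($seed); // returns false if the page cannot be fetched

    // "\h" means horizontal whitespace in PCRE, so match a plain "http:" instead
    if ($data !== false && preg_match_all("/http:[^\"\s']+/", $data, $links))
    {
        for ($i = 0; $i < count($links[0]); $i++)
        {
            $email[] = $links[0][$i];
        }
    }
    else
    {
        echo "Skipped: $n\n"; // plain-text output, so a newline rather than <br>
    }
}

$newemail = array_unique($email); // drop repeated links
print_r($newemail);
?>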

 

Untested (written on the fly).
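
The de-duplication step can be tried on its own, without fetching anything. This is only a rough sketch: the helper name extractLinks, the sample markup, and the example.com URLs are made up for illustration and are not from the posts above, and the pattern is widened to accept https as well, which the code in this thread does not do.

```php
<?php
// Hypothetical helper (not from the thread): pull http/https URLs out of a chunk of HTML.
function extractLinks(string $html): array
{
    // Match "http://" or "https://" up to the next quote, whitespace, or angle bracket.
    if (preg_match_all('~https?://[^"\'\s<>]+~i', $html, $matches)) {
        return $matches[0];
    }
    return [];
}

// Sample markup standing in for a fetched page (assumed data).
$sample = '<a href="http://example.com/a">A</a> '
        . '<a href=\'http://example.com/b\'>B</a> '
        . '<a href="http://example.com/a">A again</a>';

$links = extractLinks($sample);

// array_unique() keeps the first occurrence of each value but preserves the
// original keys; array_values() reindexes the result from 0.
$unique = array_values(array_unique($links));

echo implode("\n", $unique), "\n";
```

Note that array_unique() on its own leaves gaps in the keys, which is why print_r() on its result can show non-consecutive indexes.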

Thanks.

 

Now how can I get it to display each result only once? The code posted above didn't output plain text, and I'm not sure it's what I needed. I suspect the problem has something to do with the $n++, so it must be hitting pages more than once. Either a fix for emails being displayed more than once or a fix for the page problem would be much appreciated.
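
For what it's worth, a for loop with $n++ visits each value of $n exactly once, so repeats are more likely to come from the same link appearing on several pages (or several times on one page). One possible approach, sketched below under assumptions, is to use the link itself as an array key so repeats collapse as they are collected, then print one link per line. The function name collectUniqueLinks, the $fetch callable, and the canned pages are hypothetical and exist only so the sketch runs without network access.

```php
<?php
// Hypothetical helper: $fetch is any callable that takes a page number and
// returns the page body as a string, or false if the page could not be read.
function collectUniqueLinks(callable $fetch, int $first, int $last): array
{
    $seen = [];
    for ($n = $first; $n <= $last; $n++) {
        $data = $fetch($n);
        if ($data === false) {
            continue; // fetch failed, skip this page
        }
        if (preg_match_all('~https?://[^"\'\s<>]+~i', $data, $m)) {
            foreach ($m[0] as $link) {
                $seen[$link] = true; // array keys are unique, so repeats collapse
            }
        }
    }
    return array_keys($seen);
}

// Canned pages standing in for the real site (assumed data).
$pages = [
    1 => '<a href="http://example.com/x">x</a>',
    2 => 'http://example.com/x and http://example.com/y',
];
$fetch = function (int $n) use ($pages) {
    return $pages[$n] ?? false; // page 3 is missing, so it is skipped
};

$result = collectUniqueLinks($fetch, 1, 3);

// One link per line, no array dump.
echo implode("\n", $result), "\n";
```

Against the real site, $fetch could wrap file_get_contents("http://www.site.com/page.php?p=$n"), with a single header("Content-type: text/plain") call sent before any output.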

 

 
