Help with scraper for spider


Joeker

I'm working on a simple search engine, but I'm having a problem with the scraper I wrote.

All I want to do is fetch each page with file_get_contents and insert the contents of its <title> tag into my database.

The problem is that, since it's scraping so many pages, the script either times out or doesn't scrape properly.

Here's my code:

<?php

set_time_limit(0);

$dbhost = "localhost";
$dbuser = "*****";
$dbpass = "*****";
$dbname = "*****";

mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname);

function get($a, $b, $c)
{
    $y = explode($b, $a);
    $x = explode($c, $y[1]);

    return $x[0];
}

for ($i = 1; $i <= 1000000; $i++)
{
    $content = file_get_contents("http://www.website.com/page.php?id=$i");

    $title = get($content, "<title>", "</title>");

    mysql_query("INSERT INTO spider (title) VALUES ('$title')");
}

?>

 

Anyone?
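
For reference, here is a rough sketch of one direction that might help. It is not a confirmed fix. It assumes the "not scraping properly" part comes from failed requests (file_get_contents returns false), pages without a <title>, or titles containing a quote character that breaks the INSERT string. The function names extract_title, fetch_page and crawl are made up for illustration, and it uses mysqli with a prepared statement in place of the mysql_* calls above, which is a swap on my part. The 10 second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helpers; only file_get_contents, stream_context_create,
// preg_match, html_entity_decode and the mysqli methods are real PHP APIs.

// Return the trimmed text of the first <title> element, or null if none.
function extract_title(string $html): ?string
{
    if (preg_match('~<title\b[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Fetch a URL with a per-request timeout; return null on any failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // @ suppresses the warning; failure is reported through the null return.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Crawl ids $from..$to and store each title via a prepared statement,
// so quotes inside a title cannot break the SQL.
function crawl(mysqli $db, int $from, int $to): void
{
    $stmt = $db->prepare('INSERT INTO spider (title) VALUES (?)');
    for ($i = $from; $i <= $to; $i++) {
        $html = fetch_page("http://www.website.com/page.php?id=$i");
        if ($html === null) {
            continue; // request failed or timed out; skip this id
        }
        $title = extract_title($html);
        if ($title === null) {
            continue; // page has no <title> element
        }
        $stmt->bind_param('s', $title);
        $stmt->execute();
    }
    $stmt->close();
}
```

Calling something like crawl($db, 1, 1000) from the command line, in smaller id ranges rather than one million requests in a single run, would also make it easier to resume after a failure.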
