
Need help with scraper script


scarhand


I am trying to make a guitar tab search engine.

Here is my scraper script for one of the sites:

<?php

set_time_limit(0);



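// $dbhost, $dbuser, $dbpass and $dbname are defined elsewhere (not shown in the post).
// Note: the mysql_* functions were removed in PHP 7.0; on current PHP use mysqli or PDO.
mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname);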



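// Return the text between the first $b and the next $c in $a.
// Returns "" if $b is not found (instead of raising an undefined-index notice).
function get($a, $b, $c)
{
  $y = explode($b, $a);

  if (!isset($y[1]))
  {
    return "";
  }

  $x = explode($c, $y[1]);

  return $x[0];
}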

// Turn a string into a URL slug, e.g. "Stairway To Heaven" -> "stairway-to-heaven".
function slug($str)
{
  $str = strtolower(trim($str));
  $str = preg_replace("/[^a-z0-9-]/", "-", $str);
  $str = preg_replace("/-+/", "-", $str);
  $str = rtrim($str, "-");

  return $str;
}



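// Try every tab id from 1 to 750000 on the remote site.
for ($i = 1; $i <= 750000; $i++)
{
  $content = file_get_contents("http://www.website.com/print.php?what=tab&id=$i");

  // file_get_contents() returns false when the request fails, so check for that too.
  if ($content !== false && $content != "tab not found")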
  {
    $title = get($content, "<title>", "</title>");

    $matches = array();
    
    if (preg_match("/Bass Tab/", $title))
    {
      preg_match('$([a-z ]+) Bass Tab([a-z ]+)by ([a-z ]+)$i', $title, $matches);

      $type = "Bass";
      $song = $matches[1];
      $band = $matches[3];
      
    }
    else
    {
      preg_match('$([a-z ]+) Tab([a-z ]+)by ([a-z ]+)$i', $title, $matches);

      $type = "Guitar";
      $song = $matches[1];
      $band = $matches[3];
    }
    
    $slug_song = slug($song);
    $slug_band = slug($band);

    $tab = get($content, "<pre>", "</pre>");

    // $remove (strings to strip from the tab) is not shown in the post; presumably defined elsewhere.
    foreach ($remove as $value)
    {
      $tab = str_replace($value, "", $tab);
    }

    $tab = trim($tab);

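    // Escape the strings so an apostrophe (e.g. in "Don't Cry") cannot break the query.
    $sql = mysql_query("SELECT * FROM tabs WHERE band='" . mysql_real_escape_string($band) . "' AND song='" . mysql_real_escape_string($song) . "' ORDER BY version DESC LIMIT 1");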
    $sql_count = mysql_num_rows($sql);
    
    if ($sql_count == 0)
    {
      $version = 1;
    }
    else
    {
      while ($row = mysql_fetch_array($sql))
      {
        $version = $row["version"];
      }
    
      $version = $version + 1;
    }
    
    $date_posted = time();
    
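    // Escape the free-text values; the slugs, $type and the numbers are already safe.
    $result = mysql_query("INSERT INTO tabs (xid, poster_id, poster_name, type, band, slug_band, song, slug_song, tab, version, date_posted, is_approved) VALUES ('$i', '0', 'Guest', '$type', '" . mysql_real_escape_string($band) . "', '$slug_band', '" . mysql_real_escape_string($song) . "', '$slug_song', '" . mysql_real_escape_string($tab) . "', '$version', '$date_posted', '1')");

    // Report failed inserts instead of ignoring them.
    if (!$result)
    {
      echo " insert failed for id $i: " . mysql_error() . " ";
    }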
    
    echo " yes ";
  }
  else
  {
    echo " no ";
  }
}

?>

Now this is running into problems. It stops inserting into the database after 1000, and sometimes it doesn't scrape the page content at all.

Can I get some help?
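One thing worth ruling out: building SQL by string concatenation is fragile, because an apostrophe in a tab, band or song name (e.g. "Don't Cry") ends the string literal early unless every value is escaped, and if the return value of the query is never checked the failure goes unnoticed. Below is a minimal sketch using PDO prepared statements instead. The helper names `next_version()` and `insert_tab()` are hypothetical, and it assumes a `tabs` table with the same columns as the original INSERT.

```php
<?php
// Hypothetical helpers (not from the original post): look up the next version
// number and insert one scraped tab with PDO prepared statements, so quotes in
// $band, $song or $tab are sent as data and cannot break the SQL.
// Assumes a `tabs` table with the columns used in the original INSERT.

function next_version(PDO $db, string $band, string $song): int
{
    $stmt = $db->prepare("SELECT MAX(version) FROM tabs WHERE band = ? AND song = ?");
    $stmt->execute([$band, $song]);

    // MAX() yields NULL when there are no rows; (int) null is 0, so the first version is 1.
    return (int) $stmt->fetchColumn() + 1;
}

function insert_tab(PDO $db, int $xid, string $type, string $band, string $slug_band,
                    string $song, string $slug_song, string $tab, int $version): bool
{
    $stmt = $db->prepare(
        "INSERT INTO tabs (xid, poster_id, poster_name, type, band, slug_band,
                           song, slug_song, tab, version, date_posted, is_approved)
         VALUES (?, 0, 'Guest', ?, ?, ?, ?, ?, ?, ?, ?, 1)"
    );

    return $stmt->execute([$xid, $type, $band, $slug_band, $song, $slug_song,
                           $tab, $version, time()]);
}

// Example usage (connection settings are placeholders):
// $db = new PDO("mysql:host=$dbhost;dbname=$dbname;charset=utf8mb4", $dbuser, $dbpass,
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $version = next_version($db, $band, $song);
// insert_tab($db, $i, $type, $band, $slug_band, $song, $slug_song, $tab, $version);
```

With `PDO::ERRMODE_EXCEPTION` set, a failed insert throws an exception instead of failing silently, which makes a problem like "it just stops inserting" much easier to pin down.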

https://forums.phpfreaks.com/topic/124285-need-help-with-scraper-script/

You may want to use cURL instead of file_get_contents().

It's possible the request is timing out; cURL lets you set a timeout with the CURLOPT_TIMEOUT option.
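A minimal sketch of the cURL suggestion, with explicit timeouts. The function name `fetch_page()` and the timeout values are assumptions, not from the thread. It returns `null` on a network error, a timeout, or any non-200 status, so the loop can tell a failed fetch apart from a real page.

```php
<?php
// Hypothetical helper: fetch a page with cURL and explicit timeouts.
// Returns the body, or null on network error, timeout, or non-200 status.
function fetch_page(string $url, int $timeout = 15): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,         // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => $timeout,  // seconds allowed for the whole request
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response was received
    $error  = curl_error($ch);
    // In PHP 8+ the handle is freed automatically when $ch goes out of scope.

    if ($body === false || $status !== 200) {
        error_log("fetch failed for $url (HTTP $status): $error");
        return null;
    }

    return $body;
}
```

In the loop, `$content = fetch_page("http://www.website.com/print.php?what=tab&id=$i");` followed by a skip when `$content === null` replaces the `file_get_contents()` call. If the remote site happens to throttle rapid requests (an assumption; nothing in the thread confirms it), a short pause such as `usleep(250000);` between iterations may also help.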

Also, if you're scraping a large amount of data, your server may be running into its memory limit (more likely if you're on shared hosting).
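To check the memory-limit theory rather than guess, one option is to log memory use periodically. This is a minimal sketch; `log_memory()` and the 1000-iteration interval are hypothetical choices, while `memory_get_usage()`, `memory_get_peak_usage()` and `ini_get('memory_limit')` are standard PHP functions.

```php
<?php
// Hypothetical helper: every $every iterations, log current and peak memory
// use alongside the configured memory_limit. Returns the logged line, or null
// on iterations where nothing is logged.
function log_memory(int $i, int $every = 1000): ?string
{
    if ($i % $every !== 0) {
        return null;
    }

    $line = sprintf(
        "id %d: %.1f MB in use, peak %.1f MB, limit %s",
        $i,
        memory_get_usage() / 1048576,
        memory_get_peak_usage() / 1048576,
        ini_get('memory_limit')
    );
    error_log($line);

    return $line;
}
```

Calling `log_memory($i);` at the end of each loop iteration writes one line per 1000 ids. If the figures climb steadily toward the limit, something is accumulating across iterations; if they stay flat, memory is probably not what stops the script.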

Archived

This topic is now archived and is closed to further replies.
