scarhand Posted September 15, 2008

I am trying to make a guitar tab search engine. Here is my scraper script for one of the sites:

<?php
set_time_limit(0);

mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname);

function get($a, $b, $c)
{
    $y = explode($b, $a);
    $x = explode($c, $y[1]);
    return $x[0];
}

function slug($str)
{
    $str = strtolower(trim($str));
    $str = preg_replace("/[^a-z0-9-]/", "-", $str);
    $str = preg_replace("/-+/", "-", $str);
    $str = rtrim($str, "-");
    return $str;
}

for ($i = 1; $i <= 750000; $i++) {
    $content = file_get_contents("http://www.website.com/print.php?what=tab&id=$i");

    if ($content != "tab not found") {
        $title = get($content, "<title>", "</title>");
        $matches = array();

        if (preg_match("/Bass Tab/", $title)) {
            preg_match('$([a-z ]+) Bass Tab([a-z ]+)by ([a-z ]+)$i', $title, $matches);
            $type = "Bass";
            $song = $matches[1];
            $band = $matches[3];
        } else {
            preg_match('$([a-z ]+) Tab([a-z ]+)by ([a-z ]+)$i', $title, $matches);
            $type = "Guitar";
            $song = $matches[1];
            $band = $matches[3];
        }

        $slug_song = slug($song);
        $slug_band = slug($band);

        $tab = get($content, "<pre>", "</pre>");
        foreach ($remove as $value) {
            $tab = str_replace($value, "", $tab);
        }
        $tab = trim($tab);

        $sql = mysql_query("SELECT * FROM tabs WHERE band='$band' AND song='$song' ORDER BY version DESC LIMIT 1");
        $sql_count = mysql_num_rows($sql);

        if ($sql_count == 0) {
            $version = 1;
        } else {
            while ($row = mysql_fetch_array($sql)) {
                $version = $row["version"];
            }
            $version = $version + 1;
        }

        $date_posted = time();

        mysql_query("INSERT INTO tabs (xid, poster_id, poster_name, type, band, slug_band, song, slug_song, tab, version, date_posted, is_approved) VALUES ('$i', '0', 'Guest', '$type', '$band', '$slug_band', '$song', '$slug_song', '$tab', '$version', '$date_posted', '1')");

        echo " yes ";
    } else {
        echo " no ";
    }
}
?>

Now this is running into problems.
It just stops inserting into the database after 1000, and sometimes it isn't even scraping the page content at all. Can I get some help?
xtopolis Posted September 19, 2008

You may want to use cURL instead of file_get_contents(). It's possible the page is timing out, and cURL lets you set a timeout for each request. Also, if you're scraping a vast amount of data, perhaps your server is running into a memory limit (more probable if your hosting is shared).
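
To make the cURL suggestion concrete, here is a minimal sketch of a fetch helper with timeouts. The function name fetch_page() and the timeout values are assumptions for illustration and are not part of the original script; it only uses standard cURL options (CURLOPT_RETURNTRANSFER, CURLOPT_CONNECTTIMEOUT, CURLOPT_TIMEOUT).

```php
<?php
// Sketch only: fetch_page() is a hypothetical helper, and the default
// timeouts below are arbitrary starting values, not recommendations.

/**
 * Fetch a URL with cURL.
 * Returns the response body as a string, or null if the request
 * failed, timed out, or did not come back with HTTP status 200.
 */
function fetch_page($url, $connectTimeout = 10, $timeout = 30)
{
    $ch = curl_init($url);

    // Return the body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Maximum number of seconds to wait while connecting.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    // Maximum number of seconds the whole request may take.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec() returns false on connection errors and timeouts.
    if ($body === false || $status !== 200) {
        return null;
    }

    return $body;
}
```

In the scraper loop, the call to file_get_contents() could then be swapped for fetch_page(), skipping the id (or retrying it later) whenever the result is null. To check the memory theory, printing memory_get_usage() every few hundred iterations would show whether usage keeps climbing toward the server's limit.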