
PHP Script Taking Ages - Precautions?


jamesbrauman


Hello, how is everyone? ;)

 

I have a PHP script that downloads data from a website, formats it, and saves it to my MySQL database. It pulls the data from about 3000 pages in total, which I walk through in loops, and a full run takes about an hour. I stop PHP from aborting the script after a minute by calling "set_time_limit(0)", which removes the time limit.

 

My concern is this: for every page that is opened and formatted (around 3000), I open a new connection to my database, insert the data, then close the connection. Is this going to be a problem, and what can I do to safeguard against database corruption?
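On the connection question, here is a minimal sketch of one alternative: open the connection once before the loops and reuse a single prepared statement for every insert. Everything in it is an assumption rather than part of the script below. The `saveJokes()` helper is hypothetical, it uses PDO instead of the `mysql_*` functions, it expects the caller to have set `PDO::ATTR_ERRMODE` to `PDO::ERRMODE_EXCEPTION`, and the transaction only has an effect on a transactional storage engine such as InnoDB.

```php
<?php
// Sketch only: saveJokes() is a hypothetical helper, not part of the original script.
// It reuses one already-open PDO connection and one prepared statement for all rows.
function saveJokes(PDO $db, array $jokes): int
{
    $stmt = $db->prepare(
        'INSERT INTO jokedata (joke, category, datesubmitted)
         VALUES (:joke, :category, :submitted)'
    );
    $saved = 0;
    $db->beginTransaction();
    try {
        foreach ($jokes as $row) {
            // Skip pages where extraction failed instead of storing an empty joke.
            if (!is_string($row['joke']) || $row['joke'] === '') {
                continue;
            }
            // Bound parameters replace manual escaping of the joke text.
            $stmt->execute([
                ':joke'      => $row['joke'],
                ':category'  => $row['category'],
                ':submitted' => date('Y-m-d'),
            ]);
            $saved++;
        }
        $db->commit();
    } catch (Exception $e) {
        // Undo the partial batch so a failure does not leave half a category behind.
        $db->rollBack();
        throw $e;
    }
    return $saved;
}
```

It would be called once per category with the jokes collected for that category, using a connection created a single time at the top of the script, for example `new PDO('mysql:host=localhost;dbname=laughpolice', 'root', '')`.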

 

Also, is there anything I can do to make my script more efficient, seeing as it handles such a large amount of data?
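One small change that might help a long run is sketched here, under stated assumptions: a page fetch that gives up after a timeout and does not leave stream handles open. `fetchPage()` is a hypothetical replacement for the `readContents()` function in the script below, and the 15 second default is an arbitrary value chosen for illustration.

```php
<?php
// Sketch only: fetchPage() is a hypothetical replacement for readContents().
// Returns the page source as a string, or false if the request fails or times out.
function fetchPage(string $url, int $timeoutSeconds = 15)
{
    // The 'timeout' option applies to the http wrapper, so one slow page
    // cannot stall the whole run indefinitely.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // file_get_contents() opens, reads, and closes the stream in one call,
    // so no handle stays open across the ~3000 iterations.
    // The @ suppresses the warning; failure is reported via the false return.
    return @file_get_contents($url, false, $context);
}
```

Callers would check for `false` before parsing, so that a failed download is skipped rather than inserted as an empty row.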

 

My script:

<?PHP
set_time_limit(0);
//readContents function - returns a string containing the source of a webpage
function readContents ($sourceURL) {
	if ($readStream = fopen($sourceURL, 'r')) {
		return stream_get_contents($readStream);
	} else {
		return false;
	}
}
//stripArray function - strips all useless entries from the array.
function stripArray ($sourceArray) {
	foreach($sourceArray as $key => $value) {
		if($value == "" || $value == " " || is_null($value) || substr_count($value, "HREF")) {
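			//Variable names are case-sensitive: this must be $sourceArray, not $SourceArray
			unset($sourceArray[$key]);
		}
	}
	//Return the filtered copy; the array is passed by value, so the caller's array is unchanged
	return $sourceArray;
}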
//function getStringBetween - returns the string found between $str1 and $str2, or false if there is none
function getStringBetween($input, $str1, $str2, $offset = 0) {
	if ($str1 == '' || $str2 == '' || $input == '') {
		return false;
	}
	$p1 = strpos($input, $str1, $offset);
	if ($p1 === false) {
		return false;
	}
	//Look for $str2 only after the end of $str1
	$start = $p1 + strlen($str1);
	$p2 = strpos($input, $str2, $start);
	if ($p2 === false) {
		return false;
	}
	//The length runs from the end of $str1 to the start of $str2
	$p3 = substr($input, $start, $p2 - $start);
	return strlen($p3) > 0 ? $p3 : false;
}
//function stripData - strips useless data from string.
function stripData($sourceString, $stripArray, $replace="") {
	foreach($stripArray as $key => $value) {
		$sourceString = str_ireplace($value, $replace, $sourceString);
	}
	return $sourceString;
} 
//Start off by defining the list of categories...
$categoryPageList = array("http://www.jokesgallery.com/categories.php?category=CleanBlondes",
						 "http://www.jokesgallery.com/categories.php?category=CleanDeepThoughts",
						 "http://www.jokesgallery.com/categories.php?category=CleanMale",
						 "http://www.jokesgallery.com/categories.php?category=CleanRedneck",
						 "http://www.jokesgallery.com/categories.php?category=CleanChildren",
						 "http://www.jokesgallery.com/categories.php?category=CleanFemale",
						 "http://www.jokesgallery.com/categories.php?category=CleanMiscellaneous",
						 "http://www.jokesgallery.com/categories.php?category=CleanReligious",
						 "http://www.jokesgallery.com/categories.php?category=CleanComputers",
						 "http://www.jokesgallery.com/categories.php?category=CleanLawyer",
						 "http://www.jokesgallery.com/categories.php?category=CleanPolitical",
						 "http://www.jokesgallery.com/categories.php?category=CleanYoMama",
						 "http://www.jokesgallery.com/categories.php?category=CleanOneLiners");
//Then define what each page represents
$categoryDescList = array("BLONDES", "DEEP THOUGHTS", "MALE", "REDNECK", "CHILDREN", "FEMALE",
						  "MISCELLANEOUS", "RELIGIOUS", "COMPUTERS", "LAWYER", "POLITICAL",
						  "YO MAMA", "ONE LINERS");
//Start the loop which will search each page for the list of jokes.
foreach ($categoryPageList as $key => $value) {
	$curCategoryPageSource = readContents($value);
	$curCategoryPageSource = getStringBetween($curCategoryPageSource, "Category</b></font></font></td></tr></table>", "        </tr>\n      </table>\n<p>\n<table width=\"380\"");
	$curCategoryPageSource = stripData($curCategoryPageSource, array(
	"<BR>", "<font face=Arial, Helvetica, sans-serif size=2>", " class=one", "<b>", "</b>", "</font>", "<i>", "</i>",
	"<font face=Arial, Helvetica, sans-serif size=1>"
	));
	$curCategoryLinkArray = explode("</a> ", $curCategoryPageSource);
	foreach ($curCategoryLinkArray as $key2 => $value2) {
		if (substr_count($value2, "Average Votes:") != 0) {
			$curCategoryLinkArray[$key2] = substr($value2, 19, strlen($value2) - 19);
		}
		$curCategoryLinkArray[$key2] = trim($curCategoryLinkArray[$key2]);
		$curCategoryLinkArray[$key2] = substr($curCategoryLinkArray[$key2], 0, strpos($curCategoryLinkArray[$key2], ">"));
		$curCategoryLinkArray[$key2] = substr($curCategoryLinkArray[$key2], strpos($curCategoryLinkArray[$key2], "=")+1, strlen($curCategoryLinkArray[$key2])-strpos($curCategoryLinkArray[$key2], "=")+1);
	}
	//Visit each page and obtain the joke.
	foreach ($curCategoryLinkArray as $thekey => $thevalue) {
		$jokePageSource = readContents($curCategoryLinkArray[$thekey]);
		$joke = getStringBetween($jokePageSource, "</font></b></P>\n<P><font size=2 face=Verdana, Arial, Helvetica, sans-serif>", "</font></P>\n<p><a href=\"#\" onclick");
		$joke = stripData($joke, array("</font>", "</P>", "<p><a href=\"#\" onclick=\"Print"));

		//Put it in the database.
		mysql_connect("localhost", "root", "");
		mysql_select_db("laughpolice");
		$joke = mysql_real_escape_string($joke);
		$sqlquery = "INSERT INTO jokedata (joke, category, datesubmitted) VALUES ('$joke','".$categoryDescList[$key]."', CURDATE())";
		mysql_query($sqlquery) or die(mysql_error());
		mysql_close();
	}
}
?>

 

Thank you for your time :)

JamesBrauman

