Jump to content

Allowed memory size of X bytes exhausted; large string parsing


JBud

Recommended Posts

Hey guys, I'm trying out website scraping for the first time (in particular on Craigslist, getting a list of jobs from certain areas), and I keep getting this error about the memory size. I've tried setting up methods with pure string manipulation:

 

// <p> Mar 11 - <a href="/van/sof/1639429290.html">Sr. Graphics SW Architect.</a> - </p>
function extractNextLink(&$code) {
	$keyBeginning = '<p>';
	$keyEnd = '</p>';
******    $code = stristr($code, $keyBeginning);
	if ($code === FALSE)
		return "a";

	$hitEnd = strpos($code, $keyEnd);
	if ($hitEnd === FALSE)
		return "b";

	$nextLink = substr($code, 0, $hitEnd);
	return $nextLink;  // <-- Note we don't fix $code since we're searching for <p><a href"  where <p> was already removed anyways (means less large string parsing)
}

 

and Regex

function getFirstElement(&$code) {
	$pattern = '/<p><a href="http:\/\/[a-zA-Z]{1,15}\.craigslist\.[a-z]{1,5}\/[a-z]{1,5}\/[a-z]{1,5}\/[0-9]{1,10}.html">/';
******    preg_match($pattern, $code, $matches, PREG_OFFSET_CAPTURE);
	$beginning = $matches[0][1] + strlen($matches[0][0]);

	$pattern = '/<\/a> - <font size="-1">/';
	preg_match($pattern, $code, $matches, PREG_OFFSET_CAPTURE);
	$end = -1;
	$endLen;
	foreach ($matches as $match) {
		$end = $match[1];
		if ($end > $beginning)
		{
			$endLen = strlen($match[0]);
			break;
		}
	}

	if ($end == -1)
		return "";

	$returnable = substr($code, $beginning, $end - $beginning);
	$code = substr($code, $end + $endLen);
	return $returnable;
}

 

 

The *'s represent where the error comes up in both methods. What am I doing wrong here?  How can I go about extracting these links from the code more efficiently?  Any ideas ??  Thanks =]

Hey Schilly, thanks for the tip. I should have mentioned it in my post, but I already came across this solution, and I'd really rather find a more efficient way to parse the html. Increasing the memory limit is only really a temporary solution  =[ 

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.