JBud Posted March 15, 2010

Hey guys, I'm trying out website scraping for the first time (in particular on Craigslist, getting a list of jobs from certain areas), and I keep hitting an "Allowed memory size of X bytes exhausted" error. I've tried a method using pure string manipulation:

// Sample markup: <p> Mar 11 - <a href="/van/sof/1639429290.html">Sr. Graphics SW Architect.</a> - </p>
function extractNextLink(&$code) {
    $keyBeginning = '<p>';
    $keyEnd = '</p>';
    $code = stristr($code, $keyBeginning);   // ****** memory error is reported on this line
    if ($code === FALSE) return "a";
    $hitEnd = strpos($code, $keyEnd);
    if ($hitEnd === FALSE) return "b";
    $nextLink = substr($code, 0, $hitEnd);
    return $nextLink;
    // Note: $code is deliberately not trimmed back here, since the next search is for '<p><a href="'
    // and the leading '<p>' has already been removed anyway (means less large-string parsing).
}

and one using regex:

function getFirstElement(&$code) {
    $pattern = '/<p><a href="http:\/\/[a-zA-Z]{1,15}\.craigslist\.[a-z]{1,5}\/[a-z]{1,5}\/[a-z]{1,5}\/[0-9]{1,10}\.html">/';
    preg_match($pattern, $code, $matches, PREG_OFFSET_CAPTURE);   // ****** memory error is reported on this line
    $beginning = $matches[0][1] + strlen($matches[0][0]);

    $pattern = '/<\/a> - <font size="-1">/';
    preg_match($pattern, $code, $matches, PREG_OFFSET_CAPTURE);

    $end = -1;
    $endLen = 0;
    foreach ($matches as $match) {
        $end = $match[1];
        if ($end > $beginning) {
            $endLen = strlen($match[0]);
            break;
        }
    }
    if ($end == -1) return "";

    $returnable = substr($code, $beginning, $end - $beginning);
    $code = substr($code, $end + $endLen);
    return $returnable;
}

The asterisks mark where the error comes up in both methods. What am I doing wrong here? How can I go about extracting these links from the page more efficiently? Any ideas? Thanks =]
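(For comparison, here is a minimal sketch of the same string-walking idea that advances a numeric offset instead of reassigning $code on every call, so the large page string is never re-copied. The function name and return shape are illustrative, not taken from the post above.)

// Sketch: collect every <p>...</p> block in one pass using strpos() offsets.
// Because $code is never modified, PHP's copy-on-write means the big string
// is not duplicated; only the small matched slices are copied out.
function extractLinksByOffset($code) {
    $links = array();
    $offset = 0;
    while (($start = stripos($code, '<p>', $offset)) !== false) {
        $end = strpos($code, '</p>', $start);
        if ($end === false) break;                        // no closing tag left
        $links[] = substr($code, $start, $end - $start);  // just this <p>... block
        $offset = $end + 4;                               // 4 = strlen('</p>')
    }
    return $links;
}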
schilly Posted March 15, 2010

You can try upping PHP's memory limit:

ini_set('memory_limit', '32M');
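(If raising the limit, it can also be worth logging the script's peak memory so you can see whether the parsing itself is what's eating it. A quick sketch; $url is just a placeholder for the listing page:)

ini_set('memory_limit', '64M');                 // example value only
$code = file_get_contents($url);                // $url: the Craigslist listing page
// ... run the link extraction here ...
echo 'Peak memory: ' . memory_get_peak_usage(true) . " bytes\n";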
JBud Posted March 15, 2010 (Author)

Hey Schilly, thanks for the tip. I should have mentioned it in my post, but I already came across that solution, and I'd really rather find a more efficient way to parse the HTML. Increasing the memory limit is only a temporary fix =[
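(One way to cut the memory churn is to pull all of the links out of the page in a single pass instead of repeatedly slicing the string, e.g. with DOMDocument/DOMXPath. A rough sketch, assuming the listing markup looks like the <p><a href="..."> sample in the first post; $html and $jobs are placeholder names:)

// Parse the page once and read the anchors out of the DOM,
// rather than slicing the raw HTML string over and over.
$dom = new DOMDocument();
@$dom->loadHTML($html);                         // @ hides warnings about messy real-world markup
$xpath = new DOMXPath($dom);
$jobs = array();
foreach ($xpath->query('//p/a') as $a) {        // every <a> directly inside a <p>
    $jobs[] = array(
        'title' => trim($a->textContent),
        'href'  => $a->getAttribute('href'),
    );
}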
Archived
This topic is now archived and is closed to further replies.