steven fullman Posted October 25, 2009

Hi,

I have a script with some options. I use regex to replace patterns in strings, but I seem to be using it incorrectly, because it very quickly blows past my memory_limit (by several orders of magnitude). This is strange, because I'm dealing with maybe 10 simultaneous strings of 500 words max. I'm clearly causing some kind of runaway recursion, but I can't see how. Any help you could give me to program this better (or tell me where I'm going wrong) would be most appreciated.

```php
$string = "this would be about 500 words long";
$parts = $string; // $parts would normally be a substring of $string
wp_wordmash($parts);
wp_synonymize($string);
wp_keyword2url($string);
// HTML output follows here...

function wp_wordmash($parts) {
    $wordlist = file_get_contents('dictionary.txt', true);
    $dictionary = explode(",", $wordlist);
    $htmldictionary = array();
    foreach ($dictionary as $dicword) {
        $htmldictionary[]         = wp_htmlcode($dicword);
        $htmldictionary_u[]       = wp_htmlcode(strtoupper($dicword));
        $htmldictionary_u1[]      = wp_htmlcode(ucfirst($dicword));
        $htmldictionary_ucwords[] = wp_htmlcode(ucwords($dicword));
    }
    for ($i = 0; $i < count($dictionary); $i++) {
        $parts = preg_replace("/\b$dictionary[$i]\b/", $htmldictionary[$i], $parts);
        $parts = preg_replace("/\b" . strtoupper($dictionary[$i]) . "\b/", $htmldictionary_u[$i], $parts);
        $parts = preg_replace("/\b" . ucfirst($dictionary[$i]) . "\b/", $htmldictionary_u1[$i], $parts);
        $parts = preg_replace("/\b" . ucwords($dictionary[$i]) . "\b/", $htmldictionary_ucwords[$i], $parts);
    }
    return $parts;
}

function wp_htmlcode($string) {
    $buffer = '';
    for ($i = 0; $i < strlen($string); $i++) {
        $buffer .= "&#" . ord($string[$i]) . ";";
    }
    return $buffer;
}

function wp_synonymize($string) {
    $buffer = $string;
    $synonymfile = file_get_contents('synonyms.txt', true);
    $synonyms = explode("\n", $synonymfile);
    for ($i = 0; $i < count($synonyms); $i++) {
        $synonymlist = explode(",", $synonyms[$i]);
        $oldword = $synonymlist[0];
        $synonym = str_replace("\r", '', $synonymlist[1]);
        $buffer = preg_replace("/\b$oldword\b/", $synonym, $buffer);
        $buffer = preg_replace("/\b" . strtoupper($oldword) . "\b/", strtoupper($synonym), $buffer);
        $buffer = preg_replace("/\b" . ucfirst($oldword) . "\b/", ucfirst($synonym), $buffer);
        $buffer = preg_replace("/\b" . ucwords($oldword) . "\b/", ucwords($synonym), $buffer);
    }
    return $buffer;
}

function wp_keyword2url($string) {
    $buffer = $string;
    $keyword2urlfile = file_get_contents('keyword2url.txt', true);
    $keywords = explode("\n", $keyword2urlfile);
    for ($i = 0; $i < count($keywords); $i++) {
        $keywordlist = explode(",", $keywords[$i]);
        $keyword = $keywordlist[0];
        $url = str_replace("\r", '', $keywordlist[1]);
        $buffer = preg_replace("/\b$keyword\b/", '<a href="' . $url . '">' . $keyword . '</a>', $buffer);
        $buffer = preg_replace("/\b" . strtoupper($keyword) . "\b/", '<a href="' . $url . '">' . strtoupper($keyword) . '</a>', $buffer);
        $buffer = preg_replace("/\b" . ucfirst($keyword) . "\b/", '<a href="' . $url . '">' . ucfirst($keyword) . '</a>', $buffer);
        $buffer = preg_replace("/\b" . ucwords($keyword) . "\b/", '<a href="' . $url . '">' . ucwords($keyword) . '</a>', $buffer);
    }
    return $buffer;
}
```

As I say, the string passed to these functions is typically under 500 words. I've also included the comparison files (dictionary.txt, synonyms.txt and keyword2url.txt)...HERE

I hope you can help. I'm 99% certain I'm using preg_replace() wrong, because if I substitute str_replace() for it, my memory issues disappear. The problem is, I like preg_replace() because it gives me word-boundary matching. I'm just obviously doing it wrong! Any thoughts?

Kind regards,
Steve

P.S. Please feel free to mock and laugh at me, as long as you can show me a better way! And if you need any more info, please ask.
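[Editor's note] One likely contributor to the blowup worth checking: the dictionary entries are interpolated into the patterns unescaped, so any stray `.`, `(`, or carriage return in dictionary.txt becomes part of the regex and can produce pathological patterns. A minimal sketch of the same word-boundary replacement with the word escaped via preg_quote() (the `replace_word` helper name is mine, not from the thread):

```php
<?php
// Sketch: word-boundary replacement with the search word escaped.
// preg_quote() backslash-escapes regex metacharacters; '/' is passed as the
// second argument because it is also the pattern delimiter here.
function replace_word($text, $oldword, $newword) {
    $pattern = '/\b' . preg_quote($oldword, '/') . '\b/';
    return preg_replace($pattern, $newword, $text);
}

// \b keeps "catalogue" intact while "cat" is replaced:
echo replace_word("the cat sat on the catalogue", "cat", "dog"), "\n";
// Without preg_quote(), a word like "a.b" would have matched "axb" too:
echo replace_word("a.b matches, axb does not", "a.b", "X"), "\n";
```

Applied inside the existing loops, this only changes how each pattern is built; the per-word call structure stays the same.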
dreamwest Posted October 25, 2009

The preg functions use more resources and should be used sparingly; explode() and str_replace() are better alternatives for looping over data. I'm assuming the dictionary has over 500 words in it, so this loop:

```php
for ($i = 0; $i < count($dictionary); $i++) {
    $parts = preg_replace("/\b$dictionary[$i]\b/", $htmldictionary[$i], $parts);
    $parts = preg_replace("/\b" . strtoupper($dictionary[$i]) . "\b/", $htmldictionary_u[$i], $parts);
    $parts = preg_replace("/\b" . ucfirst($dictionary[$i]) . "\b/", $htmldictionary_u1[$i], $parts);
    $parts = preg_replace("/\b" . ucwords($dictionary[$i]) . "\b/", $htmldictionary_ucwords[$i], $parts);
}
```

is executing four regex replacements for every word. Change it all to str_replace().
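[Editor's note] If you do go the str_replace() route, the per-word loop collapses to a single call, because str_replace() accepts parallel arrays of search and replace strings. A sketch under the assumption that the word map is already in memory (the variable names here are mine):

```php
<?php
// Sketch: one str_replace() call covering all case variants of every word.
// Caveats: str_replace() matches substrings (no \b), and replacements are
// applied sequentially, so earlier results can be re-matched by later pairs.
$map = array("cat" => "feline", "dog" => "hound");

$searches = array();
$replaces = array();
foreach ($map as $old => $new) {
    $searches[] = $old;              $replaces[] = $new;
    $searches[] = strtoupper($old);  $replaces[] = strtoupper($new);
    $searches[] = ucfirst($old);     $replaces[] = ucfirst($new);
}

echo str_replace($searches, $replaces, "The cat chased a Dog"), "\n";
```

This trades the word-boundary guarantee for speed and predictable memory use, which is exactly the trade-off discussed in the next post.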
steven fullman Posted October 26, 2009 (Author)

Thanks dreamwest,

One of the reasons I'm using preg is that I can specify word boundaries; str_replace() seems too limited in that way (i.e. I want to match the EXACT word only, not the word inside other words). I've tried using spaces to distinguish the exact pattern, but I fall over at the beginnings and ends of sentences, and with commas, etc. Is there a way around these limitations?
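[Editor's note] One way to keep `\b` word boundaries while avoiding four regex calls per word is to compile the whole word list into a single case-insensitive alternation and make one pass with preg_replace_callback(). A sketch, with the lookup-table approach and function name being my own rather than anything posted in the thread (note it uses closures, so it needs PHP 5.3+, newer than much 2009-era code):

```php
<?php
// Sketch: one case-insensitive pass instead of a regex call per case variant.
// $map keys are lowercase words; the callback mirrors the case shape of
// whatever was actually matched (ALL CAPS or Ucfirst) onto the replacement.
function replace_words($text, array $map) {
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); },
                        array_keys($map));
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($map) {
        $found       = $m[0];
        $replacement = $map[strtolower($found)];
        if ($found === strtoupper($found)) {
            return strtoupper($replacement);        // CAT -> FELINE
        }
        if ($found === ucfirst(strtolower($found))) {
            return ucfirst($replacement);           // Cat -> Feline
        }
        return $replacement;                        // cat -> feline
    }, $text);
}

echo replace_words("Cat and CAT, but not catalogue",
                   array("cat" => "feline")), "\n";
```

The same pattern-plus-callback structure would fit wp_synonymize() and wp_keyword2url(), keeping the exact-word matching while doing a single scan of the text per map.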