steven fullman Posted October 25, 2009

Hi,

I have a script with some options. I use regex to replace patterns in strings, but I seem to be using it incorrectly, because it very quickly blows past my memory_limit (by several orders of magnitude). This is strange, because I'm dealing with maybe 10 simultaneous strings of 500 words max. I'm clearly causing some kind of runaway recursion, but I can't see how. Any help you could give me to program this better (or tell me where I'm going wrong) would be most appreciated.

```php
$string = "this would be about 500 words long";
$parts = $string; // $parts would normally be a substring of $string
wp_wordmash($parts);
wp_synonymize($string);
wp_keyword2url($string);
// HTML output follows here...

function wp_wordmash($parts) {
    $wordlist = file_get_contents('dictionary.txt', true);
    $dictionary = explode(",", $wordlist);
    $htmldictionary = array();
    foreach ($dictionary as $dicword) {
        $htmldictionary[]         = wp_htmlcode($dicword);
        $htmldictionary_u[]       = wp_htmlcode(strtoupper($dicword));
        $htmldictionary_u1[]      = wp_htmlcode(ucfirst($dicword));
        $htmldictionary_ucwords[] = wp_htmlcode(ucwords($dicword));
    }
    for ($i = 0; $i < count($dictionary); $i++) {
        $parts = preg_replace("/\b$dictionary[$i]\b/", $htmldictionary[$i], $parts);
        $parts = preg_replace("/\b" . strtoupper($dictionary[$i]) . "\b/", $htmldictionary_u[$i], $parts);
        $parts = preg_replace("/\b" . ucfirst($dictionary[$i]) . "\b/", $htmldictionary_u1[$i], $parts);
        $parts = preg_replace("/\b" . ucwords($dictionary[$i]) . "\b/", $htmldictionary_ucwords[$i], $parts);
    }
    return $parts;
}

function wp_htmlcode($string) {
    $buffer = '';
    for ($i = 0; $i < strlen($string); $i++) {
        $buffer .= "&#" . ord($string[$i]) . ";";
    }
    return $buffer;
}

function wp_synonymize($string) {
    $buffer = $string;
    $synonymfile = file_get_contents('synonyms.txt', true);
    $synonyms = explode("\n", $synonymfile);
    for ($i = 0; $i < count($synonyms); $i++) {
        $synonymlist = explode(",", $synonyms[$i]);
        $oldword = $synonymlist[0];
        $synonym = str_replace("\r", '', $synonymlist[1]);
        $buffer = preg_replace("/\b$oldword\b/", $synonym, $buffer);
        $buffer = preg_replace("/\b" . strtoupper($oldword) . "\b/", strtoupper($synonym), $buffer);
        $buffer = preg_replace("/\b" . ucfirst($oldword) . "\b/", ucfirst($synonym), $buffer);
        $buffer = preg_replace("/\b" . ucwords($oldword) . "\b/", ucwords($synonym), $buffer);
    }
    return $buffer;
}

function wp_keyword2url($string) {
    $buffer = $string;
    $keyword2urlfile = file_get_contents('keyword2url.txt', true);
    $keywords = explode("\n", $keyword2urlfile);
    for ($i = 0; $i < count($keywords); $i++) {
        $keywordlist = explode(",", $keywords[$i]);
        $keyword = $keywordlist[0];
        $url = str_replace("\r", '', $keywordlist[1]);
        $buffer = preg_replace("/\b$keyword\b/", '<a href="' . $url . '">' . $keyword . '</a>', $buffer);
        $buffer = preg_replace("/\b" . strtoupper($keyword) . "\b/", '<a href="' . $url . '">' . strtoupper($keyword) . '</a>', $buffer);
        $buffer = preg_replace("/\b" . ucfirst($keyword) . "\b/", '<a href="' . $url . '">' . ucfirst($keyword) . '</a>', $buffer);
        $buffer = preg_replace("/\b" . ucwords($keyword) . "\b/", '<a href="' . $url . '">' . ucwords($keyword) . '</a>', $buffer);
    }
    return $buffer;
}
```

As I say, the string passed to these functions is typically under 500 words. I've also included the comparison files (dictionary.txt, synonyms.txt and keyword2url.txt)...HERE

I hope you can help. I'm 99% certain I'm using preg_replace() wrong, because if I substitute str_replace() for it, my memory issues disappear. The problem is, I like preg_replace() because it gives me word-boundary matching. I'm just obviously doing it wrong! Any thoughts?

Kind regards,
Steve

P.S. Please feel free to mock and laugh at me, as long as you can show me a better way! And if you need any more info, please ask.
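[Editor's note] One likely contributor to the blowup worth checking: the dictionary entries are interpolated into the patterns unescaped, so any stray `.`, `(`, or carriage return in dictionary.txt becomes part of the regex and can produce pathological patterns. A minimal sketch of the same word-boundary replacement with the word escaped via preg_quote() (the `replace_word` helper name is mine, not from the thread):

```php
<?php
// Sketch: word-boundary replacement with the search word escaped.
// preg_quote() backslash-escapes regex metacharacters; '/' is passed as the
// second argument because it is also the pattern delimiter here.
function replace_word($text, $oldword, $newword) {
    $pattern = '/\b' . preg_quote($oldword, '/') . '\b/';
    return preg_replace($pattern, $newword, $text);
}

// \b keeps "catalogue" intact while "cat" is replaced:
echo replace_word("the cat sat on the catalogue", "cat", "dog"), "\n";
// Without preg_quote(), a word like "a.b" would have matched "axb" too:
echo replace_word("a.b matches, axb does not", "a.b", "X"), "\n";
```

Applied inside the existing loops, this only changes how each pattern is built; the per-word call structure stays the same.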
dreamwest Posted October 25, 2009

The preg functions use more resources and should be used sparingly; explode() and str_replace() are better alternatives for looping over data. I'm assuming the dictionary has over 500 words in it, so this loop:

```php
for ($i = 0; $i < count($dictionary); $i++) {
    $parts = preg_replace("/\b$dictionary[$i]\b/", $htmldictionary[$i], $parts);
    $parts = preg_replace("/\b" . strtoupper($dictionary[$i]) . "\b/", $htmldictionary_u[$i], $parts);
    $parts = preg_replace("/\b" . ucfirst($dictionary[$i]) . "\b/", $htmldictionary_u1[$i], $parts);
    $parts = preg_replace("/\b" . ucwords($dictionary[$i]) . "\b/", $htmldictionary_ucwords[$i], $parts);
}
```

is executing four regex replacements for every word. Change it all to str_replace().
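[Editor's note] If you do go the str_replace() route, the per-word loop collapses to a single call, because str_replace() accepts parallel arrays of search and replace strings. A sketch under the assumption that the word map is already in memory (the variable names here are mine):

```php
<?php
// Sketch: one str_replace() call covering all case variants of every word.
// Caveats: str_replace() matches substrings (no \b), and replacements are
// applied sequentially, so earlier results can be re-matched by later pairs.
$map = array("cat" => "feline", "dog" => "hound");

$searches = array();
$replaces = array();
foreach ($map as $old => $new) {
    $searches[] = $old;              $replaces[] = $new;
    $searches[] = strtoupper($old);  $replaces[] = strtoupper($new);
    $searches[] = ucfirst($old);     $replaces[] = ucfirst($new);
}

echo str_replace($searches, $replaces, "The cat chased a Dog"), "\n";
```

This trades the word-boundary guarantee for speed and predictable memory use, which is exactly the trade-off discussed in the next post.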
steven fullman Posted October 26, 2009 (Author)

Thanks dreamwest,

One of the reasons I'm using preg is that I can specify word boundaries; str_replace() seems too limited in that way (i.e. I want to match the EXACT word only, not the word inside other words). I've tried using spaces to distinguish the exact pattern, but I fall over at the beginnings and ends of sentences, and with commas, etc. Is there a way around these limitations?
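[Editor's note] One way to keep `\b` word boundaries while avoiding four regex calls per word is to compile the whole word list into a single case-insensitive alternation and make one pass with preg_replace_callback(). A sketch, with the lookup-table approach and function name being my own rather than anything posted in the thread (note it uses closures, so it needs PHP 5.3+, newer than much 2009-era code):

```php
<?php
// Sketch: one case-insensitive pass instead of a regex call per case variant.
// $map keys are lowercase words; the callback mirrors the case shape of
// whatever was actually matched (ALL CAPS or Ucfirst) onto the replacement.
function replace_words($text, array $map) {
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); },
                        array_keys($map));
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($map) {
        $found       = $m[0];
        $replacement = $map[strtolower($found)];
        if ($found === strtoupper($found)) {
            return strtoupper($replacement);        // CAT -> FELINE
        }
        if ($found === ucfirst(strtolower($found))) {
            return ucfirst($replacement);           // Cat -> Feline
        }
        return $replacement;                        // cat -> feline
    }, $text);
}

echo replace_words("Cat and CAT, but not catalogue",
                   array("cat" => "feline")), "\n";
```

The same pattern-plus-callback structure would fit wp_synonymize() and wp_keyword2url(), keeping the exact-word matching while doing a single scan of the text per map.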