Omzy Posted July 21, 2009
Basically I'm trying to dynamically generate a <meta keywords> tag. Let's say I have a string like this:
$description="SanDisk’s Memory Stick PRO Duo is half the size of a standard-size Memory Stick PRO media and it offers the same technologies including high speed data transfer, built-in MagicGate, and high capacities. The Memory Stick PRO Duo is the ideal solution for the most portable devices such as pocket-size digital cameras and with the use of Adaptor, it can be used in all PRO-compatible devices.";
And I have a list of stopwords which I have put into a variable $stopwords:
$stopwords="in|the|of|to|which|where|how|is|it|if|why|who";
So I basically want to recreate $description with those stopwords taken out of it. How do I do this? I tried using preg_replace but it would only do partial-word matches...
Link to comment https://forums.phpfreaks.com/topic/166765-solved-preg_replace-to-remove-whole-words-in-string/
Q Posted July 21, 2009
You know Google gives you minus points if you have more than 5 meta keywords, right? I'd make a more advanced script that searches the string for the most relevant words and picks 5 of those.
Omzy Posted July 21, 2009
LOL I know but I'm not really bothered about that, I'm just working to spec lol!
Omzy Posted July 21, 2009
Anyone?
Mark Baker Posted July 21, 2009
$description="SanDisk’s Memory Stick PRO Duo is half the size of a standard-size Memory Stick PRO media and it offers the same technologies including high speed data transfer, built-in MagicGate, and high capacities. The Memory Stick PRO Duo is the ideal solution for the most portable devices such as pocket-size digital cameras and with the use of Adaptor, it can be used in all PRO-compatible devices.";
$stopwords="in|the|of|to|which|where|how|is|it|if|why|who";
// Split the pipe-delimited list into an array of stopwords
$stopwordsArray=explode('|',$stopwords);
// Format 2 returns the words keyed by their position in the string
$descriptionArray = str_word_count($description,2);
foreach($descriptionArray as $descriptionWordKey => $descriptionWord) {
    // Drop each word found in the stopword list (the comparison is case-sensitive)
    if (in_array($descriptionWord,$stopwordsArray)) {
        unset($descriptionArray[$descriptionWordKey]);
    }
}
// Remove repeated words
$descriptionArray = array_unique($descriptionArray);
print_r($descriptionArray);
You might also want to drop words of length 1.
thebadbad Posted July 21, 2009
You can also use regular expressions:
<?php
$description = "SanDisk’s Memory Stick PRO Duo is half the size of a standard-size Memory Stick PRO media and it offers the same technologies including high speed data transfer, built-in MagicGate, and high capacities. The Memory Stick PRO Duo is the ideal solution for the most portable devices such as pocket-size digital cameras and with the use of Adaptor, it can be used in all PRO-compatible devices.";
$stopwords = "in|the|of|to|which|where|how|is|it|if|why|who";
$stopwords = explode('|', $stopwords);
$patterns = array();
foreach ($stopwords as $stopword) {
    // \b anchors the match to word boundaries, so only whole words are removed
    $patterns[] = '~\b' . preg_quote($stopword, '~') . '\b~i';
}
$description = preg_replace($patterns, '', $description);
//replace whitespace(s) with a single space
$description = preg_replace('~\s+~', ' ', $description);
?>
But Mark Baker's method is probably faster.
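Since $stopwords is already pipe-delimited, the same whole-word idea can probably be expressed as a single pattern with one alternation group instead of one pattern per stopword. The following is only a sketch under that assumption; the function name removeStopwords() is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the thread).
function removeStopwords(string $text, string $stopwords): string
{
    // Quote each stopword so regex metacharacters are matched literally.
    $quoted = array_map(
        function ($word) {
            return preg_quote($word, '~');
        },
        explode('|', $stopwords)
    );

    // \b on both sides restricts matches to whole words; "i" ignores case.
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
    $text = preg_replace($pattern, '', $text);

    // Collapse the gaps left behind into single spaces.
    return trim(preg_replace('~\s+~', ' ', $text));
}

echo removeStopwords(
    'It is half the size of the island',
    'in|the|of|to|which|where|how|is|it|if|why|who'
);
```

Note that \b treats a hyphen as a word boundary, so a stopword inside a hyphenated term is matched too: "built-in" would come out as "built-". If hyphenated words need to survive intact, the word-splitting approach with str_word_count() may be the safer choice.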
Omzy Posted July 21, 2009
Cheers Mark, that works like a charm. Now any chance this can be extended so that it limits the output to:
a) the first 10 words
b) or preferably the 10 most popular words in the page
thebadbad Posted July 21, 2009
Regarding b), have a look at this thread: http://www.phpfreaks.com/forums/index.php/topic,260127.0/all.html
Omzy Posted July 21, 2009
Right I managed to figure that out, I've now got it to display the 10 most popular words on the page, I did this using array_slice, array_count_values and arsort.
Mark also mentioned above "You might also want to drop words of length 1" - how can I do this?
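Omzy does not post the code, so the following is only a guess at how array_count_values, arsort and array_slice might be combined to get the most frequent words. The function name topWords() and its $limit parameter are hypothetical.

```php
<?php
// Hypothetical helper; the name and the $limit parameter are illustrative.
function topWords(string $text, int $limit = 10): array
{
    // Format 1 makes str_word_count() return a plain list of the words found.
    $counts = array_count_values(str_word_count($text, 1));

    // Sort by count, highest first, keeping each word as its key.
    arsort($counts);

    // Keep only the first $limit entries.
    return array_slice($counts, 0, $limit, true);
}

print_r(topWords('memory stick memory card memory stick reader', 2));
```

In practice the stopwords would be removed before counting, otherwise words like "the" are likely to dominate the top of the list.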
Mark Baker Posted July 21, 2009
Quote (Omzy): Right I managed to figure that out, I've now got it to display the 10 most popular words on the page, I did this using array_slice, array_count_values and arsort. Mark also mentioned above "You might also want to drop words of length 1" - how can I do this?
$description="SanDisk’s Memory Stick PRO Duo is half the size of a standard-size Memory Stick PRO media and it offers the same technologies including high speed data transfer, built-in MagicGate, and high capacities. The Memory Stick PRO Duo is the ideal solution for the most portable devices such as pocket-size digital cameras and with the use of Adaptor, it can be used in all PRO-compatible devices.";
$stopwords="in|the|of|to|which|where|how|is|it|if|why|who";
$stopwordsArray=explode('|',$stopwords);
// Count how often each word occurs: word => number of occurrences
$descriptionArray = $wordfrequency = array_count_values( str_word_count( $description, 1) );
foreach($descriptionArray as $descriptionWordKey => $descriptionWord) {
    // Drop stopwords and any word that is a single character long
    if ((in_array($descriptionWordKey,$stopwordsArray)) || (strlen($descriptionWordKey) == 1)) {
        unset($descriptionArray[$descriptionWordKey]);
    }
}
// Sort by number of occurrences, highest first, keeping the words as keys
arsort($descriptionArray);
print_r($descriptionArray);
Note that the word is now the array key, and the value is the number of occurrences in the description.
Omzy Posted July 21, 2009
Think there might be an error there, it didn't seem to work for me and I noticed that $wordfrequency is only referenced once in the code...
Mark Baker Posted July 21, 2009
Quote (Omzy): Think there might be an error there, it didn't seem to work for me and I noticed that $wordfrequency is only referenced once in the code...
$wordfrequency is redundant, a variable that's populated but never used, and so it's irrelevant. It can be removed without affecting the code in any way.
If it isn't working for you, what errors (if any) are you getting? Or what are you expecting to see and not seeing?
The output I'm getting is:
Array
(
    [and] => 3
    [PRO] => 3
    [Stick] => 3
    [Memory] => 3
    [devices] => 2
    [high] => 2
    [Duo] => 2
    [pocket-size] => 1
    [digital] => 1
    [cameras] => 1
    [portable] => 1
    [most] => 1
    [such] => 1
    [as] => 1
    [Adaptor] => 1
    [used] => 1
    [all] => 1
    [PRO-compatible] => 1
    [be] => 1
    [can] => 1
    [use] => 1
    [for] => 1
    [with] => 1
    [capacities] => 1
    [offers] => 1
    [same] => 1
    [technologies] => 1
    [media] => 1
    [standard-size] => 1
    [half] => 1
    [size] => 1
    [including] => 1
    [speed] => 1
    [SanDisk] => 1
    [The] => 1
    [ideal] => 1
    [MagicGate] => 1
    [built-in] => 1
    [data] => 1
    [transfer] => 1
    [solution] => 1
)
which seems to tally up when I do the counts manually.
The only quibble I've noted is that it's case-sensitive, so "The" is counted even though "the" is in the $stopwords list. This can be fixed by changing
in_array($descriptionWordKey,$stopwordsArray)
to
in_array(strtolower($descriptionWordKey),$stopwordsArray)
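Putting the strtolower() fix together with the earlier steps might look something like the sketch below. The function name keywordCounts() is hypothetical, and the redundant $wordfrequency variable has been left out.

```php
<?php
// Hypothetical helper combining the steps from the posts above.
function keywordCounts(string $text, string $stopwords): array
{
    $stopwordsArray = explode('|', strtolower($stopwords));

    // word => number of occurrences, with the original spelling as the key
    $counts = array_count_values(str_word_count($text, 1));

    foreach (array_keys($counts) as $word) {
        // Lowercase only for the comparison, so "The" matches the stopword "the".
        if (in_array(strtolower($word), $stopwordsArray, true) || strlen($word) == 1) {
            unset($counts[$word]);
        }
    }

    // Most frequent words first.
    arsort($counts);
    return $counts;
}

print_r(keywordCounts('The stick is the stick a', 'the|is'));
```

This only makes the stopword check case-insensitive. Words that are not stopwords are still counted per spelling, so "Memory" and "memory" would get separate entries unless the text is lowercased before counting.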
Omzy Posted July 21, 2009
Yes mate, it's working, I have to use array_keys to get the values to display :-) Thanks for all your kind help :-)
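Because the words are the array keys, turning the sorted counts into the keywords tag from the original question could be done roughly as follows. This is a sketch only: the function name metaKeywordsTag(), the $limit default and the comma separator are assumptions, not something shown in the thread.

```php
<?php
// Hypothetical helper; the function name and $limit default are illustrative.
function metaKeywordsTag(array $counts, int $limit = 10): string
{
    // The words are the array keys, so take the keys of the first $limit entries.
    $words = array_keys(array_slice($counts, 0, $limit, true));

    // Escape the list before placing it inside an HTML attribute.
    $content = htmlspecialchars(implode(', ', $words), ENT_QUOTES, 'UTF-8');

    return '<meta name="keywords" content="' . $content . '">';
}

echo metaKeywordsTag(['Memory' => 3, 'Stick' => 3, 'PRO' => 3, 'Duo' => 2], 3);
```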
This topic is now archived and is closed to further replies.