GamerGun, posted April 15, 2010

Hi,

I have the following code:

    $result = mysql_query("SELECT bericht FROM berichten WHERE id = '$postid'") or die(mysql_error());
    while($row = mysql_fetch_array( $result )) {
        $keywords = $row['bericht'];

        function longenough($s) {
            if ( strlen($s) < 5 ) {
                return false;
            }
            return true;
        }

        $arr = explode(" ", $keywords);
        $keywords = implode(" ", array_filter($arr, "longenough"));

        function first_words($keywords, $num, $tail='') {
            $words = str_word_count($keywords, 2);
            $firstwords = array_slice( $words, 0, $num);
            return implode(' ', $firstwords).$tail;
        }

        $keywords = first_words($keywords, 20);
        $bad_symbols = array(",", ".", "'", ";", ":", "?", "!", "_");
        $keywords = str_replace($bad_symbols, "", $keywords);
        $word_array = preg_split('/[\s?:;,.]+/', $keywords, -1, PREG_SPLIT_NO_EMPTY);
        $unique_word_array = array_unique($word_array);
        $keywords = implode(',',$unique_word_array);
        $keywords = strtolower($keywords);

The idea is that it turns something like this (a Dutch story):

    Vanmorgen was ik op weg naar mijn werk. Om er te komen neem ik altijd de autoweg (100 km/u). Nu kwam ik een 45-km-wagentje tegen, welke helaas op zulke wegen mogen rijden. Het is nogal schrikken en gevaarlijk als je ineens zeer snel zo'n wagentje nadert. Gelukkig kon ik diegene nog ontwijken, maar het zal me niks verbazen als iemand die niet op zit te letten er achterop rijdt.

into this (keywords for Google and such):

    vanmorgen,werk,komen,altijd,autoweg,km,u,-km-wagentje,tegen,welke,helaas,zulke,wegen,mogen,rijden,nogal,schrikken,gevaarlijk,ineens,wagentje

Most of the code works fine. As you can see, it splits the string and keeps only the words that are 5 or more characters long. Then it takes the first 20 words, without any duplicates.
So far so good, but why does it do the following?

    (100 km/u) becomes km,u
    This should be 100 kmu or 100kmu

    45-km-wagentje becomes -km-wagentje
    This should be 45-km-wagentje

Another word, which is not in this part of the text, is also handled incorrectly:

    's ochtends becomes ochtends
    This should be sochtends

I hope someone can help me with this.

Thanks in advance!
Zyx, posted April 15, 2010

preg_split() has an option to keep the recognized delimiters in the output. Capture the delimiters too, and add an extra loop that examines the output according to some rules, e.g. "if we have a number, then a dash, then a string, this should all be one word". The loop eventually produces a new output, and that is your result.
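The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name split_keeping_compounds, the delimiter class (which adds "-" to the one from the first post), and the exact "number, dash, string" rule are all made up for the example and are not code from the thread.

```php
<?php
// Hypothetical helper, not from the thread: split on separators, keep them
// in the output, then glue "number - word" sequences back into one word.
function split_keeping_compounds(string $text): array
{
    // Assumed delimiter class: the one from the first post plus "-".
    $delim = '[\s?:;,.\-]+';

    // The parentheses around the delimiter are what make
    // PREG_SPLIT_DELIM_CAPTURE return the delimiters as well.
    $parts = preg_split(
        '/(' . $delim . ')/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $isDelim = static function (string $p) use ($delim): bool {
        return preg_match('/^' . $delim . '$/', $p) === 1;
    };

    $words = [];
    $n = count($parts);
    for ($i = 0; $i < $n; $i++) {
        if ($isDelim($parts[$i])) {
            continue; // plain separators are dropped
        }
        $word = $parts[$i];
        // Example rule: a number, then a lone dash, then a string is one
        // word. Keep gluing while another "-" plus a word follows.
        if (ctype_digit($word)) {
            while ($i + 2 < $n && $parts[$i + 1] === '-' && !$isDelim($parts[$i + 2])) {
                $word .= '-' . $parts[$i + 2];
                $i += 2;
            }
        }
        $words[] = $word;
    }
    return $words;
}

print_r(split_keeping_compounds('Nu kwam ik een 45-km-wagentje tegen, welke'));
```

Other rules (for example for "km/u" or "'s ochtends") would need their own checks inside the loop; the sketch only covers the number-dash-word case.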
GamerGun (author), posted April 15, 2010

You mean PREG_SPLIT_DELIM_CAPTURE, right? So I changed this line:

    $word_array = preg_split('/[\s?:;,.]+/', $keywords, -1, PREG_SPLIT_NO_EMPTY);

into this:

    $word_array = preg_split('/[\s?:;,.]+/', $keywords, -1, PREG_SPLIT_DELIM_CAPTURE);

But the output is still the same?

Thanks
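A note on why the flag alone changes nothing: PREG_SPLIT_DELIM_CAPTURE only returns delimiters that are matched by a parenthesized (capturing) part of the pattern, and the pattern above has no parentheses. Flags can also be combined with the bitwise OR operator, so PREG_SPLIT_NO_EMPTY does not have to be given up. A minimal sketch with a made-up input string and a shortened delimiter class:

```php
<?php
// Made-up sample input, only for illustration.
$text = 'autoweg, km';

// No capturing group: the flag has nothing to capture,
// so only the words come back.
$without = preg_split('/[\s,]+/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Capturing group around the delimiter, and both flags combined with "|".
$with = preg_split('/([\s,]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_dump($without); // two elements: "autoweg", "km"
var_dump($with);    // three elements: "autoweg", ", ", "km"
```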
Zyx, posted April 15, 2010

The output of the whole script, or just that function?
GamerGun (author), posted April 15, 2010

The whole script. It seems like preg_split is not working or something.
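One possible explanation, judging only from the code in the first post: preg_split() may not be the step that loses the characters. The text first goes through longenough(), which works on space-separated pieces, so "(100" (4 characters) and "'s" (2 characters) are dropped there. It then goes through str_word_count() inside first_words(), which by default does not treat digits or "/" as word characters. The sketch below uses a shortened sample string and the optional third argument of str_word_count(), which lists extra characters to count as part of a word; whether that fits the rest of the script is an assumption.

```php
<?php
// Shortened sample taken from the story in the first post.
$text = 'een 45-km-wagentje tegen';

// Default behaviour: digits are not word characters, so "45" is lost
// here, before any later preg_split() call sees the text.
$default = str_word_count($text, 1);

// Third argument: extra characters to treat as part of a word.
$withDigits = str_word_count($text, 1, '0123456789');

print_r($default);
print_r($withDigits); // contains "45-km-wagentje" as one word
```

The original code uses format 2 (keys are string positions) rather than 1; the third argument works the same way with either format.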
This topic is now archived and is closed to further replies.