zintani, posted September 19, 2011:

Hello,

Is there a way to remove stop words (the, and, is, are, etc.) from a text without damaging other words? For example, when I use a replace function, "stand" loses its last three letters because they spell "and".

Also, I have managed to load text documents stored in a MySQL database, and I would like to know whether there is a way to compare those documents for similarity.

Link to comment: https://forums.phpfreaks.com/topic/247438-comparing-two-text-documents/

ManiacDan, posted September 19, 2011:

You can use regular expressions to strip out stop words by making use of the \b word-boundary assertion:

    php > $a = "Sandy and the band played at Dave and Buster's";
    php > echo str_replace('and', '', $a);
    Sy  the b played at Dave  Buster's
    php > echo preg_replace('/\band\b/i', '', $a);
    Sandy  the band played at Dave  Buster's

-Dan

zintani, posted September 19, 2011:

Thanks ManiacDan, the code works just fine. Can I do the same for more than one word, for example to, at, from, the, etc.?

    echo preg_replace('/\band\b/i','/\bto\b/i','/\bfrom\b/i','/\bat\b/i' '', $a);

This is my code, but I get an error message.

xyph, posted September 19, 2011:

Perhaps it's time to let go of our hands and check the manual.

Psycho, posted September 19, 2011:

    $old_string = "Sandy and the band played at Dave And Buster's the other day.";

    //Create an array of the words to remove
    $patterns = array('and', 'the', 'at');

    //Convert patterns to have word boundaries and be case insensitive
    foreach($patterns as &$val)
    {
        $val = "#\b{$val}\b#i";
    }
    unset($val); //Break the reference so later uses of $val cannot alter $patterns

    //Create replacements array
    $replacements = array_fill(0, count($patterns), '');

    //Make replacements
    $new_string = preg_replace($patterns, $replacements, $old_string);

    echo "$old_string<br>\n";
    //Sandy and the band played at Dave And Buster's the other day.

    echo "$new_string<br>\n";
    //Sandy band played Dave Buster's other day.
    //(the runs of spaces left by the removed words collapse when rendered as HTML)

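As a possible variation on the approach above, the word list can be folded into a single alternation pattern rather than one pattern per word. This is only a sketch: `remove_stop_words` is a hypothetical helper name that does not appear in the thread, and the whitespace cleanup at the end is an added assumption about what the output should look like.

```php
<?php
// Hypothetical helper (not from the thread): strips whole-word stop words
// with a single alternation pattern instead of one pattern per word.
function remove_stop_words(string $text, array $stopWords): string
{
    // Escape each word so regex metacharacters are treated literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '#');
    }, $stopWords);

    // \b(?:and|the|at)\b matches any listed word, but only as a whole word.
    $pattern = '#\b(?:' . implode('|', $escaped) . ')\b#i';
    $stripped = preg_replace($pattern, '', $text);

    // Collapse the runs of spaces left behind by the removed words.
    return trim(preg_replace('#\s+#', ' ', $stripped));
}

echo remove_stop_words(
    "Sandy and the band played at Dave And Buster's the other day.",
    array('and', 'the', 'at')
), "\n";
// Sandy band played Dave Buster's other day.
```

With a long stop word list, one alternation means the text is scanned once instead of once per word, and `preg_quote` guards against a word that happens to contain a regex metacharacter.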
zintani, posted September 19, 2011:

Thanks mjdamato, I was working on the same idea with an array:

    $more = array(
        '/\ba\b/i', '/\babout\b/i', '/\babove\b/i', '/\bacross\b/i', '/\bafter\b/i', '/\bagain\b/i',
        '/\bagainst\b/i', '/\ball\b/i', '/\balmost\b/i', '/\balone\b/i', '/\balong\b/i', '/\balready\b/i',
        '/\balso\b/i', '/\balthough\b/i', '/\balways\b/i', '/\bamong\b/i', '/\ban\b/i', '/\band\b/i', '/\banother\b/i',
        '/\bany\b/i', '/\banybody\b/i', '/\banyone\b/i', '/\banything\b/i', '/\banywhere\b/i', '/\bare\b/i', '/\barea\b/i',
        '/\bareas\b/i'
    );

Typing it out by hand was time consuming. I wanted to wrap each word in \b...\b automatically, and your code does exactly that.

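The second part of the original question, comparing the stored documents for similarity, is not answered in the thread. One common option, offered here only as a rough sketch, is the cosine similarity of the two texts' word-frequency vectors, computed after the stop words have been removed. The helper names `word_counts` and `cosine_similarity` are hypothetical, and the simple tokenizer (lowercase letters, digits, and apostrophes) is an assumption. PHP's built-in `similar_text()` is another possibility for short strings.

```php
<?php
// Hypothetical helpers (not from the thread): compare two texts by the
// cosine similarity of their word-frequency vectors.

// Lowercase the text, split it into words, and count each word.
function word_counts(string $text): array
{
    $words = preg_split('#[^a-z0-9\']+#', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Returns 0.0 when no words are shared and 1.0 when the word proportions match.
function cosine_similarity(string $a, string $b): float
{
    $countsA = word_counts($a);
    $countsB = word_counts($b);

    // Dot product over the words the two texts share.
    $dot = 0;
    foreach ($countsA as $word => $count) {
        if (isset($countsB[$word])) {
            $dot += $count * $countsB[$word];
        }
    }

    // Euclidean length of each frequency vector.
    $square = function ($n) { return $n * $n; };
    $normA = sqrt(array_sum(array_map($square, $countsA)));
    $normB = sqrt(array_sum(array_map($square, $countsB)));

    if ($normA == 0 || $normB == 0) {
        return 0.0; // An empty text has nothing to compare.
    }
    return $dot / ($normA * $normB);
}

printf("%.2f\n", cosine_similarity('band played loud music', 'band played quiet music'));
// 0.75
```

Removing stop words first matters here because very frequent words such as "the" and "and" would otherwise dominate the frequency vectors and make unrelated documents look alike.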