Jaguar posted January 29, 2007

I'm building a search engine, and I'm trying to remove stop words like "the", "i", "he", "she", etc. from the search term. I have an array containing all of the stop words, but str_replace() doesn't match whole words: it removes every occurrence of those letters, even inside other words. I looked through the PHP manual and didn't see any function that matches whole words only. How do I replace whole words only? Thanks.
[code]
$search = $_POST['search_term'];
$stop_words = array('i', 'a', 'the', 'he', 'she', 'them', 'us', 'are', 'is', 'we', 'they');
$new_search = str_replace($stop_words, '', $search);
[/code]
cmgmyr posted January 29, 2007

Try putting spaces before and after the word you want to search for, then replace it with a single space. Something like this:
[code]$search = str_replace(' and ', ' ', $search);[/code]
but with two arrays (the space-padded stop words and their replacements). That makes sure the thing you match is an actual word and not just a run of letters inside another word.
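A rough sketch of the two-array version, assuming a small made-up stop-word list and sample input. It keeps "band" intact, but it also shows the catch raised in the next reply: the leading "the" has no space before it, so it survives.

```php
<?php
// Sketch of the "pad each stop word with spaces" idea.
// The stop-word list and the sample search string are made up for illustration.
$stop_words = array('and', 'the', 'i');

// Build the two arrays: ' and ', ' the ', ' i ' and a single space for each.
$padded = array();
$spaces = array();
foreach ($stop_words as $word) {
    $padded[] = ' ' . $word . ' ';
    $spaces[] = ' ';
}

$search = 'the band and the singer';
// str_replace() applies each search/replace pair in turn to the whole string.
$clean = str_replace($padded, $spaces, $search);
echo $clean, "\n"; // the band singer
```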
Jaguar (author) posted January 29, 2007

I thought of that, but what if the word occurs at the beginning or end of the search term? It wouldn't get caught. I could also make arrays with a space only on the left or only on the right, but then that would chop the start or end off other words that happen to begin or end with the same letters as my stop words.
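One common workaround for the beginning/end problem (not proposed in this thread) is to pad the whole search string with a space on each side before replacing, then trim. This is a minimal sketch; strip_padded() is a hypothetical helper and the word list is made up. It also shows a remaining weakness: when two stop words are adjacent, the first match consumes the space the second one needs, so the second stop word survives.

```php
<?php
// Sketch: pad the search string so the first and last words also have
// a space on both sides. strip_padded() is a hypothetical helper name.
$stop_words = array('and', 'the');
$padded = array();
foreach ($stop_words as $word) {
    $padded[] = ' ' . $word . ' ';
}

function strip_padded($search, $padded)
{
    $text = ' ' . $search . ' ';
    // A single replacement string is used for every entry in $padded.
    $text = str_replace($padded, ' ', $text);
    return trim($text);
}

echo strip_padded('the band and the singer', $padded), "\n"; // band singer
echo strip_padded('the the band', $padded), "\n";            // the band
```

Running the replacement in a loop until the string stops changing would catch the adjacent-word case, at the cost of extra passes.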
Jessica posted January 29, 2007

So here are the ways the word "and" could appear:
"and she said"
"bob, fred, and jean"
"toppings and"
"Andy Smith"
"My rock band is awesome!"
"My rock band"
So you basically want to block " and" and "and ", but not "band ". Search for instances of "and " and check what's before it. If there's another letter before it, it's not really "and", it's part of some other word. If there's nothing before it, it's the first word, so remove it. Same concept for " and": check what comes after it.
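Here is a rough sketch of that "look at the neighbouring characters" approach, run against the examples above. It assumes matching should be case-insensitive and that only letters (as judged by ctype_alpha()) count as part of another word; remove_word() is a hypothetical helper name.

```php
<?php
// Sketch of the "check the characters around each match" idea.
// remove_word() is a hypothetical helper; matching is case-insensitive and
// only letters (ctype_alpha) count as "part of another word".
function remove_word($text, $word)
{
    $len = strlen($word);
    if ($len === 0) {
        return $text; // nothing to search for
    }
    $offset = 0;
    while (($pos = stripos($text, $word, $offset)) !== false) {
        // Character just before and just after the match ('' at either end).
        $before = $pos > 0 ? $text[$pos - 1] : '';
        $after  = $pos + $len < strlen($text) ? $text[$pos + $len] : '';
        // ctype_alpha('') is false, so the string edges count as boundaries.
        if (ctype_alpha($before) || ctype_alpha($after)) {
            // A letter next to it means it's inside another word ("band", "Andy").
            $offset = $pos + 1;
            continue;
        }
        // A whole word: cut it out and keep scanning from the same position.
        $text = substr($text, 0, $pos) . substr($text, $pos + $len);
        $offset = $pos;
    }
    return $text;
}

$examples = array('and she said', 'bob, fred, and jean', 'toppings and',
                  'Andy Smith', 'My rock band is awesome!', 'My rock band');
foreach ($examples as $s) {
    echo '"', remove_word($s, 'and'), "\"\n";
}
```

The first three examples lose their "and" (leaving extra spaces that you would still want to collapse or trim), and the last three come back unchanged.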
linuxdream posted January 29, 2007

I think you would be better off using preg_replace() and doing something like this:
[code]
<?php
$stopwords = array('and', 'or', 'i');
$search = $_POST['search'];
foreach ($stopwords as $word) {
    // preg_replace() returns the new string; it does not change $search in place.
    // preg_quote() escapes any regex characters in the word.
    $search = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $search);
}
echo $search; // Clean search string
?>
[/code]
I think that will work for words within the string, but I'm not sure about the start or end of the string. Maybe something like preg_replace("/(^|\b)$word($|\b)/i", '', $search); instead, but I don't know if that will work. You may need something like a lookbehind or lookahead. Someone here who's better with regular expressions will be able to help you out, or you could post this in the regex forum.
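On the start/end question: in PCRE, \b also matches at the very beginning or end of the subject when the first or last character is a word character, so the plain \b version should already catch a stop word in those positions. The sketch below (stop-word list and input are made up) does all the words in one preg_replace() call with an alternation, then collapses the leftover spaces.

```php
<?php
// Sketch: remove all stop words in one preg_replace() call.
// The stop-word list and sample input are made up for illustration.
$stopwords = array('and', 'or', 'i', 'the');

// Escape each word and join them into one pattern: \b(?:and|or|i|the)\b
$quoted = array();
foreach ($stopwords as $word) {
    $quoted[] = preg_quote($word, '/');
}
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

$search = 'The band and I went to Andy or Orlando';
$clean = preg_replace($pattern, '', $search);

// Collapse the leftover runs of whitespace and trim the ends.
$clean = trim(preg_replace('/\s+/', ' ', $clean));
echo $clean, "\n"; // band went to Andy Orlando
```

One caveat: \b treats an apostrophe as a boundary, so with this list "I'm" would be reduced to "'m".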