Replacing whole words


Jaguar

I'm building a search engine and I'm trying to remove stop words such as "the", "i", "he" and "she" from the search term. I have an array containing all of the stop words. str_replace() doesn't match whole words; it removes every occurrence of those letters, even inside other words. I looked through the PHP manual and didn't see a function that matches whole words. How do I replace whole words only? Thanks.

[code]
$search = $_POST['search_term'];

$stop_words = array('i', 'a', 'the', 'he', 'she', 'them', 'us', 'are', 'is', 'we', 'they');

$new_search = str_replace($stop_words, '', $search);

[/code]

Try putting a space before and after the word you want to remove, then replace it with a single space. Maybe something like this: [code]$search = str_replace(' and ', ' ', $search);[/code]

Do the same with arrays: one of space-padded stop words to search for and, if you like, one of replacements (a single ' ' also works for all of them).

This makes sure that what you match is an actual word and not just letters inside a longer word.
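
A minimal sketch of that suggestion, assuming the stop words arrive as a plain array. The helper name pad_words() and the sample sentence are made up for illustration.

[code]
<?php
// Hypothetical helper: turns 'and' into ' and ' for every stop word.
function pad_words(array $words): array
{
    $padded = array();
    foreach ($words as $word) {
        $padded[] = ' ' . $word . ' ';
    }
    return $padded;
}

$stop_words = array('and', 'the');
$search     = 'cats and the dogs';

// str_replace() accepts an array of search strings; the single ' '
// replacement is applied to each of them in turn.
$search = str_replace(pad_words($stop_words), ' ', $search);

echo $search; // cats dogs
?>
[/code]

Use str_ireplace() in place of str_replace() if the match should ignore case.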

I thought of that, but what if the word occurs at the beginning or end of the string? It wouldn't get caught. I could also make an array with a space on only the left or only the right, but then that would chop off parts of words that happen to begin or end with the same letters as my stop words.

So here are the ways the word 'and' could appear. The first three should be removed; the last three should be left alone.
"and she said"
"bob, fred, and jean"
"toppings and"

"Andy Smith"
"My rock band is awesome!"
"My rock band"

So you basically want to catch " and" and "and ", but not "band ". Search for instances of "and " and check what comes before each one. If there's a letter before it, it's not really "and", it's part of another word. If there's nothing before it, it's the first word, so remove it. Same concept for " and": check what comes after it.
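
Here is a rough sketch of that neighbor check, assuming plain ASCII text. The function name remove_whole_word() is invented for this example; it looks at the character on each side of every hit and only drops the hit when neither side is a letter.

[code]
<?php
// Hypothetical helper: removes $word only where no letter touches it.
function remove_whole_word(string $text, string $word): string
{
    if ($word === '') {
        return $text; // nothing to look for
    }
    $out = '';
    $pos = 0;
    $len = strlen($word);
    while (($hit = stripos($text, $word, $pos)) !== false) {
        // Neighboring characters, or '' at the very start or end of the string.
        $before = $hit > 0 ? $text[$hit - 1] : '';
        $after  = $hit + $len < strlen($text) ? $text[$hit + $len] : '';
        // ctype_alpha('') is false, so string edges count as "no letter".
        $is_whole = !ctype_alpha($before) && !ctype_alpha($after);

        // Keep everything before the hit, and the hit itself if it is part of a longer word.
        $out .= substr($text, $pos, $hit - $pos);
        if (!$is_whole) {
            $out .= substr($text, $hit, $len);
        }
        $pos = $hit + $len;
    }
    // Append whatever follows the last hit.
    return $out . substr($text, $pos);
}

echo remove_whole_word('toppings and', 'and'); // "toppings " (trailing space remains)
?>
[/code]

The spaces around a removed word are left in place, so the result may need a trim() or a pass to collapse double spaces afterwards.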

I think you would be better off using preg_replace() and doing something like this:

[code]
<?php
$stopwords = array('and', 'or', 'i');
$search = $_POST['search'];

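foreach ($stopwords as $word) {
    // preg_replace() returns the new string, so assign it back to $search.
    // \b is a word boundary; preg_quote() escapes any regex characters in the word.
    $search = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $search);
}
echo $search; // Clean search string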
?>
[/code]

I think that will work for words within the string, and it should also work at the start and end: \b is a word boundary, and it matches at the beginning or end of the string whenever the first or last character is a word character. So a pattern like preg_replace("/(^|\b)$word($|\b)/i", '', $search) shouldn't be needed, and neither should a lookbehind or lookahead. If it doesn't behave as expected, someone who's better with regular expressions will be able to help you out, or you can post this in the regex forum.
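
For what it's worth, the word-boundary idea can also be done in a single pass. This is only a sketch: the function name strip_stop_words() and the whitespace cleanup at the end are additions for illustration, not something from the posts above.

[code]
<?php
// Hypothetical helper: strips every stop word with one word-boundary pattern.
function strip_stop_words(string $search, array $stop_words): string
{
    // Escape each word so regex metacharacters are treated literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stop_words);

    // Builds e.g. /\b(?:and|or|i)\b/i
    $pattern  = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    $stripped = preg_replace($pattern, '', $search);

    // Collapse the gaps left behind by removed words.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$stop = array('and', 'she', 'the', 'is');
echo strip_stop_words('And she said the band is awesome', $stop); // said band awesome
?>
[/code]

Words at the start or end of the string ("And", or the "and" in "toppings and") are removed, while "band" and "Andy" are left alone because there is no word boundary inside them.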