Jump to content

Trying to devise a watch list


Yesideez

Recommended Posts

I've got a site where people can place posts and I'm wanting to implement a system where users can have up to 5 words in a watch list.

 

When someone makes a post and uses one of these watch words I'd like to fire off an email to whoever has placed a match word in their watch list informing them of the post.

 

Only problem is, I've no idea which is the best way to do this. The database can potentially grow very big and I'm not happy using a LIKE or a stripos() as I'd like to keep the routine optimised as much as possible to help save resources. If there isn't any simple way of doing this then I'll knock it on the head.

 

The watch list will be used on a VARCHAR(40) which is kind of like a subject string.

Link to comment
Share on other sites

From what I can see there is no easy way to do this. I mean, the only way to compare this text to a list of words is to get every word and check against the text. This isn't really a problem if you only have a few words to watch, but even 10 users implementing this is already potentially 50 different words.

 

If I HAD to do this I would use queue for texts and a few cron jobs to manage everything. This is the kind of process that would be better sent off to a completely different server. I mean, really, how else can you compare text?

 

Another idea to reduce load is to only allow certain words to watch. Base those words (say a selection of 20) around the subject matter and then you're only dealing with 20 possible matches instead hundreds or thousands. Also, if one word is found, you would stop attempting to match any others.

 

Love to hear some other people's approach to this, if they HAD to do it.

Link to comment
Share on other sites

I thought this was going to be a pig.

 

The option of only limiting certain words isn't really possible due to the nature of the site. I did think of trying soundex but then, splitting a subject string into separate words then calculating soundex for each and comparing was again a bit off as soundex values aren't unique and, well, soundex values are strings so we're back to square one.

 

Tricky. I think this idea will be binned.

Link to comment
Share on other sites

I thought this was going to be a pig.

 

The option of only limiting certain words isn't really possible due to the nature of the site. I did think of trying soundex but then, splitting a subject string into separate words then calculating soundex for each and comparing was again a bit off as soundex values aren't unique and, well, soundex values are strings so we're back to square one.

 

Tricky. I think this idea will be binned.

 

Exactly. I entertained the idea of breaking the texts down into words but again, like you say, you're still comparing strings. It is a novel feature though and now we know why ;) Personally, the only viable solution I can see is to have a server+ to manage this process.

Link to comment
Share on other sites

Your database table should be like:

 

CREATE TABLE words (
  user_id INT NOT NULL,
  word_id VARCHAR(24) NOT NULL, -- http://en.wikipedia.org/wiki/Longest_word_in_English
  PRIMARY KEY (user_id, word_id)
);

 

When someone enters some text, run it through some class that strips common words and sort all others in descending order by the number of occurrences. Take the top-5 (array_slice) and run it through your database:

 

SELECT first_name, last_name, email_address FROM words JOIN users USING (user_id) WHERE word_id IN (..)

 

Don't worry about stuff that isn't yet important.

Link to comment
Share on other sites

If I HAD to do this I would use queue for texts and a few cron jobs to manage everything. This is the kind of process that would be better sent off to a completely different server. I mean, really, how else can you compare text?

 

would have done something like this aswell, but would have done it in C and made it multithreaded like this:

 

Load Keywords to search for

Split Keyword list into arrays  Maximum 200keywords/array

Load all queued texts

Start one thread / array

Store hits in a new array , keyword,link,email

Loop through all hits and send the emails

 

not too resource friendly, might need an extra server for this.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.