Headache problem with searching and common words.


elgoog

I am trying to do the following and have no idea where to start. If anyone has tackled something like this or could set me on the right path, it would be massively appreciated.

 

Info

Say I have the following table called 'items', with three columns:

 

itemID(int), userID(int), Description(text)

 

Any row could have thousands of words in the description, and there could be hundreds of entries.

Problem

If I wanted to extract a list of the top 50 most common words in a particular user's descriptions, where should I begin?

 

I will also want to exclude words under a certain number and ignore words such as 'a', 'the', 'and', etc.

 

I'm not even sure what to search for on Google to help with this one, or what sort of methodology I should be approaching this problem with.

 

Thanks in advance.

 

off the top of my head: i would create another table, descWords, with columns

id - INT unsigned auto-increment primary index

descWord - varchar(64), indexed

occurrences - INT unsigned

 

then occasionally run a script that explodes each user's description into an array of words. the script would first make the array of words unique, and ensure there aren't any 'non-words', like commas, spaces, question marks, etc. after the array is unique, i'd begin updating the descWords table for each word.
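The word-splitting step described above could look something like this in Python. This is only a rough sketch: the function names, the stop-word list, and the reading of "under a certain number" as a minimum word length are all assumptions, not something stated in the thread.

```python
import re
from collections import Counter

# Hypothetical stop-word list; extend it to suit your data.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def extract_words(description, min_length=3, stop_words=STOP_WORDS):
    """Return the unique, lowercased words found in one description."""
    # The pattern keeps runs of letters only, so commas, spaces,
    # question marks and other 'non-words' never reach the result.
    words = re.findall(r"[a-z]+", description.lower())
    # A set makes the words unique, as the reply suggests.
    return {w for w in words if len(w) >= min_length and w not in stop_words}


def top_words(descriptions, limit=50):
    """Count how many descriptions contain each word; return the most common."""
    counts = Counter()
    for text in descriptions:
        counts.update(extract_words(text))
    return counts.most_common(limit)
```

Because each description is reduced to unique words first, the counts are "number of descriptions containing the word", not total occurrences. Dropping the set in `extract_words` would count every repeat instead.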

 

i wouldn't make this happen every time someone does a search, because it might take too long for our script to run over all the records and update descWords for each use.
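As a rough illustration of that periodic job, here is a sketch that rebuilds descWords in one pass and leaves searches to read the precomputed table. It uses Python's built-in sqlite3 module purely as a stand-in for whatever database the site actually runs on; the function names and the SQLite column types are assumptions.

```python
import re
import sqlite3
from collections import Counter


def create_schema(conn):
    """Create the two tables (SQLite types stand in for the real ones)."""
    conn.execute(
        "CREATE TABLE items (itemID INTEGER PRIMARY KEY, userID INTEGER, Description TEXT)"
    )
    conn.execute(
        "CREATE TABLE descWords ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, descWord TEXT, occurrences INTEGER)"
    )
    conn.execute("CREATE INDEX idx_descWord ON descWords (descWord)")


def rebuild_desc_words(conn):
    """Recount words across all descriptions and refresh descWords."""
    counts = Counter()
    for (description,) in conn.execute("SELECT Description FROM items"):
        # Unique words per description, letters only.
        counts.update(set(re.findall(r"[a-z]+", description.lower())))
    with conn:  # one transaction: clear the old counts, write the new ones
        conn.execute("DELETE FROM descWords")
        conn.executemany(
            "INSERT INTO descWords (descWord, occurrences) VALUES (?, ?)",
            counts.items(),
        )


def most_common(conn, limit=50):
    """Read the precomputed counts; cheap enough to run on every search."""
    return conn.execute(
        "SELECT descWord, occurrences FROM descWords "
        "ORDER BY occurrences DESC, descWord LIMIT ?",
        (limit,),
    ).fetchall()
```

`rebuild_desc_words` would be run from a scheduled task, while `most_common` is the only part a search needs to call. Note that descWords as described has no userID column, so getting the top words for one particular user would need either an extra column or a filtered rebuild; that part is left open here.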
