Jump to content

Archived

This topic is now archived and is closed to further replies.

nmeyer47

best way to search large tables? -need feedback on an idea

Recommended Posts

I have a couple large tables (77,055 records-79 megs & 111,240 records -69 megs & 20,542 records) that need to be searched by site visitors.

 

I was doing a (like \'%term%\' ) on the one table i\'m searching now but when I have to add the other tables in the mix I need something faster.

 

I was going to index all the words in the fields I want to search in one table(tableA (wordid,word)) then have tableB (id,wordid,recordid, appearences,tableid). i\'ll search tableA for matches to search terms, get the wordid, then search tableB for records matching wordid. Do some sort of rating, like number of search words that appear in a particular record or number of times a word appear as indicated by the appearances. not sure, gotta figure that part out exactly.

 

one of my concerns is that the index tables get really big, lots of records anyway. but i have to imagine that searching a field that only contains one word, as in: (word = \"$term\"); is much more efficient than searching a large text field, as in: (description like \"%$term%\").

 

anyway, i just wanted some feedback on this method. I haven\'t been able to find my sites that discuss search log as it would apply to my problem. figured this was my best source for info.

 

thanks, peace... :arrow:

Share this post


Link to post
Share on other sites

The LIKE operator is quite slow, especially if you\'re saying

 

LIKE \"%word%\"...

 

LIKE \"WORD%\" is MUCH faster!

 

Anyway, your idea of an index table with words and a table with reference to the index and the records is ok...

 

BUT table B would be big... (It would basically mean that you\'re just assigning a number of keywords to the record, and you could skip the large text field (then you would loose the word order, though). I\'m NOT sure of the performance.

 

It a little much to ask for massive speed when working on 100k+ records... Please post what you find (I think it an interesting problem).

 

P.

Share this post


Link to post
Share on other sites

first off thanks biopv for the big help. Otherwise I probably would have given up and become a farmer by now.

 

I am using the Full-Text indexing that MySQL has available that biopv pointed out. Only I have to do some sorting with arrays to get only the relevent results when i search by more than one word.

 

so what i\'m doing is searching by each word like \"+keyword\". if my search term is 3 words long i do 3 searchs \"+keyword1\" \"+keyword2\" \"+keyword3\". I load the ids that are returned by each search into there own array, containing id,points and tableid, then i compare the arrays and only return the records that match all the terms. I add the points they receive along the way, sorting them by points just before I print the results to the screen. this is the only way i can get only the most relevent results. it doesn\'t work well enough to just search by \"+keyword1 +keyword2 +keyword3\". mysql treats keyword1 as more important than the other words so you may not get the results you\'re looking for.

 

Now the ONLY problem I have is that mysql doesn\'t index words that are 3 chars or less. so I can\'t search for usb or hub or WWF just to name a few. what do I do about that?

 

if anybody has any questions post them here and I\'ll do my best to respond.

 

or if you have any suggestions i\'d love to hear them.

 

thanks again BIOPV, these forums are the best!@!!!

 

nate

Share this post


Link to post
Share on other sites

With mysql 4.0, you don\'t need to recompile to change ft_min_word_len. (full text minimum word length!)

 

Just change it in my.cnf, restart, and force the table to get reindexed. (\'alter table foo\' is enough, i believe.)

 

P.

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.