
Full text search optimization


Go to solution Solved by vinny42,


Hi all, I have a full-text product search query that is built up via PHP, with certain elements changed based on the search terms.

 

The query runs well and the results are very accurate and what we want. However, it takes roughly 0.5s - 1.5s to run, and as this is used in an AJAX search there is a noticeable delay. So I'm looking for some advice on how I can improve the query.

 

I think I have isolated the problem: some of our products are named slightly differently but to the end user are the same thing. For example, an mtb helmet is the same as a mountain bike helmet.

 

As such, I wrote some PHP to add the alternative keywords whenever one form is searched for.

SELECT Product_Name, URL, ImageURL, StockPrice, Brand,
MATCH (Product_Name) AGAINST ('2014') + Rating * 0.1 + views * 0.1 + addedtobasket +
MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE) + # This line is added when the term "mountain" is searched for
MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE) AS Score
FROM products
WHERE
MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE) + # This line is added when the term "mountain" is searched for
MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE)
AND Cat1 != 'Admin' AND Status = 'True' AND Cat1 NOT LIKE '%prodimg%' AND Stock_Level > 0 AND StockPrice > 0 AND date_updated > '2013-09-26 10:48:03'
AND ProductID != ''
GROUP BY ProductID
ORDER BY Score DESC, Rating DESC

When the additional line is added, the query takes roughly 0.875s to run; without it, it comes back in 0.05s. That is a large improvement in speed, but it has an impact on the results.
 
I have the following index as well:
[Attached image: index.jpg]
 
Any suggestions?
 
Thanks.

One thing that is never good is this:

 

AND Cat1 NOT LIKE '%prodimg%'

 

A LIKE pattern that starts with a % cannot use an index for that condition: every candidate record must be visited and its value checked, so the condition does nothing to narrow the search.
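
To see how this plays out on your own data, EXPLAIN can show whether an index is considered for each form of the pattern. This is only a sketch, and it assumes an ordinary (non-fulltext) index exists on Cat1, which nothing in this thread confirms:

```sql
-- Assumption: products has an ordinary B-tree index on Cat1 (not confirmed in this thread).

-- Leading wildcard: the Cat1 index cannot be used to narrow down the rows.
EXPLAIN SELECT ProductID FROM products WHERE Cat1 NOT LIKE '%prodimg%';

-- Prefix pattern: the optimizer can treat this as a range on the Cat1 index.
EXPLAIN SELECT ProductID FROM products WHERE Cat1 LIKE 'prodimg%';
```

Comparing the key and rows columns of the two plans should show the difference. In a query that also has a MATCH() condition, the full-text index is likely to do the narrowing and the LIKE only filters what is left, so the cost may be small in practice.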

 

 

 


"I think I have isolated the problem, some of our products are named slightly different but to the end user are the same thing, so for example an mtb helmet is the same as a mountain bike helmet."

 

Then perhaps it's a good idea to prepare for this by creating a table of synonyms. When a product "mtb helmet" is added, you look it up in the synonyms table and see that it's actually a "mountain bike helmet", so you link the product to "mountain bike helmet" instead of "mtb helmet". Then when the user searches for "mtb helmet" you do the same thing and search for "mountain bike helmet" instead.

 

I'm a big fan of tagging products with known keywords rather than doing fulltext searches, because someone looking for "mbt helmet" will not find anything: "mbt" should be spelled "mtb". If you have a known list of keywords, you can tell the user that the term does not exist and suggest alternatives, rather than pretend you have nothing to show. Like a spellchecker.
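
A minimal sketch of what such a synonyms table could look like. The table and column names here are made up for illustration and are not from the original schema:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE search_synonyms (
    term      VARCHAR(64)  NOT NULL,  -- the word as typed or as it appears in a title, e.g. 'mtb'
    canonical VARCHAR(128) NOT NULL,  -- the phrase you standardise on
    PRIMARY KEY (term)
);

INSERT INTO search_synonyms (term, canonical) VALUES
    ('mtb', 'mountain bike');

-- Look up each search word before building the full-text query.
-- If no row comes back, use the word unchanged.
SELECT canonical FROM search_synonyms WHERE term = 'mtb';
```

The lookup is a primary-key read, so it should add very little to the search time. The same table could double as the list of known keywords for the "did you mean" suggestions.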

Yes, I know the LIKE isn't a great idea. Some idiot made a mess of the data and put image URLs as the product name for some items (this gets updated regularly, so if I delete them from the DB they will come back). That aside, removing it from the query doesn't seem to improve performance at all.

 

The problem is I need to return results for both mtb and mountain bike, as depending on the type of product there will be occurrences of both. Unfortunately I don't have control over the product titles, so I couldn't standardise them either. There is a spellchecker in place as well, so if someone does spell mtb or mountain wrong it is normally corrected.

 

Thanks.

I mean replacing

MATCH(..)  + MATCH (..)

with

MATCH (..) AND MATCH(...)

 

That way, MySQL may be able to deduce that records that have not been "found" by the first MATCH() don't have to be examined by the second MATCH().

If you do +, MySQL is forced to run both MATCH() statements and add them together to see if the result is nonzero.

 

Of course, you could also just rewrite the second MATCH() so it also searches for the text from the first MATCH()...
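
Note that the two forms do not return the same rows. A sum of two MATCH() scores in WHERE is non-zero when either expression matches, while AND is true only when both match. The sketch below shows the two side by side, plus an OR form that keeps the "either" meaning. Whether the optimizer handles OR any faster than + is something to measure rather than assume:

```sql
-- Sum form (as in the original query): true when EITHER expression matches.
SELECT ProductID FROM products
WHERE MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE)
    + MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE);

-- AND form: a row must satisfy BOTH expressions, so fewer rows come back.
SELECT ProductID FROM products
WHERE MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE)
  AND MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE);

-- OR form: same "either" meaning as the sum, written as a logical condition.
SELECT ProductID FROM products
WHERE MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE)
   OR MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE);
```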

When I do MATCH() AND MATCH() it does run faster; however, I don't get the results from the products with mtb.

 

I have also tried doing this:

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('mtb +mountain* +bike* +helmet*' IN BOOLEAN MODE)

And removing the other match from the query, but I get the same results as MATCH() AND MATCH().

 

Thanks.
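
That result is consistent with how boolean mode reads the search string: a leading + marks a word as required and a word with no operator is optional, so 'mtb +mountain* +bike* +helmet*' still requires mountain, bike and helmet, and mtb only affects the ranking. Boolean mode also lets you group subexpressions with parentheses, so one way to say "either mtb, or mountain and bike, and in both cases helmet" in a single MATCH() might look like the following. It is a sketch and has not been tested against this data set:

```sql
-- Sketch: one grouped boolean expression instead of two MATCH() calls.
-- +( ... )               the group as a whole is required
-- mtb*                   first alternative inside the group
-- (+mountain* +bike*)    second alternative: both words together
-- +helmet*               required in every case
SELECT Product_Name, URL, ImageURL, StockPrice, Brand,
       MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size)
       AGAINST ('+(mtb* (+mountain* +bike*)) +helmet*' IN BOOLEAN MODE) AS Score
FROM products
WHERE MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size)
      AGAINST ('+(mtb* (+mountain* +bike*)) +helmet*' IN BOOLEAN MODE)
ORDER BY Score DESC;
```

To reproduce the original two-MATCH() behaviour exactly, where mtb alone is enough, the string would be 'mtb* (+mountain* +bike* +helmet*)' instead. It is also worth checking the minimum indexed word length: with MyISAM the default ft_min_word_len is 4, so a three-letter word such as "mtb" is not indexed unless that setting has been lowered and the index rebuilt.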
