Full text search optimization

gvp16 · October 3, 2013

Hi all I have a full text product search query that is built up via php and certain elements are changed based on the search query.

The query runs well and the results are very accurate and what we want, however it takes roughly 0.5s - 1.5s to run, as this is used in an ajax search there is a noticeable delay. So im looking for some advice on how I can improve the query.

I think I have isolated the problem, some of our products are named slightly different but to the end user are the same thing, so for example an mtb helmet is the same as a mountain bike helmet.

As such I wrote in some php to add in the additional keywords when the opposite is being search for.

SELECT Product_Name, URL, ImageURL, StockPrice, Brand , 

MATCH (Product_NAME) AGAINST ('2014') + Rating * 0.1 + views * 0.1 + addedtobasket + 

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE) + #This line here is added when the term mountain is search for

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE) as Score 

FROM products 

WHERE  

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mtb*' IN BOOLEAN MODE) +  #This line here is added when the term mountain is search for

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('+mountain* +bike* +helmet*' IN BOOLEAN MODE) 

AND Cat1 != 'Admin' AND Status = 'True' AND Cat1 NOT LIKE '%prodimg%' AND Stock_Level > 0 AND StockPrice > 0 AND date_updated > '2013-09-26 10:48:03'

AND ProductID != '' 

GROUP BY ProductID 

ORDER BY Score DESC, Rating DESC

When the addition line is added the query takes roughly 0.875s to run, without it comes back as 0.05s there is a large improvement on speed but an impact on the results.

I have the following index as well :

Any suggestions?

Thanks.

vinny42 · October 3, 2013

One thing that is never good is this:

AND Cat1 NOT LIKE '%prodimg%'

Because a LIKE that uses an expression starting with a % will always trigger a sequential scan (all records must be visited and parsed, no indexes can be used to solve this).

I think I have isolated the problem, some of our products are named slightly different but to the end user are the same thing, so for example an mtb helmet is the same as a mountain bike helmet.

Then perhaps it's a good idea to prepare for thism by creating a table of synonyms. When a product "mtb helmet" is added, you look it up in the synonyms table and see that it's actually a "mountain bike helmet", so you link the product to "mountain bike helmet" instead of "mbt helmet". Then when the user searches for "mbt helmet" you do the same thing and search instead for "mountain bike helmet".

I'm a big fan of tagging products with known keywords, rather than doing fulltext searches, because someone looking for "mbt helmet" will not find anything because "mbt" should be spelled "mtb". If you have a known list of keywords you can tell the user that the term does not exist, rather than pretend you have nothing to show, ald suggest alternatives. Like a spellchecker.

gvp16 · October 3, 2013

Yes I know the LIKE isn't a great idea, some idiot made a mess of the data and put image urls as the product name for some items (this gets updated regularly so If I delete them from the DB they will come back). That aside when removing that from the query it doesn't seem to aid in the performance at all.

The problem is I need to return results for both mtb and mountain bike, as depending the type of product there will be occurrences of both. Unfortunately I don't have control over the product titles so I couldn't standardise them either. There is a spellchecker in place as well, so if someone does spell mtb or mountain wrong it is normally corrected.

Thanks.

vinny42 · October 3, 2013

Have you tried not adding the MATCH() values up in the WHERE clause, and just using AND?

gvp16 · October 4, 2013

In what way : like

WHERE Product_Name LIKE '%mtb%' AND Product_Name LIKE 'Mountain bike'

for example.

vinny42 · October 4, 2013

I mean replacing

MATCH(..) + MATCH (..)

with

MATCH (..) AND MATCH(...)

That way, MySQL may be able to deduce that records that have not been "found" by the first match(), don't have to be examined by the second match().

If you do +, MySQL is forced to run both MATCH() statements and add them together to see if the result is nonzero.

Ofcourse you could also just rewrite the second MATCH() so it also searches for the text from the first MATCH()...

gvp16 · October 4, 2013

When I do MATCH() AND MATCH() it does run faster however I dont get the results from the products with mtb.

I have also tried doing this :

MATCH (Product_Name,Cat1,Cat2,Cat3,Colour,Size) AGAINST ('mtb +mountain* +bike* +helmet*' IN BOOLEAN MODE)

And removing the other match from the query, but I get the same results as MATCH() AND MATCH().

Thanks.

vinny42 · October 4, 2013

then perhaps you should lookup how you can do an OR in fulltext, or do as I suggested and build a schadow index of synonims that you can query.

gvp16 · October 4, 2013

Yes, what I have just tried and seems to work well is adding in a "keyword" column to the table, so for example all mountain bike helmets now have mtb helmets as a keyword and so on, the performance is 100% better as well.

Thanks for all the advice!

Sign In

Full text search optimization

Recommended Posts

gvp16

Link to comment

Share on other sites

vinny42

Link to comment

Share on other sites

gvp16

Link to comment

Share on other sites

vinny42

Link to comment

Share on other sites

gvp16

Link to comment

Share on other sites

vinny42

Link to comment

Share on other sites

gvp16

Link to comment

Share on other sites

vinny42

Link to comment

Share on other sites

gvp16

Link to comment

Share on other sites

Join the conversation

Browse

Activity

Important Information