Jump to content

[SOLVED] Weighted Full-Text Searching (SLOW)


Recommended Posts

I'm currently working on a website that is made up of numerous areas (blog, events, news, case studies, articles etc.), of which the site search feature is to cover all of these areas and order the results by weighted relevance. I.e. If the search term is found in the title,  it is more relevant than if it was found in the content.

 

I'm using the full-text search capabilities of MySQL with MATCH() AGAINST() and have created and rebuilt FULLTEXT indexes on fields that I am searching. Below is an example of one of my auto-generated queries that covers one of the many areas...

 

SELECT t.*, (
        (1.4 * (MATCH(c.`company`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.4 * (MATCH(c.`company`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.3 * (MATCH(t.`title`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.3 * (MATCH(t.`title`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.1 * (MATCH(ser.`name`,sec.`name`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.1 * (MATCH(ser.`name`,sec.`name`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.1 * (MATCH(t.`content`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.1 * (MATCH(t.`content`) AGAINST('phpfreaks' IN BOOLEAN MODE)))
    ) as `relevance` 
FROM 
    `article` as t, 
    `user` as c, 
    `article_category` as tc, 
    `service` as ser, 
    `sector` as sec 
WHERE 
    MATCH(c.`company`,t.`title`,ser.`name`,sec.`name`,t.`content`) AGAINST('phpfreaks' IN BOOLEAN MODE) 
    AND t.`user`=c.`id` 
    AND tc.`thought_leadership`=t.`id` 
    AND tc.`sector`=sec.`id` 
    AND tc.`service`=ser.`id` 
GROUP BY t.`id` 
HAVING `relevance` > 0 
ORDER BY `relevance` DESC, t.`title` ASC

 

But this query is searching a few thousand (and counting) articles - It takes a good 30-seconds or so to complete the search. I'm assuming that it is the 8 levels of weighting that is making the query slow. Am I approaching the weighting in an inefficient way? Is there anything else I am doing inefficiently? Is there an alternative solution?

This is always going to be a problem searching through text fields where there are many thousand rows.

I had the same problem with a popular article site.

 

The solution was to take the search capabilities away from MySql and use a full text search engine that can index your database records. I used Lucene which is integrated into the Zend Framework (was at the time anyway) but there are others.

Thanks for the input Neil, I didn't even consider using a tool like that - To be fair, I didn't even know of their existence... I have just been reading up on Apache Solr (based on Lucene) and Sphinx.

 

Sphinx looks easier to implement as it appears to have a better PHP API from what I can tell. I'll install, integrate and see how it works out.

Well, I went with Sphinx - Straight forward to integrate and excellent results. Running a query which would probably take MySQL ~10 minutes to execute (a calculated guess - I never finished coding it), takes less than a second with Sphinx.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.