Jump to content

[SOLVED] Weighted Full-Text Searching (SLOW)


zq29

Recommended Posts

I'm currently working on a website that is made up of numerous areas (blog, events, news, case studies, articles etc.), of which the site search feature is to cover all of these areas and order the results by weighted relevance. I.e. If the search term is found in the title,  it is more relevant than if it was found in the content.

 

I'm using the full-text search capabilities of MySQL with MATCH() AGAINST() and have created and rebuilt FULLTEXT indexes on fields that I am searching. Below is an example of one of my auto-generated queries that covers one of the many areas...

 

SELECT t.*, (
        (1.4 * (MATCH(c.`company`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.4 * (MATCH(c.`company`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.3 * (MATCH(t.`title`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.3 * (MATCH(t.`title`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.1 * (MATCH(ser.`name`,sec.`name`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.1 * (MATCH(ser.`name`,sec.`name`) AGAINST('phpfreaks' IN BOOLEAN MODE))) + 
        (1.1 * (MATCH(t.`content`) AGAINST('"phpfreaks"' IN BOOLEAN MODE))) + 
        (0.1 * (MATCH(t.`content`) AGAINST('phpfreaks' IN BOOLEAN MODE)))
    ) as `relevance` 
FROM 
    `article` as t, 
    `user` as c, 
    `article_category` as tc, 
    `service` as ser, 
    `sector` as sec 
WHERE 
    MATCH(c.`company`,t.`title`,ser.`name`,sec.`name`,t.`content`) AGAINST('phpfreaks' IN BOOLEAN MODE) 
    AND t.`user`=c.`id` 
    AND tc.`thought_leadership`=t.`id` 
    AND tc.`sector`=sec.`id` 
    AND tc.`service`=ser.`id` 
GROUP BY t.`id` 
HAVING `relevance` > 0 
ORDER BY `relevance` DESC, t.`title` ASC

 

But this query is searching a few thousand (and counting) articles - It takes a good 30-seconds or so to complete the search. I'm assuming that it is the 8 levels of weighting that is making the query slow. Am I approaching the weighting in an inefficient way? Is there anything else I am doing inefficiently? Is there an alternative solution?

This is always going to be a problem searching through text fields where there are many thousand rows.

I had the same problem with a popular article site.

 

The solution was to take the search capabilities away from MySql and use a full text search engine that can index your database records. I used Lucene which is integrated into the Zend Framework (was at the time anyway) but there are others.

Thanks for the input Neil, I didn't even consider using a tool like that - To be fair, I didn't even know of their existence... I have just been reading up on Apache Solr (based on Lucene) and Sphinx.

 

Sphinx looks easier to implement as it appears to have a better PHP API from what I can tell. I'll install, integrate and see how it works out.

Well, I went with Sphinx - Straight forward to integrate and excellent results. Running a query which would probably take MySQL ~10 minutes to execute (a calculated guess - I never finished coding it), takes less than a second with Sphinx.

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.