Full text index relevancy issue


kickstart

Hi

 

I have a full-text index that I am using to search a column holding a string of important search terms. However, the ranking of the results is a bit strange.

 

For example, searching for "d-link router" against this column brings back a fair few rows, but it ranks a row containing tp-link (and not d-link) higher than one that contains d-link.

 

For instance, this row is ranked 9.4198112487793

 

Routers-and-Switches TP-Link TL-MR3220 TP-TL-MR3220 ROUTER tlw&tlw tlwAVtlw BUNDLE tlw3Gtlw N-LITE ADSL ROUTER tlw&tlw tlw1YRtlw BULLGUARD tlwAVtlw TP-LINK TP-Link TL-MR3220 3G/3.75G 150Mbps Wireless Lite tlwNtlw Router 6935364051501

 

while this row is ranked 8.55044555664062

 

Routers-and-Switches D-Link DSL-2680/UK DL-DSL-2680 D-LINK ADSL ROUTER WIRELESS tlwNtlw tlw150tlw ADSL2+ ROUTER DLINK D-Link DSL-2680 Wireless tlwNtlw tlw150tlw ADSL2+ Modem Router 790069334535

 

The match statement is as follows:-

 

SELECT item_keyword_search, MATCH (item_keyword_search) AGAINST ('d-link* router*')
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*')

 

Eliminating the * wildcards doesn't change this, nor does splitting the words with a comma.

 

Any suggestions?

 

All the best

 

Keith


Hi

 

I do want to be able to sort them, but it is also useful to see how it is rating matches.

 

The problem appears to be that MATCH treats a hyphen as a word separator. It also ignores words shorter than 4 characters, so the D and TP parts are dropped and D-LINK and TP-LINK both end up being treated as just LINK.

 

All the best

 

Keith


If you have access to the config file:

 

ft_min_word_len = 3
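
A rough sketch of how the setting above could be checked and applied, assuming item_import is a MyISAM table (an assumption, the thread does not say) and that the server has been restarted after editing the config file. The FULLTEXT index has to be rebuilt before a new minimum word length takes effect. Bear in mind that if the hyphen is still treated as a separator, "d" and "tp" are shorter than 3 characters as well, so this change on its own may not be enough to tell D-LINK and TP-LINK apart.

```sql
-- Show the minimum word length the server is currently using for
-- MyISAM FULLTEXT indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- After changing ft_min_word_len in the config file and restarting
-- the server, rebuild the FULLTEXT index on the table from the thread
-- so that shorter words are indexed.
REPAIR TABLE item_import QUICK;
```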

 

"If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the'"

 

Possibility:

 

Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a "letter." Then use the given character set for your FULLTEXT indexes.
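
If changing the server configuration or character set is not an option, one possible workaround (not something suggested in this thread, just a sketch) is to keep the full-text match for filtering and add a plain LIKE test as the first sort key, so rows containing the literal string come first. The aliases score and has_exact_term are made up for illustration, and the LIKE comparison is assumed to be case-insensitive, which depends on the column's collation.

```sql
-- Sketch only: rows that contain the literal text 'd-link' sort first,
-- then the usual full-text relevance decides the order within each group.
SELECT item_keyword_search,
       MATCH (item_keyword_search) AGAINST ('d-link router') AS score,
       (item_keyword_search LIKE '%d-link%') AS has_exact_term
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link router')
ORDER BY has_exact_term DESC, score DESC;
```

The LIKE with a leading wildcard cannot use an index, but it is only evaluated on the rows the full-text match has already returned.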

Get dat Sphinx?!
