
Full text index relevancy issue


kickstart


Hi

 

I have a full-text search that I am using to search a column containing a string of important search terms. However, the ranking of the results is a bit strange.

 

For example, searching this column for "d-link router" brings back a fair few rows, but it ranks a row that contains tp-link (and not d-link) higher than one that contains d-link.

 

This row is ranked 9.4198112487793

 

Routers-and-Switches TP-Link TL-MR3220 TP-TL-MR3220 ROUTER tlw&tlw tlwAVtlw BUNDLE tlw3Gtlw N-LITE ADSL ROUTER tlw&tlw tlw1YRtlw BULLGUARD tlwAVtlw TP-LINK TP-Link TL-MR3220 3G/3.75G 150Mbps Wireless Lite tlwNtlw Router 6935364051501

 

while this row is ranked 8.55044555664062

 

Routers-and-Switches D-Link DSL-2680/UK DL-DSL-2680 D-LINK ADSL ROUTER WIRELESS tlwNtlw tlw150tlw ADSL2+ ROUTER DLINK D-Link DSL-2680 Wireless tlwNtlw tlw150tlw ADSL2+ Modem Router 790069334535

 

The query is as follows:

 

SELECT item_keyword_search, MATCH (item_keyword_search) AGAINST ('d-link* router*')
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*')

 

Eliminating the * wildcards doesn't change this, nor does splitting the words with a comma.
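A rough way to see why neither change helps: the query has no IN BOOLEAN MODE, so '*' is not an operator there. The sketch below assumes a simplified model of the MyISAM parser (not MySQL's actual code): any character other than a letter, digit or '_' separates words, and words shorter than ft_min_word_len (4 by default) are dropped. Stopwords are not modelled, and `simple_tokenize` is a hypothetical helper.

```python
import re

def simple_tokenize(text, min_len=4):
    """Simplified model of MyISAM full-text word splitting (an assumption,
    not MySQL's parser): anything other than letters, digits and '_'
    separates words, and words shorter than min_len are discarded."""
    words = re.findall(r"[0-9a-z_]+", text.lower())
    return [w for w in words if len(w) >= min_len]

# The search-string variants from the post all reduce to the same words.
variants = ["d-link router", "d-link* router*", "d-link, router"]
for v in variants:
    print(repr(v), "->", simple_tokenize(v))
```

Under this model every variant becomes ['link', 'router'], so the wildcards and the comma never reach the search as anything but separators.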

 

Any suggestions?

 

All the best

 

Keith

https://forums.phpfreaks.com/topic/268111-full-text-index-relevancy-issue/

Putting a MATCH in the field list is useless unless you want to sort by it.

 

 

SELECT item_keyword_search, (MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)) AS score
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)
ORDER BY score DESC
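The pattern in that query (compute a score in the field list, keep rows where the match succeeds, then ORDER BY the score) can be mimicked in plain Python. This is only an analogy: `naive_score` is a hypothetical stand-in that counts matching words under a simplified tokenizer, not MySQL's relevance formula. The two rows are the ones from the first post.

```python
import re
from collections import Counter

def simple_tokenize(text, min_len=4):
    # Simplified MyISAM-style splitting (assumption): non-word characters
    # separate words, and words shorter than min_len are dropped.
    return [w for w in re.findall(r"[0-9a-z_]+", text.lower()) if len(w) >= min_len]

def naive_score(query, row):
    # Hypothetical score: total occurrences of the query's words in the row.
    row_counts = Counter(simple_tokenize(row))
    return sum(row_counts[w] for w in simple_tokenize(query))

def search(query, rows):
    # Roughly: SELECT row, score ... WHERE score > 0 ORDER BY score DESC
    scored = [(row, naive_score(query, row)) for row in rows]
    hits = [(row, s) for row, s in scored if s > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

rows = [
    ("Routers-and-Switches TP-Link TL-MR3220 TP-TL-MR3220 ROUTER tlw&tlw "
     "tlwAVtlw BUNDLE tlw3Gtlw N-LITE ADSL ROUTER tlw&tlw tlw1YRtlw BULLGUARD "
     "tlwAVtlw TP-LINK TP-Link TL-MR3220 3G/3.75G 150Mbps Wireless Lite "
     "tlwNtlw Router 6935364051501"),
    ("Routers-and-Switches D-Link DSL-2680/UK DL-DSL-2680 D-LINK ADSL ROUTER "
     "WIRELESS tlwNtlw tlw150tlw ADSL2+ ROUTER DLINK D-Link DSL-2680 Wireless "
     "tlwNtlw tlw150tlw ADSL2+ Modem Router 790069334535"),
]

for row, score in search("d-link router", rows):
    print(score, row[:40])
```

Both rows score 6 here (three LINKs and three ROUTERs each), so the naive count cannot separate them; the 9.42 and 8.55 in the first post come from MySQL's own weighting, which this sketch does not reproduce.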

Hi

 

I do want to be able to sort them, but it is also useful to see how it is rating matches.

 

The problem appears to be that MATCH treats a hyphen as a word separator. It also ignores words shorter than 4 characters, so D-LINK and TP-LINK both end up as just LINK and are treated as the same.
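That diagnosis can be checked against a simplified model of the tokenizer (an assumption, not MySQL's parser): split on anything that isn't a letter, digit or '_', then drop words under 4 characters. `simple_tokenize` is a hypothetical helper.

```python
import re

def simple_tokenize(text, min_len=4):
    # Simplified MyISAM-style word splitting (assumption, not MySQL source):
    # '-' is not a word character, so "d-link" becomes "d" and "link",
    # and the short fragment is then discarded by the length limit.
    words = re.findall(r"[0-9a-z_]+", text.lower())
    return [w for w in words if len(w) >= min_len]

for brand in ["D-LINK", "TP-LINK", "DLINK"]:
    print(brand, "->", simple_tokenize(brand))
```

Both hyphenated brands collapse to ['link']. Under this model the unhyphenated DLINK in the second row survives as its own word, ['dlink'], while every D-LINK is reduced to LINK.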

 

All the best

 

Keith

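If you have access to the MySQL config file, you can lower the minimum word length in the [mysqld] section (restart the server and rebuild the FULLTEXT indexes afterwards for it to take effect):

[mysqld]
ft_min_word_len = 3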
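With a simplified tokenizer (an assumption, not MySQL's parser), the effect of the setting on the two brand names looks like this. A limit of 3 keeps longer fragments such as "dsl", but "tp" (2 characters) and "d" (1 character) are still split off by the hyphen and then dropped. `simple_tokenize` is a hypothetical helper.

```python
import re

def simple_tokenize(text, min_len):
    # Simplified model (assumption): split on non-word characters, then
    # keep only words with at least min_len characters.
    words = re.findall(r"[0-9a-z_]+", text.lower())
    return [w for w in words if len(w) >= min_len]

for min_len in (4, 3, 2):
    print(min_len,
          simple_tokenize("D-Link DSL-2680", min_len),
          simple_tokenize("TP-Link", min_len))
```

In this model only a limit of 2 lets "tp" through, and nothing short of 1 keeps the "d" of D-Link.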

 

"If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the'."
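The quoted behaviour can be illustrated with a toy model of boolean mode. Everything here is an assumption for illustration: the one-word stopword list, the three documents, and the parsing, which only handles '+' (required) and a trailing '*' (prefix) and is not MySQL's parser.

```python
import re

MIN_LEN = 4            # stands in for ft_min_word_len=4 from the quote
STOPWORDS = {"the"}    # tiny hypothetical stopword list for illustration

def index_words(text):
    # Words that would reach the index under this simplified model.
    words = re.findall(r"[0-9a-z_]+", text.lower())
    return {w for w in words if len(w) >= MIN_LEN and w not in STOPWORDS}

def boolean_search(query, docs):
    """Toy handling of '+required' and 'prefix*' terms (assumption).
    Plain terms that are too short or stopwords are stripped; prefix terms
    are kept regardless of length."""
    required = []
    for term in query.split():
        if not term.startswith("+"):
            continue  # optional terms only affect ranking; ignored here
        core = term.lstrip("+").rstrip("*").lower()
        is_prefix = term.endswith("*")
        if not is_prefix and (len(core) < MIN_LEN or core in STOPWORDS):
            continue  # stripped, so it no longer restricts the results
        required.append((core, is_prefix))

    def matches(doc):
        words = index_words(doc)
        return all(
            any(w.startswith(core) for w in words) if is_prefix else core in words
            for core, is_prefix in required
        )

    return [doc for doc in docs if matches(doc)]

docs = ["a word alone", "word theory", "word and the rest"]  # hypothetical rows
print(boolean_search("+word +the", docs))   # '+the' is stripped
print(boolean_search("+word +the*", docs))  # 'the*' must match as a prefix
```

'+word +the' returns all three documents because '+the' is dropped, while '+word +the*' returns only "word theory", the "fewer rows" case from the quote.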

 

Possibility:

 

Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a "letter." Then use the given character set for your FULLTEXT indexes.
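The effect of that change can be sketched with the same kind of simplified tokenizer (an assumption, not MySQL's parser). The hypothetical `hyphen_is_letter` flag stands in for marking '-' as a letter in the ctype map; it does not touch any real character set file.

```python
import re

def tokenize(text, min_len=4, hyphen_is_letter=False):
    # Simplified model (assumption): word characters are letters, digits
    # and '_'; hyphen_is_letter mimics marking '-' as a "letter" in the
    # character set's ctype map.
    pattern = r"[0-9a-z_\-]+" if hyphen_is_letter else r"[0-9a-z_]+"
    words = re.findall(pattern, text.lower())
    return [w for w in words if len(w) >= min_len]

for text in ["D-Link", "TP-Link", "Routers-and-Switches"]:
    print(text, tokenize(text), tokenize(text, hyphen_is_letter=True))
```

With the hyphen treated as a letter, D-Link and TP-Link become the distinct words "d-link" and "tp-link", both long enough for the default limit of 4. The last line shows the trade-off: hyphenated category names also become single words, so in this model "switches" would no longer be found inside "Routers-and-Switches".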


Get dat Sphinx?!

Archived

This topic is now archived and is closed to further replies.
