kickstart Posted September 7, 2012

Hi

I have a full-text search that I am trying to use against a column holding a string of important search terms. However, the ranking of the results is a bit strange.

For example, a search for "d-link router" against this column brings back a fair few rows, but ranks a row containing TP-Link but not D-Link higher than one that contains D-Link.

This row is ranked 9.4198112487793:

Routers-and-Switches TP-Link TL-MR3220 TP-TL-MR3220 ROUTER tlw&tlw tlwAVtlw BUNDLE tlw3Gtlw N-LITE ADSL ROUTER tlw&tlw tlw1YRtlw BULLGUARD tlwAVtlw TP-LINK TP-Link TL-MR3220 3G/3.75G 150Mbps Wireless Lite tlwNtlw Router 6935364051501

while this row is ranked 8.55044555664062:

Routers-and-Switches D-Link DSL-2680/UK DL-DSL-2680 D-LINK ADSL ROUTER WIRELESS tlwNtlw tlw150tlw ADSL2+ ROUTER DLINK D-Link DSL-2680 Wireless tlwNtlw tlw150tlw ADSL2+ Modem Router 790069334535

The match statement is as follows:

SELECT item_keyword_search, MATCH (item_keyword_search) AGAINST ('d-link* router*')
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*')

Eliminating the * wildcards doesn't change this, nor does splitting the words with a comma.

Any suggestions?

All the best

Keith

Link to comment https://forums.phpfreaks.com/topic/268111-full-text-index-relevancy-issue/
The Little Guy Posted September 7, 2012

Putting a MATCH in the field list is useless unless you want to sort by it.

SELECT item_keyword_search,
       (MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)) AS score
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)
ORDER BY score DESC
kickstart Posted September 7, 2012 (Author)

Hi

I do want to be able to sort them, but it is also useful to see how it is rating matches.

The problem appears to be that MATCH assumes a hyphen separates words. It also ignores words less than 4 characters long, so D-LINK and TP-LINK are taken as being the same.

All the best

Keith
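Keith's diagnosis can be checked with a rough simulation. The following is a minimal Python sketch, not MySQL's actual parser: it assumes that only letters, digits, and underscores count as word characters and that words shorter than the default minimum length of 4 are dropped. The helper name indexed_words is hypothetical, and stopwords and apostrophe handling are ignored.

```python
import re

# Assumption: this only approximates the default MyISAM full-text parser.
# Letters, digits, and underscore are treated as word characters, so a
# hyphen ends a word. Stopwords and apostrophe rules are ignored here.
FT_MIN_WORD_LEN = 4  # default minimum word length for MyISAM FULLTEXT


def indexed_words(text, min_len=FT_MIN_WORD_LEN):
    """Return the lowercase words that would survive indexing (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9_]+", text.lower())
    # Drop anything shorter than the minimum word length.
    return [w for w in words if len(w) >= min_len]


print(indexed_words("D-Link DSL-2680 Router"))
print(indexed_words("TP-Link TL-MR3220 Router"))
```

Under these assumptions both "D-Link" and "TP-Link" contribute only the word "link", because "d" and "tp" fall below the length limit. That would be consistent with a search for d-link being unable to tell the two brands apart.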
The Little Guy Posted September 7, 2012

If you have access to the config file:

ft_min_word_len = 3

Quoted from the MySQL manual:

"If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the'"

Possibility:

"Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a "letter." Then use the given character set for your FULLTEXT indexes."
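The rule in the first quote can be illustrated with a small Python sketch. This is a simplification written for this thread, not MySQL code: the helper effective_terms and the tiny stopword set are hypothetical, and only the leading + and - operators and the trailing * are recognised (quotes, parentheses, and the other boolean operators are ignored).

```python
FT_MIN_WORD_LEN = 4
# Assumption: a tiny illustrative subset, not MySQL's real stopword list.
STOPWORDS = {"the", "and", "for"}


def effective_terms(query, min_len=FT_MIN_WORD_LEN, stopwords=STOPWORDS):
    """Return the boolean-mode terms that still take part in the search."""
    kept = []
    for term in query.split():
        word = term.lstrip("+-")  # strip the leading boolean operator, if any
        if word.endswith("*"):
            # A truncated term is treated as a prefix, so it is never
            # stripped for being too short or for being a stopword.
            kept.append(term)
        elif len(word) >= min_len and word.lower() not in stopwords:
            kept.append(term)
    return kept


print(effective_terms("+word +the*"))
print(effective_terms("+word +the"))
```

In this sketch '+word +the*' keeps both terms, so matching rows must also contain a word beginning with "the", while '+word +the' silently loses its second term. That is why the first query can return fewer rows. Note that if ft_min_word_len is changed in the config file, the server has to be restarted and existing FULLTEXT indexes rebuilt before the new value takes effect.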
xyph Posted September 7, 2012

kickstart wrote: "Hi I do want to be able to sort them, but it is also useful to see how it is rating matches. Problem appears to be that match assumes a hyphen separates words. Also it ignores words less than 4 characters long so D-LINK and TP-LINK are taken as being the same. All the best Keith"

Get dat Sphinx?!
fenway Posted September 8, 2012

Yeah, FT in MySQL is rather limited -- you'll have to mess with internals to trick it into using a hyphen as part of a word. I'd vote for Sphinx, too.