kickstart Posted September 7, 2012

Hi

I have a full-text search that I am trying to use against a column holding a string of important search terms. However, the ranking of the results is a bit strange.

For example, a search for "d-link router" against this column brings back a fair few rows, but ranks a row containing tp-link but not d-link higher than one that contains d-link.

For instance, this row is ranked 9.4198112487793:

Routers-and-Switches TP-Link TL-MR3220 TP-TL-MR3220 ROUTER tlw&tlw tlwAVtlw BUNDLE tlw3Gtlw N-LITE ADSL ROUTER tlw&tlw tlw1YRtlw BULLGUARD tlwAVtlw TP-LINK TP-Link TL-MR3220 3G/3.75G 150Mbps Wireless Lite tlwNtlw Router 6935364051501

while this row is ranked 8.55044555664062:

Routers-and-Switches D-Link DSL-2680/UK DL-DSL-2680 D-LINK ADSL ROUTER WIRELESS tlwNtlw tlw150tlw ADSL2+ ROUTER DLINK D-Link DSL-2680 Wireless tlwNtlw tlw150tlw ADSL2+ Modem Router 790069334535

The match statement is as follows:

SELECT item_keyword_search, MATCH (item_keyword_search) AGAINST ('d-link* router*')
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*')

Eliminating the * wildcards doesn't change this, nor does splitting the words with a comma.

Any suggestions?

All the best

Keith
The Little Guy Posted September 7, 2012

Putting a MATCH in the field list is useless unless you want to sort by it.

SELECT item_keyword_search,
       (MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)) AS score
FROM item_import
WHERE MATCH (item_keyword_search) AGAINST ('d-link* router*' IN BOOLEAN MODE)
ORDER BY score DESC
kickstart Posted September 7, 2012

Hi

I do want to be able to sort them, but it is also useful to see how it is rating matches.

The problem appears to be that MATCH assumes a hyphen separates words. It also ignores words less than 4 characters long, so the "D" and "TP" parts are dropped and D-LINK and TP-LINK are both treated as just LINK.

All the best

Keith
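To make the effect described above concrete, here is a rough Python sketch of the tokenizing behaviour. It is only an approximation under stated assumptions: the function name `fulltext_tokens` is made up for illustration, the regular expression simply treats letters, digits, and underscores as word characters, and the default of 4 mirrors the minimum word length mentioned in the post. It is not MySQL's actual parser.

```python
import re


def fulltext_tokens(text, min_word_len=4):
    """Approximate the words a full-text index would keep (illustrative only)."""
    # Assumption: a word is a run of letters, digits, or underscores,
    # so a hyphen ends one word and starts the next.
    words = re.findall(r"[A-Za-z0-9_]+", text)
    # Words shorter than the minimum length are dropped from the index.
    return [w.lower() for w in words if len(w) >= min_word_len]


if __name__ == "__main__":
    # Both brand names reduce to the same single indexed word.
    print(fulltext_tokens("D-Link"))
    print(fulltext_tokens("TP-Link"))
    # The search string loses its distinguishing "d" in the same way.
    print(fulltext_tokens("d-link router"))
```

Under these assumptions the search for "d-link router" is effectively a search for "link router", which would explain why a TP-Link row can outrank a D-Link row.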
The Little Guy Posted September 7, 2012

If you have access to the config file:

ft_min_word_len = 3

Quote:

"If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the'."

Possibility:

"Modify a character set file: This requires no recompilation. The true_word_char() macro uses a 'character type' table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a 'letter.' Then use the given character set for your FULLTEXT indexes."
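The quoted rule about the truncation operator can be sketched in a few lines of Python. This is a simplified model, not MySQL's implementation: `surviving_terms` is a hypothetical helper, the one-word stopword set is a stand-in for the real stopword list, and only the leading + and - operators are handled.

```python
def surviving_terms(query, min_word_len=4, stopwords=frozenset({"the"})):
    """Return the boolean-query terms that would still constrain the search."""
    kept = []
    for raw in query.split():
        # Strip a leading boolean operator such as + or -.
        term = raw.lstrip("+-")
        if term.endswith("*"):
            # A truncated term is treated as a required prefix, so it is
            # kept even if it is too short or is a stopword.
            kept.append(term)
        elif len(term) >= min_word_len and term.lower() not in stopwords:
            kept.append(term)
        # Otherwise the term is silently ignored.
    return kept


if __name__ == "__main__":
    print(surviving_terms("+word +the*"))  # both terms constrain the search
    print(surviving_terms("+word +the"))   # "the" is ignored
```

In this model '+word +the*' keeps two constraints while '+word +the' keeps only one, which is why the first query tends to match fewer rows.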
xyph Posted September 7, 2012

kickstart wrote: "Hi I do want to be able to sort them, but it is also useful to see how it is rating matches. Problem appears to be that match assumes a hyphen separates words. Also it ignores words less than 4 characters long so D-LINK and TP-LINK are taken as being the same. All the best Keith"

Get dat Sphinx?!
fenway Posted September 8, 2012

Yeah, full-text search in MySQL is rather limited. You'll have to mess with internals to trick it into treating a hyphen as part of a word. I'd vote for Sphinx, too.