phporcaffeine Posted November 16, 2011 Share Posted November 16, 2011 I can't post/contribute in the code repo forum, so I'll post it here and maybe a mod will move it for me? Anyway, this is an algorithm for phonetic search scoring. Basically, it will evaluate a search string against an array of strings (list) and then return an array of ranked results from the list, in order of phonetic likeliness to the search string. In the returned array, the key is the score, the closer to 0, the more likely the phonetic match is; 0 being an exact match. Note: this is not pattern matching, this will match phonetic likeliness only. This is built around the English language as it stands, I am working on multi-lingual considerations. <?php /** * @NAME: Phonetic Search Score * @AUTHOR: Ryan Huff - MyCodeTree.com * @COPYRIGHT: (C)2011 * * The idea here is to supply a list (array) of strings (pressumably from a database * record set) and a single string. The class is designed to return an array of results * with X number of strings from the list that are the closest or equal phonetic * matches to the single string, ranked in order of closest phonetic match. * * The uniqueness of the Phonetic Search Score is the method in which it scores matches. The * scoring is based on the phonetic likeness of the single search string and each string * in the list, and is ranked accordingly. Scoring is not based on string length or * pattern matching like traditional REGEX pattern searching. * * A practical use for this would be a scenario where a user types in a search string, and * then the Phonetic Search Score provides a list of matches, ranked in order of their * phonetic likeliness to the search string. In the returned array, the key is the score, * the closer the key\score is to 0, the more likely the phonetic match is; 0 being an * exact match. * * An example use case is provided below. If you have any questions about the Phonetic * Search Score, you can contact me at support@mycodetree.com * */ class matcher { var $searchCriteria = NULL; //TEXT TO SEARCH FOR var $searchSet = array(); //SET TO SEARCH WITHIN var $numberOfMatches = 5; //NUMBER OF MATCHES TO RETURN var $removalChars = array('.','?','!','@','*','&','%','$','#',',',';',':','"','\''); //PUNCTUATION TO REMOVE var $returnMatches = array(); function scorer() { if (is_array($this->searchSet) && !empty($this->searchCriteria)) { $distal = array(); foreach ($this->removalChars as $val) { $replace[] = ""; } //REMOVE PUNCTUATION, CONVERT TO ALL LOWERCASE AND REMOVE LEADING AND ENDING SPACES $this->searchCriteria = trim(strtolower( str_replace($this->removalChars, $replace, $this->searchCriteria))); //GET METAPHONE KEY FOR SEARCH CRITERIA $scm = metaphone($this->searchCriteria); if ($this->numberOfMatches <= count($this->searchSet)) { for ($i=0; $i < count($this->searchSet); $i++) { $distal[levenshtein($scm, metaphone($this->searchSet[$i]))] = $this->searchSet[$i]; } } else { for ($i=0; $i < $this->numberOfMatches; $i++) { $distal[levenshtein($scm, metaphone($this->searchSet[$i]))] = $this->searchSet[$i]; } } ksort($distal); $this->returnMatches = $distal; } return false; } } /*INSTANTIATE CLASS*/ $score = new matcher(); /*EXAMPLE USE CASE*/ /*SETUP THE LIST OF STRING TO COMPARE THE SEARCH CRITERIA AGAINST*/ $score->searchSet = array( 'this is one item from a result set', 'this is another test item', 'more testing', 'Hello world, I am a test sentence.', 'So, do you have any mustard that I can borrow?' ); /*SETUP THE SEARCH CRITERIA*/ $score->searchCriteria = "Hello world, I am a test sentence."; /*FIRE THE SCORER METHOD (THIS METHOD WOULD BE EXPENSIVE TO CALL IN A LOOP, USE CAUTION)*/ $score->scorer(); /*DUMP THE RESULTS*/ print_r($score->returnMatches); ?> Quote Link to comment https://forums.phpfreaks.com/topic/251231-phonetic-search-score/ Share on other sites More sharing options...
QuickOldCar Posted November 16, 2011 Share Posted November 16, 2011 Tried out the code, although I have no use for phonetic matched ranks, maybe one day I will. Somewhere here I made a pile of similarity functions in ways that scored them , using metaphone, soundex, repitition of same words, and others to find similar occurances or to use for suggestions. Looks useful, thanks for the class. Ever try this with thousands or more of results? Quote Link to comment https://forums.phpfreaks.com/topic/251231-phonetic-search-score/#findComment-1288672 Share on other sites More sharing options...
phporcaffeine Posted November 16, 2011 Author Share Posted November 16, 2011 Tried out the code, although I have no use for phonetic matched ranks, maybe one day I will. Somewhere here I made a pile of similarity functions in ways that scored them , using metaphone, soundex, repitition of same words, and others to find similar occurances or to use for suggestions. Looks useful, thanks for the class. Ever try this with thousands or more of results? Yes, I have tried it with a > 3,000 record set. It isn't hateful, as long as your not trying to match blob's or longtext's .... as long as you stick to the traditional 255 .. varchar length, it seems to be relatively tolerable. Again, though, as you eluded to, metaphone and levenshtein and math-intense constructs. So it can be be super expensive in loops. You could eliminate the scoring mechanism (levenshtein) and just do a simple <> match, and that would reduce calc time. If you go that route though, I would just switch to pattern matching altogether. Quote Link to comment https://forums.phpfreaks.com/topic/251231-phonetic-search-score/#findComment-1288705 Share on other sites More sharing options...
ManiacDan Posted November 16, 2011 Share Posted November 16, 2011 When I did phonetic search suggestions I stored my word list along with its metaphone representation, then just did a SELECT word FROM words WHERE LEVENSHTEIN( metaphone, '$mySearchTerm') <= 2; With a properly configured (and fast) database box, it will work pretty well. Especially with very large word sets to compare against. Quote Link to comment https://forums.phpfreaks.com/topic/251231-phonetic-search-score/#findComment-1288710 Share on other sites More sharing options...
phporcaffeine Posted November 16, 2011 Author Share Posted November 16, 2011 When I did phonetic search suggestions I stored my word list along with its metaphone representation, then just did a SELECT word FROM words WHERE LEVENSHTEIN( metaphone, '$mySearchTerm') <= 2; With a properly configured (and fast) database box, it will work pretty well. Especially with very large word sets to compare against. Great, suggestion! The class I wrote above is, as is most of my work, in the spirit of being database agnostic. If however, you are using MySQL and you have levenshtein or metaphone UDF's compiled in MySQL (or non-compiled UDF's), I would also advocate calculating distal values using clauses in the query. Here is a simple way to add a levenshtein function to MySQL without recompiling any binaries: CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) ) RETURNS INT DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; DECLARE s1_char CHAR; -- max strlen=255 DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len DO SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = c_temp; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; IF c > c_temp THEN SET c = c_temp; END IF; SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; END WHILE; SET cv1 = cv0, i = i + 1; END WHILE; END IF; RETURN c; END// Quote Link to comment https://forums.phpfreaks.com/topic/251231-phonetic-search-score/#findComment-1288718 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.