valentinp Posted May 8, 2009

I would like opinions about building a server-side spelling API to use within an application I am trying to build. Unfortunately, aspell and pspell do not meet my requirements, because I need it to work for all languages, not just the ones aspell supports. I don't need word libraries yet, since they can be built once the engine is working.

What I know:
- Most spelling engines use double metaphone or similar algorithms to recognize words that sound like other words and might be replacements for the word being checked. Unfortunately, double metaphone, soundex, metaphone and so on are mostly meant for English words and for words that came into English from Spanish.
- Levenshtein distance would be the best method for recognizing words that look like other words, or that are misspelled by hitting the wrong key or swapping two letters (the classic 'teh' and 'alogrithm' are well known). Some of these words are found by metaphone or soundex, but in certain European languages (for example Romanian, my own language) some words in this category get marked as misspelled with no suggestions offered, because aspell's metaphone check does not find the similar correct word.

What I don't know:
- A way to calculate the Levenshtein distance between the word being checked and every word in the database, or to filter the candidate words somehow without metaphone or soundex (the results were not satisfying when I tested with those algorithms), and to do it fast enough to make this engine usable in a server environment for an API.

If you have any other ideas, maybe we could brainstorm a bit on this subject in order to build this.

Thanks in advance for your help.
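One way the filtering question above is sometimes approached without any phonetic hashing is a BK-tree, which indexes the dictionary by edit distance so that a lookup only visits words that can still fall within the allowed distance. The following is a minimal sketch in Python rather than PHP, under the assumption that the dictionary fits in memory; `BKTree`, `add` and `search` are hypothetical names chosen for illustration, not part of any library mentioned in this thread.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute) over Unicode code points."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical in-memory index; each node is (word, {distance: child})."""

    def __init__(self) -> None:
        self.root: Optional[tuple] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist, closest first."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = levenshtein(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: a match can only sit under an edge
            # whose label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

Note that plain Levenshtein counts a swap of two letters as two edits, so 'teh' sits at distance 2 from 'the'. The pruning relies on the distance being a true metric, so replacing it with a transposition-aware variant would need checking against that requirement first.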
thebadbad Posted May 8, 2009
Relevant functions: levenshtein(), soundex(), metaphone(), similar_text().
valentinp Posted May 8, 2009
Well, as I said in my post, I know those functions and tested them a lot before posting here and asking for other ideas. The problem usually shows up in languages that are neither English nor similar to English. Take Romanian: letters are read as they are written, with few rules (only "ce", which reads like the English "che" in "check", and a few others). It also has special letters and characters, which are either removed entirely or replaced with the letters they are derived from. That makes all those algorithms inaccurate at best and unusable at worst for these languages. It gets worse when you try to check Russian, Greek and other languages where all characters are special, not just some (Russian even has a few intonation characters).
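The point about special letters being removed or replaced can be illustrated with a small sketch. It is written in Python, and the helper names `normalize` and `strip_marks` are hypothetical. It assumes the goal is to keep diacritics as part of the word while still treating different Unicode encodings of the same letter as equal, and it contrasts that with the lossy behavior described above.

```python
import unicodedata


def normalize(word: str) -> str:
    """Canonical composition (NFC) plus case folding; diacritics are kept."""
    return unicodedata.normalize("NFC", word).casefold()


def strip_marks(word: str) -> str:
    """Lossy variant for comparison: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "fată" written with a precomposed letter and with "a" + combining breve
composed = "fat\u0103"
decomposed = "fata\u0306"

print(normalize(composed) == normalize(decomposed))  # same word, two encodings
print(normalize(composed) == normalize("fata"))      # different spellings stay different
print(strip_marks(composed) == "fata")               # the lossy variant merges them
```

Once words are normalized this way, a distance computed over characters rather than bytes treats a Romanian, Greek or Cyrillic letter as a single unit, which is what an edit-distance comparison needs.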
thebadbad Posted May 8, 2009
Oh, okay. I have no idea how to help then.
Mark Baker Posted May 8, 2009
For the English language, look at Porter stemming. The Wikipedia article on stemming also links to research on stemming algorithms for other languages such as French, Portuguese, German and Hungarian.
Axeia Posted May 8, 2009
I doubt anyone has tried to do this in PHP itself, as I don't think it is the right tool for the job. If I were you, I'd try to find a solution in a desktop programming language such as Java, C# or C++. There are bound to be libraries tackling this kind of problem for those, and you could use them to produce output in a form PHP can easily consume. The catch, of course, is that knowing one of those languages would come in handy.
This topic is now archived and is closed to further replies.