
spelling algorithms


valentinp


I would like opinions on building a server-side spelling API for an application I am trying to build. Unfortunately, aspell and pspell don't really meet my requirements, because I need the engine to work for all languages, not just the ones aspell supports. I don't need word lists yet, since those can be built once the engine is working.

What I know:

- Most spelling engines use Double Metaphone or similar phonetic algorithms to find words that sound like the word being checked and might replace it. Unfortunately, Double Metaphone, Soundex, Metaphone and the rest are designed mostly for English words, plus words borrowed into English from Spanish.

- Levenshtein distance would be the best method for finding words that look like other words, or that were misspelled by hitting a neighboring key or swapping two letters (the classic 'teh' and 'alogrithm'). Metaphone or Soundex catch some of these, but in certain European languages (Romanian, my own language, for example) such words get marked as misspelled, yet no suggestions can be made from their similarity to the intended word, because aspell's Metaphone check doesn't find that word.
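Plain Levenshtein scores an adjacent swap such as 'teh' → 'the' as two edits. One way to make these cases cheap is the optimal string alignment distance (a restricted Damerau-Levenshtein variant), which counts a transposition of neighboring letters as a single edit. The Python sketch below is an illustration, not code from this thread, and the name `osa_distance` is hypothetical. Because it compares Unicode code points, it works for any alphabet, assuming both words have been normalized the same way (e.g. NFC) beforehand.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    # d[i][j] = minimum edits to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete every letter of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert every letter of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Two neighboring letters swapped ("eh" vs "he") count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("teh", "the"))              # 1
print(osa_distance("alogrithm", "algorithm"))  # 1
```

The substitution cost is a flat 1 here; weighting it by keyboard adjacency for the "touched another letter" case would be a further assumption that has to be tuned per keyboard layout.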

 

What I don't know:

- How to calculate the Levenshtein distance between the word being checked and every word in the database, or how to narrow down that set of words without using Metaphone or Soundex (in my tests those algorithms gave unsatisfying results), and do it fast enough for the engine to be usable as a server-side API.
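One filtering approach that needs no phonetic key at all is a BK-tree (Burkhard-Keller tree): each word hangs under a parent word, keyed by its Levenshtein distance to that parent, and the triangle inequality lets a lookup skip whole subtrees that cannot contain a match within the allowed distance. The sketch below is a suggestion rather than something proposed in the thread; `BKTree`, `levenshtein` and the sample words are illustrative. It relies on Levenshtein being a true metric, so it should not be combined with the optimal string alignment variant, which can violate the triangle inequality.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class _Node:
    def __init__(self, word: str) -> None:
        self.word = word
        # Maps "distance from this node's word" -> child node.
        self.children: Dict[int, "_Node"] = {}


class BKTree:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Optional[_Node] = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = _Node(word)
            return
        node = self.root
        while True:
            d = levenshtein(word, node.word)
            if d == 0:
                return  # word already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = _Node(word)
                return
            node = child

    def search(self, word: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist, nearest first."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(word, node.word)
            if d <= max_dist:
                results.append((d, node.word))
            # Triangle inequality: only edges in [d - max_dist, d + max_dist]
            # can lead to words within max_dist of the query.
            for edge, child in node.children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(["algorithm", "algorithms", "logarithm", "rhythm", "the", "then", "tea"])
print(tree.search("alogrithm", 2))
```

For a real dictionary the tree would be built once and kept in memory by a long-running process rather than rebuilt per request. How much it prunes depends on the word list and on `max_dist`, so it is worth measuring against a plain linear scan on real data before relying on it.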

 

If you have any other ideas, maybe we could brainstorm a bit on this subject in order to build it.

 

Thanks in advance for your help.

 

https://forums.phpfreaks.com/topic/157339-spelling-algorithms/

Well, as I said in my post, I know those functions and tested them a lot before posting here to ask for other ideas. The problem usually comes with languages that are neither English nor similar to English. Romanian, for example, reads letters as they are written, without many rules (only "ce" reads like the English "che" in "check", plus a few others), but it also has special letters and characters, which get removed entirely or replaced with the letters they derive from. That makes all those algorithms inaccurate at best, if not unusable, for these languages. It is even worse when you try to check Russian, Greek and other languages where all the characters are special, not just some (Russian even has a few intonation marks).
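For letters that get dropped or replaced with their base letter (Romanian ă, â, î, ș, ț; Greek accented vowels), one language-neutral option is Unicode decomposition: NFD splits such a letter into a base letter plus combining marks, which can then be stripped, or charged a reduced cost inside the edit distance. The sketch below is an illustration built on that assumption; `fold`, `diacritic_aware_distance` and the 0.5 cost are hypothetical choices, not tuned values. One caveat: stripping marks can merge letters a language treats as distinct (Russian й folds to и), so some per-language exceptions would probably be needed.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold and strip combining marks (accents, breves, commas below)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


def diacritic_aware_distance(a: str, b: str, accent_cost: float = 0.5) -> float:
    """Levenshtein distance where a diacritic-only substitution is cheaper."""
    # NFC so that e.g. "ă" is one code point however it was typed.
    a = unicodedata.normalize("NFC", a.casefold())
    b = unicodedata.normalize("NFC", b.casefold())
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0
            elif fold(ca) == fold(cb):
                sub = accent_cost  # same base letter, different marks
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0, cur[j - 1] + 1.0, prev[j - 1] + sub))
        prev = cur
    return prev[-1]


print(fold("Înțelegere"))                            # intelegere
print(diacritic_aware_distance("padure", "pădure"))  # 0.5
print(diacritic_aware_distance("padure", "pidure"))  # 1.0
```

The folded form can also serve as a lookup key, so that "padure" typed without diacritics still finds "pădure" as a suggestion before any distance is computed.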


I doubt anyone has tried to do this in PHP itself, as I don't think it's the right tool for the job.

 

If I were you, I'd look for a solution in a desktop programming language such as Java, C# or C++.

There are bound to be libraries for those languages that tackle this kind of problem, and you could use them to produce output that PHP can easily consume.

The catch, of course, is that knowing one of those languages would come in handy ;)


Archived

This topic is now archived and is closed to further replies.
