miseleigh Posted January 4, 2008
I'm just wondering if there are any functions or scripts out there to convert non-English characters to their English equivalents (e.g. ñ -> n, ä -> a, and hopefully æ -> ae). I haven't been able to find much. The best I've found was a quick link to a Unicode normalizer, but it doesn't seem to do what I'm looking for, or at least, if it can, I can't figure out how. This isn't critical or anything, just a question, but if you happen to know the answer... Thanks!
effigy Posted January 4, 2008
If your data is in UTF-8 you should be able to decompose the characters, then remove the marks with a regular expression. There's a module mentioned at the bottom of my post here and an example of detecting marks here. I don't think "æ" will decompose because it does not involve any markings; you might have to write your own replacement for this.
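The decompose-then-strip idea described above can be sketched in PHP roughly as follows. This is a sketch under assumptions, not the module linked in the thread: it uses the `Normalizer` class from PHP's intl extension instead of the PEAR normalizer, and `strip_marks` and the small `$manual` table are hypothetical names chosen for illustration.

```php
<?php
// Sketch only: assumes the intl extension (Normalizer class) is available.
// strip_marks() and $manual are illustrative names, not part of any library.

function strip_marks(string $text): string
{
    // NFD (canonical decomposition) turns "ñ" into "n" + U+0303 COMBINING TILDE.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalization failed (e.g. invalid UTF-8); leave the input untouched.
        return $text;
    }

    // \p{M} matches any combining mark; the /u flag treats the subject as UTF-8.
    // preg_replace() returns null on error, so fall back to the decomposed text.
    $stripped = preg_replace('/\p{M}/u', '', $decomposed) ?? $decomposed;

    // Letters with no decomposition need a hand-written table.
    $manual = ["\u{00E6}" => 'ae', "\u{00C6}" => 'AE'];

    return strtr($stripped, $manual);
}

echo strip_marks("se\u{00F1}or, K\u{00E4}se, \u{00E6}ther"), "\n";
```

The manual table is deliberately tiny; which letters belong in it (ø, ß, đ and so on) depends on the data, so treat it as a starting point.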
miseleigh Posted January 4, 2008 (Author)
Actually, I've already read that whole thread (that's where I got the link to the normalizer I pointed to, thanks!) but I don't quite understand how to do it. Would this work?

    $newfoo = I18N_UnicodeNormalizer::toNFD('foo');
    preg_replace('/\p{M}/u','', $newfoo);

Is NFD the right one? I found one source that says that one means decompose... Unicode's interesting, but it's a bit confusing for someone who's never had to deal with it before. Thanks for helping.
effigy Posted January 4, 2008
I haven't used the module myself, but yes, that looks correct. Make sure you assign the preg_replace result back to $newfoo if you want the variable to be modified. NFD stands for "Normalization Form D"; the "D" is for decomposition. On second thought, you may be able to pull off the æ decomposition by using NFKD, which is a compatibility decomposition rather than a canonical one. It's interesting indeed. The basics aren't so bad.
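Whether NFKD helps with æ can be checked directly. The following is a small sketch, assuming PHP's intl extension and its `Normalizer` class (not the PEAR module discussed above); `decomposes` is a hypothetical helper name. In the Unicode character data, æ (U+00E6) carries no decomposition mapping, so it is expected to pass through both forms unchanged, while a compatibility character such as the ﬁ ligature (U+FB01) changes only under NFKD.

```php
<?php
// Sketch only: assumes the intl extension (Normalizer class) is available.
// decomposes() is a hypothetical helper name used for this illustration.

function decomposes(string $char, int $form): bool
{
    // A character "decomposes" if normalizing it changes the string.
    return Normalizer::normalize($char, $form) !== $char;
}

$samples = [
    'n with tilde' => "\u{00F1}", // canonical decomposition: n + combining tilde
    'fi ligature'  => "\u{FB01}", // compatibility decomposition: f + i
    'ae letter'    => "\u{00E6}", // no decomposition mapping in the Unicode data
];

foreach ($samples as $label => $char) {
    printf(
        "%-13s NFD: %-3s NFKD: %s\n",
        $label,
        decomposes($char, Normalizer::FORM_D) ? 'yes' : 'no',
        decomposes($char, Normalizer::FORM_KD) ? 'yes' : 'no'
    );
}
```

If the last row reports "no" for both forms on your setup, æ still needs a manual replacement, for example `strtr($text, ['æ' => 'ae'])`, applied after the marks are stripped.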