miseleigh Posted January 4, 2008
I'm wondering whether there are any functions or scripts out there to convert non-English characters to their English equivalents (e.g. ñ -> n, ä -> a, and hopefully æ -> ae). I haven't been able to find much. The best I've found was a quick link to a Unicode normalizer, but it doesn't seem to do what I'm looking for, or at least, if it can, I can't figure out how. This isn't critical or anything, just a question, but if you happen to know the answer... Thanks!
effigy Posted January 4, 2008
If your data is in UTF-8 you should be able to decompose the characters, then remove the combining marks with a regular expression. There's a module mentioned at the bottom of my post here and an example of detecting marks here. I don't think "æ" will decompose, because it does not involve any marks; you might have to write your own replacement for this.
miseleigh Posted January 4, 2008
Actually, I've already read that whole thread (that's where I got the link to the normalizer I pointed to, thanks!) but I don't quite understand how to do it. Would this work?

$newfoo = I18N_UnicodeNormalizer::toNFD('foo');
preg_replace('/\p{M}/u','', $newfoo);

Is NFD the right one? I found one source that says it means decompose... Unicode's interesting, but it's a bit confusing for someone who's never had to deal with it before. Thanks for helping.
effigy Posted January 4, 2008
I haven't used the module myself, but yes, that looks correct. Make sure you assign the preg_replace result back to $newfoo if you want the variable to be modified, since preg_replace returns the new string rather than changing its argument. NFD stands for "Normalization Form D"; the "D" is for decomposition. On second thought, you may be able to pull off the æ decomposition by using NFKD, which is a compatibility decomposition rather than a canonical one. It's interesting indeed. The basics aren't so bad.
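For anyone finding this thread later, the approach discussed above can be sketched roughly as follows. This is not the I18N_UnicodeNormalizer module from the thread: it assumes UTF-8 input and PHP's intl extension with its Normalizer class, and strip_marks is a hypothetical helper name chosen for illustration. As far as I know, "æ" has no decomposition mapping in Unicode at all, so neither NFD nor NFKD will turn it into "ae"; the sketch handles it with a small hand-written map instead.

```php
<?php
// Sketch only: assumes UTF-8 input and the intl extension (Normalizer class).
// strip_marks() is a hypothetical helper name, not part of any library.

function strip_marks(string $text): string
{
    // Letters without a Unicode decomposition need an explicit mapping.
    // This map is an illustrative assumption; extend it as needed.
    $manual = ['æ' => 'ae', 'Æ' => 'AE'];
    $text = strtr($text, $manual);

    // NFD = canonical decomposition, e.g. "ñ" becomes "n" + U+0303.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8); leave as-is
    }

    // Remove the combining marks. preg_replace returns the new string,
    // so its result has to be captured rather than discarded.
    $stripped = preg_replace('/\p{M}/u', '', $decomposed);

    return $stripped ?? $decomposed; // null signals a PCRE error
}

echo strip_marks('ñ ä æ'), "\n"; // prints "n a ae"
```

The manual map runs first so that the remaining text only needs the decompose-and-strip step; the order would not matter for "æ" itself, but it keeps the special cases in one place.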