
[SOLVED] converting Unicode characters


miseleigh


I'm just wondering if there are any functions or scripts out there to convert non-English characters to their English equivalents (ex: ñ->n, or ä->a, and hopefully æ->ae.)  I haven't been able to find much.  The best I've found was a quick link to a Unicode normalizer, but it doesn't seem to do what I'm looking for - or at least, if it can, I can't figure out how.  This isn't critical or anything, just a question, but if you happen to know the answer...

 

Thanks!


If your data is in UTF-8 you should be able to decompose the characters, then remove the combining marks with a regular expression. There's a module mentioned at the bottom of my post here and an example of detecting marks here. I don't think "æ" will decompose because it is a letter in its own right rather than a base letter plus a mark; you might have to write your own replacement for it.
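To see what "decompose" means in practice, here is a rough sketch that prints the code points before and after decomposition. It assumes PHP 7.2+ with the intl and mbstring extensions, and it uses intl's Normalizer class rather than the module linked above, so treat it as an illustration only. The codepoints() helper is a made-up name for this example.

```php
<?php
// Assumption: the intl extension (Normalizer) and mbstring (mb_ord) are
// installed. Neither is mentioned in this thread.

// Hypothetical helper: list the code points of a UTF-8 string as "U+XXXX".
function codepoints(string $s): array
{
    $out = [];
    // The /u flag makes preg_split cut on code points, not bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

$n  = Normalizer::normalize("\u{00F1}", Normalizer::FORM_D); // ñ
$ae = Normalizer::normalize("\u{00E6}", Normalizer::FORM_D); // æ

echo implode(' ', codepoints($n)), "\n";  // U+006E U+0303 (n + combining tilde)
echo implode(' ', codepoints($ae)), "\n"; // U+00E6 (unchanged)
```

The ñ splits into a plain n plus a combining mark that a regular expression can strip, while æ comes back as the same single code point.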


Actually, I've already read that whole thread (that's where I got the link to the normalizer I pointed to - thanks!) but I don't quite understand how to do it. Would this work?

$newfoo = I18N_UnicodeNormalizer::toNFD('foo'); 
preg_replace('/\p{M}/u','', $newfoo);

Is NFD the right one?  I found one source that says that one means decompose...

 

Unicode's interesting, but it's a bit confusing for someone who's never had to deal with it before.  Thanks for helping.


I haven't used the module myself, but yes, that looks correct. Make sure you assign the preg_replace result back to $newfoo if you want the variable to be modified. NFD stands for "Normalization Form D", the "D" for decomposition. On second thought, you could also try NFKD, which is a compatibility decomposition rather than a canonical one. It splits up more characters (ligatures such as "ﬁ", for example), but "æ" has no decomposition mapping in either form, so it will still need a manual replacement.
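Putting the pieces together, something along these lines should work. This is a sketch only: it assumes PHP 7+ with the intl extension and uses its Normalizer class instead of I18N_UnicodeNormalizer, which I haven't tested. The function name fold_to_ascii() and the replacement table are made up for the example, and the table is nowhere near complete.

```php
<?php
// Assumption: intl's Normalizer stands in for the PEAR module from the
// thread. fold_to_ascii() is a hypothetical name for illustration.
function fold_to_ascii(string $text): string
{
    // Letters with no decomposition mapping must be replaced by hand.
    // Illustrative entries only; extend as needed.
    $manual = [
        "\u{00E6}" => 'ae', "\u{00C6}" => 'AE', // æ Æ
        "\u{00F8}" => 'o',  "\u{00D8}" => 'O',  // ø Ø
        "\u{00DF}" => 'ss',                     // ß
    ];
    $text = strtr($text, $manual);

    // Decompose: "ñ" becomes "n" followed by U+0303 COMBINING TILDE.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        return $text; // not valid UTF-8; hand it back untouched
    }

    // Strip the combining marks and return the result
    // (preg_replace does not modify its argument in place).
    return preg_replace('/\p{M}/u', '', $decomposed) ?? $decomposed;
}

// "mañana, äpfel, æther" written with escapes so the file encoding doesn't matter
echo fold_to_ascii("ma\u{00F1}ana, \u{00E4}pfel, \u{00E6}ther"), "\n";
// prints: manana, apfel, aether
```

The manual table runs first so that æ and friends are already plain ASCII by the time the marks are stripped; the order would not matter for these particular letters, but it keeps the hand-written cases in one obvious place.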

 

It's interesting indeed. The basics aren't so bad :)

