Replacing UTF-8 codes in the private use area


Yorick

Hello,

I've got a huge database filled with text. It is encoded in UTF-8, and some of the symbols used (like emoticons) are stored as code points in the Unicode Private Use Area (http://www.fileformat.info/info/unicode/block/private_use_area/utf8test.htm). Now I want to replace those private use code points with the corresponding smilies and so on.

So my actual question is: how do I replace specific UTF-8 sequences with something else in PHP?

Thanks in advance!

UTF-8 is just an encoding. Behind it are actual bytes of data.

utf8_encode() won't help here: it only converts ISO-8859-1 text to UTF-8, so it can't turn a code point number into a private use character. Work out the bytes directly instead. U+E8B9 is the three bytes 0xEE 0xA2 0xB9 in UTF-8.
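
If you need the bytes for more code points than just that one, the conversion can be sketched roughly like this. The function name codepoint_to_utf8 is made up for illustration (it is not a PHP built-in), and the sketch assumes PHP 7 or later for the type hints. On PHP 7.2+ with mbstring, mb_chr(0xE8B9, 'UTF-8') should give the same result, and PHP 7+ also accepts "\u{E8B9}" in double-quoted strings.

```php
<?php
// Hypothetical helper (not a PHP built-in): encode one Unicode code point
// as a UTF-8 byte string. No validation of surrogates or out-of-range values.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (covers U+E000..U+F8FF)
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Show the bytes for U+E8B9 as hex.
echo strtoupper(bin2hex(codepoint_to_utf8(0xE8B9))), "\n"; // EEA2B9
```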

Get the byte encoding of each character, if you don't have it already, and do a binary-safe search-and-replace for each emoticon. If you want to do it in PHP:

// U+E8B9 encoded as UTF-8 is the byte sequence EE A2 B9
$text = str_replace("\xEE\xA2\xB9", ":)", $text);
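
With many emoticons, one way to extend the single str_replace() call is a lookup table passed to strtr(), plus a check for private use characters you haven't mapped yet. This is only a sketch: the U+E8B9 entry comes from the example above, while the second table entry and the names $replacements and find_private_use are placeholders I made up. Fill the table with whatever your data actually uses.

```php
<?php
// Map of UTF-8 byte sequences to replacement text.
// Only the first entry comes from the thread; the second is illustrative.
$replacements = [
    "\xEE\xA2\xB9" => ':)', // U+E8B9
    "\xEE\xA2\xBA" => ':(', // U+E8BA (hypothetical mapping)
];

// Hypothetical helper: list the distinct BMP private use characters
// (U+E000..U+F8FF) still present in a UTF-8 string.
function find_private_use(string $text): array
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    if (preg_match_all('/[\x{E000}-\x{F8FF}]/u', $text, $m) === false) {
        return []; // e.g. the input was not valid UTF-8
    }
    return array_values(array_unique($m[0]));
}

$text = "Hi \xEE\xA2\xB9 there \xEE\x80\x80";

// strtr() with an array does all replacements in one binary-safe pass.
$out = strtr($text, $replacements);

// Anything still listed here has no entry in $replacements yet.
foreach (find_private_use($out) as $char) {
    echo 'Unmapped bytes: ', strtoupper(bin2hex($char)), "\n";
}
```

The leftover check is handy on a big database because it tells you which code points still need a table entry before you run the real update.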

Great! That worked, thank you very much for the quick reply!
