How to convert signs and exotic letters in UTF-8 to NCRs?


BagoZonde

Hello freaks ;), I have an issue with converting special characters. This is an example of the string I'm working with:

 

„Quảng Trị”

 

So here we have bdquo and rdquo quotes, plus some exotic letters (from the Vietnamese alphabet).

 

Of course I'm working with the UTF-8 character set.

 

And this is what I need to do with it:

 

1. I'm using a WYSIWYG script for the textarea, so the exotic letters should be displayed as decimal NCRs. I'm saving that field to a MySQL database (5.5):

 

[Engine] => InnoDB

[Version] => 10

[table_collation] => latin2_general_ci

 

2. Then I want to retrieve that string from the database and display it as it is.

 

I tried htmlentities(), html_entity_decode() and mb_encode_numericentity(), but I'm still confused. I can get the bytes of these exotic chars with ord() (for example, the first quote is the bytes 226, 128, 158) and then replace them with a decimal NCR or a named entity, but I have neither the time nor the strength to build a whole conversion table; besides, re-inventing the wheel is not what I need here ;).
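
For reference, here is a minimal sketch of the conversion described above, done without a hand-built table. It assumes the mbstring extension is loaded and PHP 7+ (for the `\u{...}` escapes). The helper names toDecimalNcr() and fromNcr() are made up for illustration. It encodes only non-ASCII characters, so any markup produced by the WYSIWYG editor stays untouched.

```php
<?php
// Sketch: encode every non-ASCII character as a decimal NCR, then decode it back.
// toDecimalNcr() and fromNcr() are hypothetical helper names, not library functions.

function ncrMap(): array
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside ASCII; ASCII is left as-is.
    return [0x80, 0x10FFFF, 0, 0x1FFFFF];
}

function toDecimalNcr(string $utf8): string
{
    return mb_encode_numericentity($utf8, ncrMap(), 'UTF-8');
}

function fromNcr(string $html): string
{
    // Decodes only numeric references in the mapped range;
    // named entities such as &lt; or &amp; are left alone.
    return mb_decode_numericentity($html, ncrMap(), 'UTF-8');
}

// The sample string, written with escapes so the precomposed letters are unambiguous.
$string = "\u{201E}Qu\u{1EA3}ng Tr\u{1ECB}\u{201D}";

$ncr = toDecimalNcr($string);
echo $ncr, "\n";          // &#8222;Qu&#7843;ng Tr&#7883;&#8221;
echo fromNcr($ncr), "\n"; // the original UTF-8 string again
```

The encoded form is pure ASCII, so it should survive a column with a non-UTF-8 character set, at the cost of storing several bytes per exotic letter.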

 

I found a great site for converting strings, and its converter for decimal NCRs works like a charm: http://rishida.net/tools/conversion/

 

I hope my problem is clearly explained and that there's a solution for such things.

Thanks for your interest!

Thanks Christian, your point about collation is well taken. I changed some collations and charsets, but it's still not working properly.

Here are the details of my database after the few operations you suggested:

 

 

CHARACTER SET:
Array
(
   [0] => Array
       (
           [Variable_name] => character_set_client
           [Value] => utf8
       )

   [1] => Array
       (
           [Variable_name] => character_set_connection
           [Value] => utf8
       )

   [2] => Array
       (
           [Variable_name] => character_set_database
           [Value] => utf8
       )

   [3] => Array
       (
           [Variable_name] => character_set_filesystem
           [Value] => binary
       )

   [4] => Array
       (
           [Variable_name] => character_set_results
           [Value] => utf8
       )

   [5] => Array
       (
           [Variable_name] => character_set_server
           [Value] => latin2
       )

   [6] => Array
       (
           [Variable_name] => character_set_system
           [Value] => utf8
       )

   [7] => Array
       (
           [Variable_name] => character_sets_dir
           [Value] => /home/mysql55/share/charsets/
       )

)

COLLATION:
Array
(
   [0] => Array
       (
           [Variable_name] => collation_connection
           [Value] => utf8_general_ci
       )

   [1] => Array
       (
           [Variable_name] => collation_database
           [Value] => utf8_general_ci
       )

   [2] => Array
       (
           [Variable_name] => collation_server
           [Value] => latin2_general_ci
       )

)


TABLE:
[table_collation] => utf8_unicode_ci

 

What I'm doing is just a simple submitter, for quick-test purposes only of course:

   // Hypothetical row id: the literal value was lost from the original post.
   $id = 1;

   if (!empty($_POST['string'])) {
       // GET & SAVE: echo the submitted string and store it
       print 'Input string: ' . $_POST['string'] . '<br />';
       $sql = 'UPDATE sites SET content=? WHERE id=?';
       $params = array($_POST['string'], $id);
       $result = Database::query($sql, $params);
   }

   // CHECK: read the stored value back
   $sql = 'SELECT * FROM sites WHERE id=?';
   $params = array($id);
   $result = Database::query($sql, $params);
   print 'Output string: ' . $result[0]['content'];

   // WHAT I WANT: this string should survive the round trip unchanged
   $string = '„Quảng Trị”';
   print <<<FORM
   <form action="/iterator" method="POST">
   <textarea name="string">{$string}</textarea>
   <input type="submit" value="Submit">
   </form>
FORM;

 

I hope there's something simple I'm missing. I'm not good at MySQL at all.

Thanks!

Was your data saved to the DB before you set the correct collation? If so, that data is already corrupted and you need to fix it. This might be as simple as running a query to update it, but it may also require manual intervention.

I'm afraid I can't tell you exactly how to fix it, as it depends upon the data itself. Though, all new content should be OK.

 

PS: I'm assuming you're sending the correct charset in the HTTP headers as well.
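
A minimal sketch of that last point, assuming the page is served as HTML. contentTypeHeader() is a made-up helper name, and the call has to run before any output is sent.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function contentTypeHeader(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must go out before any output; skip the call if output already started.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

A matching `<meta charset="utf-8">` in the page itself is a common companion, though a charset in the HTTP header takes precedence over it when both are present.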

Thank you so much!

 

I tried with a new table and it works like a charm!

 

So I found that existing tables need to be converted to the other charset, like this:

 

   $table='sites';
   $charset='utf8';
   $sql='ALTER TABLE ' . $table . ' CONVERT TO CHARACTER SET ' . $charset;
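
Since table and charset names cannot be passed as ? parameters, one possible way to keep the concatenation above safe is to validate both identifiers first. This is only a sketch: buildConvertSql() and the allowed-charset list are assumptions, not part of the original code. utf8mb4 is listed because MySQL 5.5.3 and later offer it as the full 4-byte UTF-8 charset.

```php
<?php
// Sketch: validate identifiers before concatenating them into the statement.
// buildConvertSql() and $allowedCharsets are hypothetical, for illustration only.

function buildConvertSql(string $table, string $charset): string
{
    $allowedCharsets = ['utf8', 'utf8mb4'];

    // Accept only plain identifiers: letters, digits and underscores.
    if (!preg_match('/\A[A-Za-z0-9_]+\z/', $table)) {
        throw new InvalidArgumentException('Invalid table name: ' . $table);
    }
    if (!in_array($charset, $allowedCharsets, true)) {
        throw new InvalidArgumentException('Unsupported charset: ' . $charset);
    }

    return 'ALTER TABLE `' . $table . '` CONVERT TO CHARACTER SET ' . $charset;
}

echo buildConvertSql('sites', 'utf8'), "\n";
// ALTER TABLE `sites` CONVERT TO CHARACTER SET utf8
```

Keep in mind that converting a table changes the column definitions and re-encodes the stored text, so rows that were already mangled before the conversion may still need a separate fix.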

 

Thank you so much for the help!
