ctiberg

Russian chars part II: Moving to UTF8

Hello!

I posted earlier about having problems with Russian characters. I have now decided to move to UTF8, but can't seem to get it to work. My test system contains three scripts: an editor (a form), a storer, and a viewer.

I seem to be able to get the text into the database in UTF8, but then I can't show it on screen; all I get is garbage. So I'm hoping for some help here, preferably hands-on :)

The editor is just a form, with the following "specials":

[code]
<?php header("Content-type: text/html; charset=utf-8"); ?>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<form name="inputfrm" method="POST" action="lagra_txt.php" accept-charset="utf-8">
[/code]

Despite specifying utf-8 in the accept-charset, I seem to get windows-1252. Why?

On to the storer. Here I've got this:

[code]
// Connect to the DB using mysql_connect and mysql_select_db

  $sql = "SET NAMES 'utf8'";
  mysql_query($sql);

  // The lines below were copied from an article on mysql.com - they check if I got UTF-8
  $test  = $_POST["charset_check"];
  if (bin2hex($test) == "c3a4e284a2c2ae")
    $OK = true;
  elseif (bin2hex($test) == "e499ae")
    $OK = false;
  else
    die("Sorry, I didn't understand the character set of the data you sent!");

  foreach ($_POST as $key => $val)
    {
      if ($key == "charset_check") continue;
      if ($val != "")
        {
          if (!$OK) $val = iconv("windows-1252", "utf-8", $val);
          $sql = "UPDATE luka_texter SET `Text`='".$val."' WHERE ID='".$key."' AND Sprak='ru'";
          mysql_query($sql);
        }
    }
[/code]
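
The two hex strings in the check above are the same three test characters, ä™®, in two encodings: `c3a4e284a2c2ae` is their UTF-8 form and `e499ae` is their windows-1252 form. A minimal sketch of that comparison, assuming the iconv extension is loaded; the function name `detect_form_charset` is made up for illustration and does not come from the mysql.com article:

```php
<?php
// Hypothetical helper mirroring the charset_check test in the storer.
// Returns 'utf-8', 'windows-1252', or null when the bytes match neither.
function detect_form_charset(string $sentinel): ?string
{
    $hex = bin2hex($sentinel);
    if ($hex === 'c3a4e284a2c2ae') {
        return 'utf-8';
    }
    if ($hex === 'e499ae') {
        return 'windows-1252';
    }
    return null;
}

// The three test characters written as explicit UTF-8 bytes,
// so the encoding of this source file does not matter.
$utf8 = "\xC3\xA4\xE2\x84\xA2\xC2\xAE";
// The same characters converted to windows-1252.
$cp1252 = iconv('UTF-8', 'WINDOWS-1252', $utf8);

echo detect_form_charset($utf8), "\n";
echo detect_form_charset($cp1252), "\n";
```

The hidden `charset_check` field only tells you what the browser sent; it says nothing about how the database connection later treats those bytes.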

As I said, this seems to get the text into the DB all right, and I think it's stored as UTF8 in there (at least it looks like junk when I view it directly, which is how UTF8 looks to me).

The viewer is very simple, like this:

[code]
<?php header("Content-type: text/html; charset=utf-8"); ?>
<html>
<head>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<title>DB-edit</title>
</head>
<body>
<?php
// Connect to the DB using mysql_connect and mysql_select_db

  $sql = "SET NAMES 'utf8'";
  mysql_query($sql);

  $sql = "SELECT ID, Text FROM luka_texter WHERE Sprak='ru'";
  $res = mysql_query($sql) or die(mysql_error());
  while ($rad = mysql_fetch_assoc($res))
    print $rad["ID"]." ".iconv("utf-8", "windows-1252", $rad["Text"])."<br>";
  mysql_free_result($res);
?>
</body></html>
[/code]

The trouble is that some texts are truncated, some characters are replaced by question marks, and so on. So, can anyone point out what I'm doing wrong?
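
One thing that can be checked in isolation is the `iconv("utf-8", "windows-1252", ...)` call in the viewer: windows-1252 has no Cyrillic letters, so Russian text cannot pass through it intact. A minimal sketch, assuming the iconv extension is loaded; `survives_cp1252` is a hypothetical helper name, and what iconv does with an unconvertible character (fail, cut the string, or substitute) depends on the iconv implementation:

```php
<?php
// Hypothetical helper: true if $utf8 comes back unchanged after a
// round trip through windows-1252.
function survives_cp1252(string $utf8): bool
{
    // iconv() fails or substitutes when a character has no windows-1252
    // equivalent; @ silences the notice it raises in that case.
    $cp1252 = @iconv('UTF-8', 'WINDOWS-1252', $utf8);
    if ($cp1252 === false) {
        return false;
    }
    return iconv('WINDOWS-1252', 'UTF-8', $cp1252) === $utf8;
}

$latin    = "\xC3\xA4\xC3\xB6";                                  // "äö" as UTF-8
$cyrillic = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";  // "Привет" as UTF-8

var_dump(survives_cp1252($latin));
var_dump(survives_cp1252($cyrillic));
```

If this is what is happening here, printing `$rad["Text"]` without the iconv call would be the first thing to try, since the page is already declared as utf-8.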

What is the database using?

[code]
SHOW VARIABLES LIKE 'character\_set%'
[/code]

Everything is set to latin1 when I do the above in MyDB Studio, except for character_set_system, which is set to utf8.

The Text column has had its character set changed to UTF8, though, using:

[code]
DROP TABLE IF EXISTS `luka_texter`;
CREATE TABLE `luka_texter` (
  `ID` varchar(50) NOT NULL default '',
  `Sprak` char(2) NOT NULL default '',
  `TEXT` text CHARACTER SET utf8,
  PRIMARY KEY  (`ID`,`Sprak`)
) ENGINE=MyISAM DEFAULT CHARACTER SET=latin1;
[/code]
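
With the connection defaults at latin1 and the column at utf8, MySQL converts between the two whenever SET NAMES has not taken effect, reading each incoming byte as one latin1 character. A rough sketch of that effect in PHP, assuming the iconv extension is loaded and using windows-1252 as a stand-in for MySQL's latin1 (the two are closely related but not byte-for-byte identical); `as_if_latin1` is a hypothetical name:

```php
<?php
// Hypothetical illustration: re-encode UTF-8 bytes as if each byte were
// one windows-1252 character, which is roughly what a latin1 connection
// does when storing into a utf8 column.
function as_if_latin1(string $utf8Bytes): string
{
    return iconv('WINDOWS-1252', 'UTF-8', $utf8Bytes);
}

$ya = "\xD0\xAF";              // Cyrillic "Я" (U+042F) as UTF-8
$stored = as_if_latin1($ya);

echo bin2hex($ya), "\n";       // d0af
echo bin2hex($stored), "\n";   // c390c2af
```

Two bytes have become four. A client that reads them back over a utf8 connection sees "Ð¯" instead of "Я".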

The input comes from a form (I gave the form element syntax above) containing some 40-50 text strings that have been translated from English into Russian. I copy them from an Excel sheet one at a time and paste them into the form fields. Each form field is given a name that is then used as the ID in the MySQL table.

This is of course a very simple example, but I need this to work before I go on to the rest of the site.

This is working for me. Note that I changed the table a little.

[code]
<?php header("Content-type: text/html; charset=utf-8"); ?>
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
<pre>
<?php
if ($_POST) {
    ### Show what we received and proceed with database interaction.
    print_r($_POST);
    ### Connect, select, drop/create if needed.
    mysql_connect('localhost', 'user', 'password') or die(mysql_error());
    mysql_select_db('test') or die(mysql_error());
    $table_check = mysql_query('DESC `luka_texter`');
    if (mysql_error()) {
        mysql_query('
            CREATE TABLE `luka_texter` (
            `ID` INT NOT NULL AUTO_INCREMENT,
            `Sprak` char(2) NOT NULL,
            `TEXT` text CHARACTER SET utf8,
            PRIMARY KEY  (`ID`,`Sprak`)
            ) ENGINE=MyISAM DEFAULT CHARACTER SET=latin1;
        ') or die(mysql_error());
    }
    ### Escape the submitted text so quotes in it cannot break the query, then insert.
    $text = mysql_real_escape_string($_POST['utf8_textarea']);
    mysql_query("INSERT INTO `luka_texter` (`Sprak`, `TEXT`) VALUES ('ru', '$text')") or die(mysql_error());
    ### Read everything back and print it.
    $query = mysql_query('SELECT TEXT FROM `luka_texter`') or die(mysql_error());
    while ($row = mysql_fetch_array($query)) {
        echo $row['TEXT'], '<br/>';
    }
}

### Create some characters from the Cyrillic block...
$characters  = pack('c*', 0xD0, 0x89);
$characters .= pack('c*', 0xD0, 0x8A);
$characters .= pack('c*', 0xD0, 0x8B);
$characters .= pack('c*', 0xD0, 0x8C);
$characters .= pack('c*', 0xD0, 0x8D);
$characters .= pack('c*', 0xD0, 0x8E);
$characters .= pack('c*', 0xD0, 0x8F);
### ...and put them in the form...
?>

<form name="utf8_test" method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>" accept-charset="utf-8">
<textarea name="utf8_textarea"><?php echo $characters; ?></textarea>
<input type="submit"/>
</form>

</pre>
[/code]

I had a rave reply written in this text box, until I tried it out on the production server. There, it has the same problems as my own attempts: it gets most of the text right, but some of it is replaced by ?'s. So I'll try to get a response out of our provider, which I guess will prove futile. Sigh.
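
The thread does not establish the cause, so the following is only a guess at where to look. windows-1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, and several Cyrillic letters use one of them as their second UTF-8 byte. If any step treats the UTF-8 bytes as windows-1252, those letters are the ones likely to be lost while the rest survive, which would match "most of the text right". A small sketch in plain PHP; `cp1252_gaps` is a hypothetical helper:

```php
<?php
// Hypothetical helper: list (as hex) the bytes in $bytes that
// windows-1252 leaves unassigned.
function cp1252_gaps(string $bytes): array
{
    $unassigned = ["\x81", "\x8D", "\x8F", "\x90", "\x9D"];
    $found = [];
    foreach (str_split($bytes) as $byte) {
        if (in_array($byte, $unassigned, true)) {
            $found[] = bin2hex($byte);
        }
    }
    return $found;
}

$ya = "\xD1\x8F";   // Cyrillic "я" (U+044F) as UTF-8
$be = "\xD0\xB1";   // Cyrillic "б" (U+0431) as UTF-8

print_r(cp1252_gaps($ya));   // lists the byte 8f
print_r(cp1252_gaps($be));   // empty: both bytes are assigned in windows-1252
```

Running the affected strings through a check like this would show whether the characters that turn into ?'s are the ones containing these bytes.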

