Working with Russian characters


ctiberg

Hello!

We've always worked with English and Swedish pages, but have now taken on a Russian site. The server and MySQL databases use the latin1 character set, while the Russian text (I think) is in cp1251. When we post a form with Russian text, it's translated into numeric HTML entities (like &#1332;). This is far from ideal.

So does anyone know how to avoid that translation? Please be as specific as possible; this is all new ground to me!

Below is a sample: it's Russian for "Contact information". Each character is represented by its HTML entity.

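&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;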

OK, so the browser is converting them, not PHP! So what do you want to do: normalize your data, or convert to some ISO standard? You have to decide, because it's all or nothing: you don't want to handle text one way in your scripts and a different way in your database. So, UTF-8 everything, or pick a code page per request?
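
If you go the "UTF-8 everything" route, a minimal sketch could look like the one below. The helper name to_utf8 and the form field are made up for illustration, and it assumes that any input which is not valid UTF-8 is cp1251 (Windows-1251), as guessed in the question. It also assumes the iconv extension is available.

```php
<?php

// Hypothetical helper, not from this thread: bring incoming text to UTF-8.
// Assumption: anything that is not valid UTF-8 is Windows-1251 (cp1251).
function to_utf8 ( $text )
{
	// An empty pattern with the /u flag only matches when the subject is valid UTF-8.
	if ( preg_match ( '//u', $text ) === 1 )
	{
		return $text;
	}

	return iconv ( 'Windows-1251', 'UTF-8', $text );
}

// Declare the page as UTF-8; browsers normally submit forms in the page's encoding.
if ( !headers_sent () )
{
	header ( 'Content-Type: text/html; charset=UTF-8' );
}

// The form can state the encoding explicitly as well (field name is illustrative).
$form = '<form method="post" accept-charset="UTF-8"><input name="title"></form>';

// Database side (assumption: mysqli on MySQL 5.5.3 or later):
// $db->set_charset ( 'utf8mb4' );
```

The point is that the page, the form, the script and the database connection all agree on one encoding, so the browser has no reason to fall back to entities.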


me!

Hi there!

Thanks for your reply...

I thought about trying to get the browser to send me the text in the right code page, and that's my first question: is there an attribute of the <form> element that needs to be changed?

If that doesn't work, I might try going to UTF-8 for the text. But even then, I would need to tell the browser to submit UTF-8, right?

I know how to specify a character set for a whole table in MySQL, so that part is taken care of (I think; I might need more help later).

You can't [b]always[/b] force the browser to send what you want, but it doesn't matter, because you shouldn't trust what it sends anyway. So the basic rule is to set up converters that turn the entities back into their real characters. You can use preg_replace_callback on the incoming data and do the conversion inside your callback, or just write a simple function and pass it your string. A str_replace will be faster than the multibyte functions, which don't always work correctly under PHP 4, or PHP 5 for that matter. In my testing PHP 6 is totally different: the multibyte character functions all work in that version. But what chance is there that you, or your host, are running PHP 6?


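[code]<?php

$string = '&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;';

// Turn numeric entities for Russian letters into single cp1251 bytes.
// All other entities and plain text pass through unchanged.
function decode_russian_entities ( $str )
{
	$out = '';

	// explode() replaces split(), which was removed in PHP 7
	$rc = explode ( '&#', $str );

	foreach ( $rc as $i => $cv )
	{
		// the first piece is plain text in front of the first entity
		if ( $i == 0 )
		{
			$out .= $cv;
			continue;
		}

		$pos = strpos ( $cv, ';' );
		$nc = ( $pos !== false ) ? substr ( $cv, 0, $pos ) : '';

		if ( ctype_digit ( $nc ) && $nc >= 1040 && $nc <= 1103 )
		{
			// U+0410..U+044F map to cp1251 bytes 0xC0..0xFF
			// (the letters with code points 1025 and 1105 are outside this range and stay as entities)
			$tc = chr ( $nc - 848 ) . substr ( $cv, $pos + 1 );
		}
		else
		{
			// not a Russian letter: put back the prefix that explode() removed
			$tc = '&#' . $cv;
		}

		$out .= $tc;
	}

	return ( $out );
}

echo decode_russian_entities ( $string );

?>
[/code]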
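
As an alternative sketch, PHP's built-in html_entity_decode() can decode every numeric entity at once, provided UTF-8 is acceptable as the target (or at least as an intermediate step). The shortened sample string and the variable names are only for illustration, and the conversion back to cp1251 assumes the iconv extension is available.

```php
<?php

// First seven letters of the sample from this thread, as numeric entities.
$string = '&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;';

// Decode all entities (not just Russian letters) into UTF-8 text.
$utf8 = html_entity_decode ( $string, ENT_QUOTES, 'UTF-8' );

// If the rest of the site has to stay on cp1251, convert afterwards.
// iconv() returns false if a character has no cp1251 equivalent.
$cp1251 = iconv ( 'UTF-8', 'Windows-1251', $utf8 );
```

Keep in mind that decoding also turns entities such as &lt; back into real markup characters, so the result still needs htmlspecialchars() when it is written back into a page.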



me!
