Working with Russian characters


6 replies to this topic

#1 ctiberg

ctiberg
  • Members
  • Pip
  • Newbie
  • 7 posts

Posted 06 October 2006 - 11:35 AM

Hello!

We've always worked with English and Swedish pages, but have now taken on a Russian site. The server and the MySQL databases use the latin1 character set, while the Russian text (I think) is in cp1251. When we post a form with Russian text, it's translated into HTML entities (like &#1082;). This is far from ideal.

So does anyone know what to do to avoid that translation? Please try to be as specific as possible; this is all new ground to me!
Best regards,
Christian Tiberg

#2 printf

printf
  • Staff Alumni
  • Advanced Member
  • 889 posts

Posted 06 October 2006 - 11:47 AM

How is it being translated?

me!

#3 ctiberg

ctiberg
  • Members
  • Pip
  • Newbie
  • 7 posts

Posted 06 October 2006 - 11:52 AM

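Below is a sample; it's Russian for "Contact information". Each character is represented by its HTML entity.

&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;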

#4 printf

printf
  • Staff Alumni
  • Advanced Member
  • 889 posts

Posted 06 October 2006 - 12:04 PM

OK, so the browser is converting them, not PHP! So what do you want to do: normalize your data, or convert to some ISO standard? You have to decide, because it is all or nothing: you don't want to handle text one way in your scripts and a different way in your database. So, UTF-8 everywhere, or a code page per request?



#5 ctiberg

ctiberg
  • Members
  • Pip
  • Newbie
  • 7 posts

Posted 06 October 2006 - 12:10 PM

Hi there!

Thanks for your reply...

I thought about trying to get the browser to give me the text in the right codepage, and that's my first question. Is it an attribute of the <form> element that needs to be changed?

If that doesn't work, I might try going to UTF-8 for the text. But even then, I would need to tell the browser to submit UTF-8, right?

I know how to specify a character set for a whole table in MySQL, so that part is taken care of (I think; I might need more later).

#6 printf

printf
  • Staff Alumni
  • Advanced Member
  • 889 posts

Posted 06 October 2006 - 01:39 PM

You can't always force the browser to send what you want, but it doesn't matter, because you shouldn't trust what it is sending anyway. So the basic rule is to set up converters that turn the entities back into their real characters. You can use preg_replace_callback on the incoming data and do the conversion inside your callback, or just write a simple function and pass it your string. A str_replace will be faster than any of the multibyte functions, which don't always work correctly under PHP 4, or PHP 5 for that matter. My testing in PHP 6 is totally different: the multibyte character functions all work in that version. But what chance do you, or anyone else for that matter, have of a host (or yourself) running PHP 6?


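<?php

$string = '&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;';

// Turns decimal HTML entities for Russian letters into single cp1251
// (windows-1251) bytes. Every other entity is left exactly as it was.
function decode_russian_entities ( $str )
{
	// explode() replaces split(), which no longer exists in current PHP
	$rc = explode ( '&#', $str );

	// the text before the first '&#' holds no entity, so copy it unchanged
	$out = array_shift ( $rc );

	foreach ( $rc as $cv )
	{
		$pos = strpos ( $cv, ';' );

		$ent = $pos > 0 ? substr ( $cv, 0, $pos ) : '';

		if ( ! ctype_digit ( $ent ) )
		{
			// not a decimal entity, so put back the '&#' that explode() removed
			$out .= '&#' . $cv;

			continue;
		}

		$nc = (int) $ent;

		if ( $nc >= 1040 && $nc <= 1103 )
		{
			// U+0410 to U+044F map to 0xC0 to 0xFF in cp1251 (offset 848)
			$tc = chr ( $nc - 848 );
		}
		elseif ( $nc == 1025 )
		{
			$tc = chr ( 168 ); // capital IO (U+0401) is 0xA8 in cp1251
		}
		elseif ( $nc == 1105 )
		{
			$tc = chr ( 184 ); // small io (U+0451) is 0xB8 in cp1251
		}
		else
		{
			$tc = '&#' . $ent . ';';
		}

		$out .= $tc . substr ( $cv, ( $pos + 1 ) );
	}

	return ( $out );
}

// the output is cp1251 bytes, so the page must be served as windows-1251
echo decode_russian_entities ( $string );

?>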
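
The preg_replace_callback route mentioned above could look roughly like the sketch below. It assumes the target encoding is cp1251 (windows-1251) and only handles decimal entities; the function name decode_cyrillic_entities_cb is made up for illustration and is not from any library.

```php
<?php

// Hypothetical helper: converts decimal entities for Russian letters into
// cp1251 bytes and leaves every other entity untouched.
function decode_cyrillic_entities_cb(string $str): string
{
    return preg_replace_callback(
        '/&#(\d{1,7});/',
        function (array $m): string {
            $code = (int) $m[1];

            // U+0410 to U+044F sit at 0xC0 to 0xFF in cp1251 (offset 848)
            if ($code >= 1040 && $code <= 1103) {
                return chr($code - 848);
            }
            if ($code === 1025) {
                return chr(168); // capital IO, 0xA8 in cp1251
            }
            if ($code === 1105) {
                return chr(184); // small io, 0xB8 in cp1251
            }

            // anything else stays as the original entity text
            return $m[0];
        },
        $str
    );
}

// Print the resulting bytes as hex so they can be inspected in any terminal.
echo bin2hex(decode_cyrillic_entities_cb('&#1082;&#1086;&#1085;')), "\n";
```

The regular expression only matches well-formed decimal entities, so stray "&#" sequences in the input pass through unchanged without any special handling.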




#7 Daniel0

Daniel0
  • Staff Alumni
  • Advanced Member
  • 11,956 posts

Posted 06 October 2006 - 02:30 PM

Put the charset inside this:

<?php
header("Content-type: text/html; charset=**the charset here**");
?>

Note that it has to run BEFORE anything is output, unless you use output buffering (output control). Just put it at the very top of the script and you'll be fine.
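
As a rough sketch of the above with a concrete value, assuming the pages are served as windows-1251 (swap in utf-8 if you convert everything): content_type_header() is a hypothetical helper, and the accept-charset attribute on the form is an addition not mentioned earlier in the thread. Browsers treat it as a hint for the encoding of submitted data, so the input should still be checked on the server.

```php
<?php

// Hypothetical helper: builds the Content-Type header line for a charset.
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Assumption: the site stores and serves its text as windows-1251.
$charset = 'windows-1251';

// header() only works while no output has been sent yet.
if (!headers_sent()) {
    header(content_type_header($charset));
}

// Ask the browser to submit the form in the same charset as the page.
echo '<form method="post" action="" accept-charset="'
    . htmlspecialchars($charset) . '">', "\n";
```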



