ctiberg Posted October 6, 2006

Hello! We've always worked with English and Swedish pages, but we have now taken on a Russian site. The server and MySQL databases use the latin1 code page, while the Russian text is (I think) in cp1251. When we post a form containing Russian text, it is translated into HTML entities (like &#1332;). This is far from ideal.

So does anyone know how to avoid that translation? Please be as specific as possible; this is all new ground to me!

https://forums.phpfreaks.com/topic/23170-working-with-russian-characters/
printf Posted October 6, 2006

How is it being translated?
ctiberg (Author) Posted October 6, 2006

Below is a sample. It's Russian for "Contact information" (контактная информация). Each character is represented by its HTML entity:

&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;
printf Posted October 6, 2006

OK, so the browser is converting them, not PHP! So what do you want to do: normalize your data, or convert to some ISO standard? You have to decide, because it is all or nothing. You don't want to handle things one way in your scripts and a different way in your database. So, UTF-8 everywhere, or a code page per request?
ctiberg (Author) Posted October 6, 2006

Hi there! Thanks for your reply.

I thought about trying to get the browser to give me the text in the right code page, and that's my first question: is it an attribute of the <form> element that needs to be changed?

If that doesn't work, I might try going to UTF-8 for the text. But even then, I would need to tell the browser to submit UTF-8, right?

I know how to specify a character set for a whole table in MySQL, so that part is taken care of (I think; I might need more later).
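For the UTF-8 route mentioned above, a minimal sketch of what the server side could look like, assuming the iconv extension is available. The helper names `entities_to_utf8` and `utf8_to_cp1251` are made up for illustration and do not come from the thread. Note that `html_entity_decode` also decodes named entities such as `&amp;`, which may or may not be what you want for untrusted input.

```php
<?php
// Hypothetical helper: turn entity-encoded form input into real UTF-8 text.
function entities_to_utf8($str)
{
    // With 'UTF-8' as the target charset, numeric entities such as &#1082;
    // are decoded into the corresponding multi-byte characters.
    return html_entity_decode($str, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: convert UTF-8 text to cp1251 for a legacy table.
function utf8_to_cp1251($str)
{
    // iconv() returns false if a character has no cp1251 equivalent.
    return iconv('UTF-8', 'Windows-1251', $str);
}

// The first word of the sample from this thread, as the browser submitted it.
$posted = '&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103;';
$utf8   = entities_to_utf8($posted);
$cp1251 = utf8_to_cp1251($utf8);
```

If everything (page, form, database connection, tables) is UTF-8, only the first helper would be needed, and ideally not even that, because the browser would no longer have a reason to fall back to entities.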
printf Posted October 6, 2006

You can't [b]always[/b] force the browser to send what you want, but it doesn't matter, because you shouldn't trust what it sends anyway. So the basic rule is to set up converters that turn the special characters back into their real Unicode state. You can run preg_replace_callback on the incoming data and do the conversion inside your callback, or just write a simple function and pass it your string. A str_replace will be faster than any of the multibyte functions, which don't always work correctly under PHP 4, or PHP 5 for that matter. My testing in PHP 6 is totally different; the multibyte character functions all work in that version. But what chance is there that you, or anyone else for that matter, have a host running PHP 6?

[code]
<?php
// "контактная информация" as the browser submitted it: numeric HTML entities.
$string = '&#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1085;&#1072;&#1103; &#1080;&#1085;&#1092;&#1086;&#1088;&#1084;&#1072;&#1094;&#1080;&#1103;';

// Convert numeric entities for the Cyrillic letters 1040..1103 into
// single cp1251 bytes; everything else is passed through unchanged.
function decode_russian_entities ( $str )
{
	// explode() replaces split(), which newer PHP versions no longer provide.
	$rc = explode ( '&#', $str );

	// The first chunk precedes any '&#', so it is plain text.
	$out = array_shift ( $rc );

	foreach ( $rc as $cv )
	{
		$pos = strpos ( $cv, ';' );
		$nc = ( $pos > 0 ) ? substr ( $cv, 0, $pos ) : '';

		if ( ctype_digit ( $nc ) && $nc >= 1040 && $nc <= 1103 )
		{
			// cp1251 stores these letters at 192..255: the code point minus 848.
			$out .= chr ( $nc - 848 ) . substr ( $cv, $pos + 1 );
		}
		else
		{
			// Not a Cyrillic entity: restore the '&#' that explode() removed.
			$out .= '&#' . $cv;
		}
	}

	return $out;
}

echo decode_russian_entities ( $string );
?>
[/code]
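The callback approach mentioned in the post above could look roughly like the sketch below. It uses PHP's `preg_replace_callback` and the same "code point minus 848" mapping as the posted function. The function names `decode_cyrillic_entities` and `decode_cyrillic_entities_cb` are hypothetical, and the sketch assumes the target encoding is cp1251 and that only the letters in the range 1040..1103 need converting.

```php
<?php
// Hypothetical callback: receives one regex match for a numeric entity.
function decode_cyrillic_entities_cb(array $m)
{
    $code = (int) $m[1];

    // Code points 1040..1103 map to cp1251 bytes 192..255 (code point minus 848).
    if ($code >= 1040 && $code <= 1103) {
        return chr($code - 848);
    }

    // Any other entity is returned exactly as it came in.
    return $m[0];
}

// Hypothetical wrapper: decode every &#NNNN; entity found in the string.
function decode_cyrillic_entities($str)
{
    return preg_replace_callback('/&#(\d+);/', 'decode_cyrillic_entities_cb', $str);
}

// Print the cp1251 bytes as hex so the result is readable in any terminal.
echo bin2hex(decode_cyrillic_entities('&#1082;&#1086;&#1085;')), "\n";
```

Because the regular expression only matches complete `&#digits;` sequences, plain text that happens to contain a semicolon or a stray `&#` is left alone.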
Daniel0 Posted October 6, 2006

Put the charset inside this:

[code]
<?php
header("Content-type: text/html; charset=**the charset here**");
?>
[/code]

Note that it has to run BEFORE anything is output, unless you use output control. Just put it at the very top and you'll be fine.
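Putting the header together with the earlier question about the <form> element, here is a rough sketch that assumes the whole site is switched to UTF-8. The `accept-charset` attribute is a standard HTML form attribute; the helper name `render_contact_form` and the field names are invented for illustration. In practice browsers usually submit in the encoding of the page itself, so the header is the part that matters most and the attribute is a belt-and-braces addition.

```php
<?php
// Assumption: UTF-8 everywhere; swap in another charset if the site differs.
$charset = 'UTF-8';

// Must run before any output, unless output buffering is active.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . $charset);
}

// Hypothetical helper: build a form that asks the browser to submit in $charset.
function render_contact_form($charset)
{
    $cs = htmlspecialchars($charset, ENT_QUOTES);

    return '<form method="post" action="" accept-charset="' . $cs . '">'
         . '<input type="text" name="title">'
         . '<input type="submit" value="Send">'
         . '</form>';
}

echo render_contact_form($charset);
```

With the page served and submitted in a charset that can represent Russian text, the browser should have no reason to fall back to numeric entities in the first place.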