junk@alf2.com Posted February 23, 2008

This has been driving me nuts over the past few hours, so I'm hoping someone can help me understand what's going on here. I have two servers, one online (Linux) and my PC that I test on (Windows), both running PHP 5.1.6. The same script produces different output on the two servers, but I can't figure out why.

SCRIPT:

    $input = "Bóthar greatest";
    printf("original: %s<br>\r\n", $input);
    printf("iconv: %s<br>\r\n", iconv('', 'UTF-8', $input));
    printf("utf8 decoded: %s<br>\r\n", utf8_decode($input));
    printf("iconv trans decoded: %s<br>\r\n", iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $input));
    preg_match('/([\w]{2,}|\d+)/i', utf8_decode($input), $match, PREG_OFFSET_CAPTURE);
    printf("<pre>%s</pre><br>\r\n", print_r($match, true));
    preg_match('/([\w]{2,}|\d+)/i', $input, $match, PREG_OFFSET_CAPTURE);
    printf("<pre>%s</pre><br>\r\n", print_r($match, true));

WINDOWS OUTPUT:

    original: Bóthar greatest<br>
    iconv: Bóthar greatest<br>
    utf8 decoded: Bóthar greatest<br>
    iconv trans decoded: Bóthar greatest<br>
    <pre>Array ( [0] => Array ( [0] => Bóthar [1] => 0 ) [1] => Array ( [0] => Bóthar [1] => 0 ) ) </pre><br>
    <pre>Array ( [0] => Array ( [0] => Bóthar [1] => 0 ) [1] => Array ( [0] => Bóthar [1] => 0 ) ) </pre><br>

LINUX OUTPUT:

    original: Bóthar greatest<br>
    iconv: B<br>
    utf8 decoded: Bóthar greatest<br>
    iconv trans decoded: Bóthar greatest<br>
    <pre>Array ( [0] => Array ( [0] => thar [1] => 2 ) [1] => Array ( [0] => thar [1] => 2 ) ) </pre><br>
    <pre>Array ( [0] => Array ( [0] => thar [1] => 3 ) [1] => Array ( [0] => thar [1] => 3 ) ) </pre><br>

My problem appears to be that preg_match on Linux does not match anything outside ASCII, whereas preg_match on Windows has no problem matching the "ó". Also, on Linux iconv converts "Bóthar" to "B", whereas on Windows it converts to "Bóthar". Why is that?
In my php.ini file, the iconv settings are the same on both platforms (everything is ISO-8859-1), except that Windows uses "libiconv" (1.9) and Linux uses "glibc" (2.3.4). Does anyone have pointers as to where I should be looking to get my two platforms aligned, or why the preg_match function can "see" non-ASCII characters on one platform but not the other?
junk@alf2.com (Author) Posted February 24, 2008

Never mind, found it. The locale was set to "C" on Linux and to "1252" on Windows.
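For anyone landing here with the same symptom, here is a minimal sketch of how the locale difference could be checked and sidestepped. It is not the original poster's code. The helper name first_word is hypothetical, the type hints assume PHP 7 or later, and the "u" variant assumes a PCRE build with Unicode support, which the PHP 5.1.6 installs above may not have had. The idea is to print the active LC_CTYPE locale, pin it to "C" so both machines behave the same, and compare a byte-oriented match with a UTF-8-aware one.

```php
<?php
// Hypothetical helper (not from the original post): returns the first run of
// two or more word characters together with its byte offset.
function first_word(string $text, string $modifiers = ''): array
{
    preg_match('/\w{2,}/' . $modifiers, $text, $match, PREG_OFFSET_CAPTURE);
    return $match[0] ?? [];
}

// "Bóthar greatest" spelled out as UTF-8 bytes, so the result does not
// depend on how this file itself is saved.
$input = "B\xC3\xB3thar greatest";

// Passing "0" only reports the current setting; it does not change it.
printf("LC_CTYPE before: %s\n", setlocale(LC_CTYPE, '0'));

// Pin character classification to the "C" locale, as on the Linux box.
setlocale(LC_CTYPE, 'C');

// Without the "u" modifier, \w only covers ASCII here, so the two bytes
// of "ó" split the word and the match starts after them.
$byteMatch = first_word($input);

// With the "u" modifier the subject is treated as UTF-8, so "ó" counts
// as a word character regardless of the locale (assumes Unicode-enabled PCRE).
$utf8Match = first_word($input, 'u');

printf("byte match: %s at offset %d\n", $byteMatch[0], $byteMatch[1]);
printf("utf8 match: %s at offset %d\n", $utf8Match[0], $utf8Match[1]);
```

With the locale pinned to "C", the byte-oriented match reproduces the Linux result from the first post ("thar" at offset 3), while the "u" variant matches the whole word from offset 0. Calling setlocale explicitly at the top of a script, or relying on "u" for UTF-8 input, is one way to keep two platforms aligned; which fits better depends on the encoding the rest of the application uses.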