[email protected] Posted February 23, 2008

This has been driving me nuts over the past few hours, so I am hoping someone can help me understand what's going on here. I have two servers, one online (Linux) and my PC that I test on (Windows), both running PHP 5.1.6. The same script produces different output on the two servers, but I can't figure out why.

SCRIPT:

    $input = "Bóthar greatest";
    printf("original: %s<br>\r\n", $input);
    printf("iconv: %s<br>\r\n", iconv('', 'UTF-8', $input));
    printf("utf8 decoded: %s<br>\r\n", utf8_decode($input));
    printf("iconv trans decoded: %s<br>\r\n", iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $input));
    preg_match('/([\w]{2,}|\d+)/i', utf8_decode($input), $match, PREG_OFFSET_CAPTURE);
    printf("<pre>%s</pre><br>\r\n", print_r($match, true));
    preg_match('/([\w]{2,}|\d+)/i', $input, $match, PREG_OFFSET_CAPTURE);
    printf("<pre>%s</pre><br>\r\n", print_r($match, true));

WINDOWS OUTPUT:

    original: Bóthar greatest<br>
    iconv: Bóthar greatest<br>
    utf8 decoded: Bóthar greatest<br>
    iconv trans decoded: Bóthar greatest<br>
    <pre>Array
    (
        [0] => Array
            (
                [0] => Bóthar
                [1] => 0
            )

        [1] => Array
            (
                [0] => Bóthar
                [1] => 0
            )

    )
    </pre><br>
    <pre>Array
    (
        [0] => Array
            (
                [0] => Bóthar
                [1] => 0
            )

        [1] => Array
            (
                [0] => Bóthar
                [1] => 0
            )

    )
    </pre><br>

LINUX OUTPUT:

    original: Bóthar greatest<br>
    iconv: B<br>
    utf8 decoded: Bóthar greatest<br>
    iconv trans decoded: Bóthar greatest<br>
    <pre>Array
    (
        [0] => Array
            (
                [0] => thar
                [1] => 2
            )

        [1] => Array
            (
                [0] => thar
                [1] => 2
            )

    )
    </pre><br>
    <pre>Array
    (
        [0] => Array
            (
                [0] => thar
                [1] => 3
            )

        [1] => Array
            (
                [0] => thar
                [1] => 3
            )

    )
    </pre><br>

My problem appears to be that preg_match on Linux is not matching anything outside ASCII, whereas preg_match on Windows has no problem matching the "ó". Also, on Linux iconv converts "Bóthar" to "B", whereas on Windows it converts to "Bóthar". Why's that?
In my php.ini file the iconv settings are the same on both platforms (everything is ISO-8859-1), except that Windows uses "libiconv" (1.9) and Linux uses "glibc" (2.3.4). Does anyone have any pointers as to where I should be looking so that I can get my two platforms aligned, or why preg_match can "see" non-ASCII characters on one platform but not on the other?
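One thing that may be worth ruling out: the second printf passes an empty string as iconv's source charset, which, as far as I know, makes iconv fall back to the current locale's encoding, so the result can differ between machines. Below is a minimal sketch that names both encodings explicitly. It assumes the input bytes are UTF-8 (the escaped bytes stand for "Bóthar") and is only an illustration, not the original script.

```php
<?php
// Assumption: the input is UTF-8. "\xC3\xB3" is the two-byte UTF-8 form of "ó".
$input = "B\xC3\xB3thar greatest";

// Name the source and target charsets explicitly instead of passing ''.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $input);

// Convert back to confirm that nothing was lost on the way.
$roundTrip = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Print the raw bytes as hex so the result does not depend on the terminal's encoding.
printf("latin1 bytes: %s\n", bin2hex($latin1));
printf("round trip ok: %s\n", $roundTrip === $input ? 'yes' : 'no');
```

With explicit charsets the conversion should no longer depend on which iconv implementation or locale the server happens to use.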
[email protected] (Author) Posted February 24, 2008

Never mind, found it. The locale was set to "C" on Linux and to "1252" on Windows.
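For anyone landing here with the same symptom, here is a rough sketch of what the "C" locale does to \w, plus one locale-independent alternative. It assumes the input is UTF-8 and that PCRE was built with Unicode property support. The character class in the second pattern is my own substitute for \w, not something from the original script.

```php
<?php
// Assumption: the input is UTF-8 ("\xC3\xB3" is "ó").
$input = "B\xC3\xB3thar greatest";

// Force the "C" locale, as reported for the Linux server.
setlocale(LC_CTYPE, 'C');
// Passing '0' only queries the current setting without changing it.
printf("LC_CTYPE: %s\n", setlocale(LC_CTYPE, '0'));

// Byte mode: in the "C" locale \w covers only ASCII letters, digits and "_",
// so the first run of two or more word characters starts after the "ó" bytes.
preg_match('/\w{2,}/', $input, $byteMatch, PREG_OFFSET_CAPTURE);
printf("byte mode: %s at offset %d\n", $byteMatch[0][0], $byteMatch[0][1]);

// UTF-8 mode: the /u modifier treats pattern and subject as UTF-8, and
// \p{L} / \p{N} match Unicode letters and digits regardless of the locale.
preg_match('/[\p{L}\p{N}_]{2,}/u', $input, $utf8Match, PREG_OFFSET_CAPTURE);
printf("utf-8 mode: %s at offset %d\n", $utf8Match[0][0], $utf8Match[0][1]);
```

The byte-mode result lines up with the Linux output above ("thar" at offset 3). The alternative to rewriting the pattern is to call setlocale(LC_CTYPE, ...) with the same locale on both machines, which is what the fix in this thread amounts to.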