mvfreelance Posted September 29, 2009

Hey folks! I'm no newbie, but I've got a problem here that is driving me nuts.

My environment:

* PHP 5.2.10
* Apache 2.2
* OS: Windows Vista
* RewriteEngine On
* DefaultCharset UTF-8

When requesting any page and passing values through $_GET, everything works fine except for "French strings" (e.g. Chloé / Chlo%E9 becomes Chlo, and Élle becomes lle).

So I have page.php:

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(" . strlen($_GET['var']) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

request : page.php?var=ABC%25DE
expected: string(6) ABC%DE page.php?var=ABC%25DE
output  : string(6) ABC%DE page.php?var=ABC%25DE

request : page.php?var=ABC%DE
expected: string(6) ABC%DE page.php?var=ABC%DE
output  : string(6) ABC%DE page.php?var=ABC%DE

request : page.php?var=ABCDÉF
expected: string(6) ABCDÉF page.php?var=ABCDÉF
output  : string(5) ABCDF page.php?var=ABCDÉF

request : page.php?var=ABCD%E9F
expected: string(6) ABCDÉF page.php?var=ABCD%E9F
output  : string(5) ABCDF page.php?var=ABCD%E9F

So the "É" (and its urlencoded equivalent %E9) was simply dropped by PHP.

I got the same results with this code:

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(" . mb_strlen(utf8_encode($_GET['var'])) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

Anyone, please?
Alex Posted September 29, 2009

Use UTF-8:

<?php
header("Content-type: text/html; charset=UTF-8");
print("string(" . strlen($_GET['var']) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>
mvfreelance (Author) Posted September 29, 2009

My mistake when posting: I did indeed try header(... UTF-8 ...) as well.

<?php
header("Content-type: text/html; charset=UTF-8");
print("string(" . mb_strlen(utf8_encode($_GET['var'])) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

It returns the same results. I'm pretty sure this is a configuration problem, because the code works fine when Apache is running under Linux (CentOS).

Thanks for the reply anyway. Any more suggestions, please?
Alex Posted September 29, 2009

What browser are you using? Because I know that IE has problems with UTF-8.
cags Posted September 29, 2009

I have practically the same setup as you. The only difference is that I use XP 64-bit rather than Vista. I just tried your code, and it worked perfectly on my computer.

%E9 is actually é
%C9 is É

That's not really relevant, though. Sorry I can't be of any help; I just wanted to let you know it works for me (on all modern browsers), so it sounds like a configuration issue.
mvfreelance (Author) Posted September 29, 2009

No matter which browser I use, the results are the same. I have tried:

* IE6
* IE7
* IE8
* FF 3.5.3
* FF 3.5
* curl.exe 7.19.3

Plus, the very same code works just fine on a different server, so it's definitely not a browser issue.

Any more tips, please?
redarrow Posted September 29, 2009

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

Check through your code for any text-based Content-type headers and append the UTF-8 charset, so the browser knows what it's working with:

header('Content-type: text/html; charset=UTF-8');

You should also repeat this at the top of HTML pages:

<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
mvfreelance (Author) Posted September 29, 2009

Redarrow, thanks for your input, but as I explained in the original post, I'm testing this simple request:

page.php?var=abcdeéf

The expected result:

string(7) abcdeéf page.php?var=abcdeéf

page.php simply prints the length of the parameter 'var' followed by its value, then prints the requested URL. No matter whether the header is ISO-8859-1 or UTF-8, the output I get is the same:

string(7) abcdef page.php?var=abcdeéf

The variable $_SERVER['REQUEST_URI'] works FINE, but $_GET does NOT. Really weird.

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(" . strlen($_GET['var']) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

and

<?php
header("Content-type: text/html; charset=UTF-8");
print("string(" . strlen($_GET['var']) . ") " . $_GET['var'] . "\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>
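One way to narrow down a mismatch like the one described above is to decode the raw query string by hand and compare the result with $_GET. The following is only a minimal sketch: `decode_query_string` is a hypothetical helper that does not appear in the thread, and it ignores array-style parameters such as `a[]=1`.

```php
<?php
// Hypothetical diagnostic helper (not from the thread): decodes a raw
// query string with plain percent-decoding only, so the result can be
// compared byte for byte with what PHP placed in $_GET.
function decode_query_string($queryString)
{
    $params = array();
    foreach (explode('&', $queryString) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing "&"
        }
        // Split on the first "=" only; a value may itself contain "=".
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $params[$name] = $value;
    }
    return $params;
}

// The thread's failing request: %E9 decodes to the single byte 0xE9.
$params = decode_query_string('var=ABCD%E9F');
echo strlen($params['var']), "\n";  // 6 bytes
echo bin2hex($params['var']), "\n"; // 41424344e946
```

In a web request, the same helper could be fed `$_SERVER['QUERY_STRING']`; if its byte count differs from `strlen($_GET['var'])`, something between the server and the script is altering the input.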
cags Posted September 29, 2009

I suggest you take a look at this article. It seems that many of the built-in PHP functions, such as strlen, are not multi-byte aware: they count bytes rather than characters. I'm not sure whether this is what's causing part of your problem, as your code works on my computer, but it's an interesting read nonetheless and may give you some inspiration.
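To make the byte-versus-character distinction concrete, here is a small sketch using the thread's own `ABCD%E9F` value. `describe_value` is a hypothetical helper, and the code assumes the client sent ISO-8859-1 bytes (as the `%E9` example implies) and that the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not from the thread): reports byte and character
// counts for a percent-encoded value.
function describe_value($percentEncoded)
{
    // rawurldecode() turns each %XX into the single byte 0xXX.
    $bytes = rawurldecode($percentEncoded);

    // Assumption: the bytes are ISO-8859-1, where 0xE9 is "é".
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

    return array(
        'raw_bytes'   => strlen($bytes),                       // bytes as sent
        'raw_is_utf8' => mb_check_encoding($bytes, 'UTF-8'),   // lone 0xE9 is not valid UTF-8
        'utf8_bytes'  => strlen($utf8),                        // "é" takes 2 bytes in UTF-8
        'utf8_chars'  => mb_strlen($utf8, 'UTF-8'),            // characters, not bytes
    );
}

print_r(describe_value('ABCD%E9F'));
// raw_bytes: 6, raw_is_utf8: false, utf8_bytes: 7, utf8_chars: 6
```

Note that a lone 0xE9 byte is not valid UTF-8, which is the kind of input a charset conversion step can fail on.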
mvfreelance (Author) Posted September 29, 2009

I'm fully aware of PHP's UTF-8-related bugs and lack of support (though it seems that PHP 6 will be fully UTF-8).

As you said, the code works fine on your box (by the way, many thanks for taking the time to try it). It also works fine on a LAMP box I've got, and on a friend's WAMP box. But my main WAMP box doesn't respond as expected. [dammit]

I'm fairly familiar with "internationalization", which is so well covered in the article you recommended (www.phpwact.org/php/i18n/charsets). That makes me 99% confident that the problem is some configuration in either php.ini or httpd.conf.
mvfreelance (Author) Posted September 30, 2009

Many thanks to AlexWD, cags & redarrow for their input!

As I suspected at first, the problem was configuration related.

I've got Multibyte String (aka mbstring) enabled, which is a great extension for internationalization jobs. BUT when encoding_translation is enabled (encoding_translation=On), it tries to translate all HTTP request input before handing it to the PHP engine, roughly speaking. And because it sometimes can't translate certain characters, it simply removes those characters from the HTTP request. Dunno why.

From the manual:

"Enables the transparent character encoding filter for the incoming HTTP queries, which performs detection and conversion of the input encoding to the internal character encoding."

http://www.php.net/manual/en/mbstring.configuration.php#ini.mbstring.encoding-translation

Anyway, I really recommend encoding_translation=Off.

Best to all!
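For anyone wanting to check whether their own setup is affected, the setting can be read at runtime. This is a hedged sketch, not something posted in the thread: `encoding_translation_enabled` is a hypothetical helper name, and it only reports the current value.

```php
<?php
// Hypothetical helper (not from the thread): reports whether mbstring's
// transparent input conversion is currently switched on.
function encoding_translation_enabled()
{
    if (!extension_loaded('mbstring')) {
        return false; // without mbstring there is no input translation
    }
    // ini_get() returns a string (for example "1", "0" or ""), or false
    // if the option is unknown; FILTER_VALIDATE_BOOLEAN maps "1"/"on"/
    // "true"/"yes" to true and everything else to false.
    $value = ini_get('mbstring.encoding_translation');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

echo encoding_translation_enabled()
    ? "mbstring.encoding_translation is On: request input may be converted\n"
    : "mbstring.encoding_translation is Off\n";
```

The option cannot be changed with `ini_set()` from inside the script, since it takes effect before the script runs; it has to be set in php.ini (`mbstring.encoding_translation = Off`) or in per-directory server configuration.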