
[SOLVED] PHP $_GET characters disappear or ignored


mvfreelance


Hey folks!

 

I'm no newbie, but I've got a problem here that is driving me nuts...

My environment:

* php 5.2.10

* apache 2.2

* OS Windows Vista

* RewriteEngine On

* DefaultCharset UTF-8

 

When requesting any page and passing values through $_GET, everything works fine, except when passing "French strings" (e.g. Chloé / Chlo%E9 becomes Chlo, or Élle becomes lle).

So I have:

 

page.php

 

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(".strlen($_GET['var']) .") ".$_GET['var'] ."\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

 

request : page.php?var=ABC%25DE

expected:

  string(6) ABC%DE

  page.php?var=ABC%25DE 

 

output  :

  string(6) ABC%DE

  page.php?var=ABC%25DE 

 

 

------------------------------------------------------

 

request : page.php?var=ABC%DE

expected:

  string(6) ABC%DE

  page.php?var=ABC%DE 

 

output  :

  string(6) ABC%DE

  page.php?var=ABC%DE 

 

------------------------------------------------------

 

request : page.php?var=ABCDÉF

expected:

  string(6) ABCDÉF

  page.php?var=ABCDÉF

 

output  :

  string(5) ABCDF

  page.php?var=ABCDÉF

 

------------------------------------------------------

 

request : page.php?var=ABCD%E9F

expected:

  string(6) ABCDÉF

  page.php?var=ABCD%E9F

 

output  :

  string(5) ABCDF

  page.php?var=ABCD%E9F

 

------------------------------------------------------

 

So the "É" (and its URL-encoded form, %E9) was simply dropped by PHP.

 

I got the same results with this code:

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(".mb_strlen(utf8_encode($_GET['var'])) .") ".$_GET['var'] ."\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

 

 

Anyone please???

 

My mistake when posting...

I did indeed try header(... UTF-8) as well:

 

<?php
header("Content-type: text/html; charset=UTF-8");
print("string(".mb_strlen(utf8_encode($_GET['var'])) .") ".$_GET['var'] ."\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

 

It returns the same results... I'm pretty sure this is a configuration problem, because the code works fine when Apache is running under Linux (CentOS).

 

Thanks for the reply anyway.

Any more suggestions, please?

 

I have practically the same setup as you. The only difference is that I use XP 64-bit rather than Vista. I just tried your code, and it worked perfectly on my computer.

 

%E9 is actually é

%C9 is É

 

That's not really relevant though. Sorry I can't be of any help, I just wanted to let you know it works for me (on all modern browsers), so it sounds like a configuration issue.

No matter which browser I use, the results are the same.
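
For reference, a minimal sketch of the byte-level difference behind the %E9 / %C9 remark. It assumes the mbstring extension is loaded; the variable names are only illustrative. In ISO-8859-1 the character é is the single byte 0xE9, while in UTF-8 it is the two bytes 0xC3 0xA9, so a lone %E9 in a URL does not decode to valid UTF-8. That may be relevant when something in the stack expects UTF-8 input.

```php
<?php
// Assumption: mbstring is available (mb_convert_encoding, mb_check_encoding).
$latin1 = "\xE9";                                               // "é" as one ISO-8859-1 byte
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');  // "é" as two UTF-8 bytes

echo rawurlencode($latin1), "\n"; // %E9
echo rawurlencode($utf8), "\n";   // %C3%A9

// A lone 0xE9 byte is not a valid UTF-8 sequence; 0xC3 0xA9 is.
var_dump(mb_check_encoding(urldecode('%E9'), 'UTF-8'));     // bool(false)
var_dump(mb_check_encoding(urldecode('%C3%A9'), 'UTF-8'));  // bool(true)
```
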

I have tried:

IE6

IE7

IE8

FF 3.5.3

FF 3.5

curl.exe 7.19.3

 

Plus, the same code works just fine on a different server, so it's definitely not a browser issue.

 

Any more tips, please?

 

 

 

 

 

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities.
 *
 * @param string $var
 * @return string
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

 

 

Check through your code for any text-based content-type headers, and append the UTF-8 charset, so the browser knows what it's working with:

 

header('Content-type: text/html; charset=UTF-8') ;

 

You should also repeat this at the top of HTML pages:

 

<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />

Redarrow, thanks for your input, but as I explained in the original post, I tested this simple request:

page.php?var=abcdeéf

 

the expected result:

string(7) abcdeéf

page.php?var=abcdeéf

 

The code in page.php simply prints the length of the parameter 'var' followed by its value, then prints the requested URL.

No matter whether the header is ISO-8859-1 or UTF-8, the output I get is the same, as follows:

string(7) abcdef

page.php?var=abcdeéf

 

The value in $_SERVER['REQUEST_URI'] comes through FINE, but $_GET does NOT.

Really weird.

 

<?php
header("Content-type: text/html; charset=ISO-8859-1");
print("string(".strlen($_GET['var']) .") ".$_GET['var'] ."\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>

 

and

 

<?php
header("Content-type: text/html; charset=UTF-8");
print("string(".strlen($_GET['var']) .") ".$_GET['var'] ."\n");
print(urldecode($_SERVER['REQUEST_URI']));
?>
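
Since the raw request URI still carries the missing byte, one way to narrow down where it gets lost is to decode the query string by hand and compare the result with $_GET. The sketch below is only a diagnostic aid: `raw_query_params()` is a hypothetical helper, not a PHP built-in, and it ignores array-style parameters such as `var[]`.

```php
<?php
// Hypothetical helper: decode a query string by hand, bypassing PHP's own
// request-input handling, so the result can be compared with $_GET.
function raw_query_params($requestUri)
{
    $params = array();
    $pos = strpos($requestUri, '?');
    if ($pos === false) {
        return $params; // no query string at all
    }
    foreach (explode('&', substr($requestUri, $pos + 1)) as $pair) {
        if ($pair === '') {
            continue;
        }
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $params[$name] = $value;
    }
    return $params;
}

$params = raw_query_params('page.php?var=ABCD%E9F');
echo strlen($params['var']), ' ', bin2hex($params['var']), "\n"; // 6 41424344e946
```

On the affected box, printing bin2hex($_GET['var']) next to the helper's output for $_SERVER['REQUEST_URI'] would show whether the 0xE9 byte is already gone by the time $_GET is populated.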

I suggest you take a look at this article. It seems that many of the built-in PHP functions, such as strlen, just do not work with multi-byte characters. I'm not sure whether this is causing part of your problem, as your code works on my computer, but it's an interesting read nonetheless and may give you some inspiration.

I'm fully aware of PHP's lack of UTF-8 support and the related bugs (though it seems that PHP 6 will be fully UTF-8)  8)
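
A short illustration of that point, assuming the mbstring extension is loaded: strlen() counts bytes, while mb_strlen() counts characters in the encoding it is told to use. The sample strings are written with explicit byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
$utf8   = "abcde\xC3\xA9f"; // "abcdeéf" encoded as UTF-8 (é = 2 bytes)
$latin1 = "abcde\xE9f";     // "abcdeéf" encoded as ISO-8859-1 (é = 1 byte)

echo strlen($utf8), "\n";                   // 8 (bytes)
echo mb_strlen($utf8, 'UTF-8'), "\n";       // 7 (characters)
echo strlen($latin1), "\n";                 // 7 (bytes)
echo mb_strlen($latin1, 'ISO-8859-1'), "\n"; // 7 (characters)
```
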

 

As you said, the code works fine on your box (by the way, many thanks for taking the time to try it). It also works fine on a LAMP box I've got, and on a friend's WAMP box.

But my main WAMP box doesn't respond as expected. [dammit  >:( ]

 

 

I'm fairly familiar with the "internationalization" issues described so well in the article you recommended (www.phpwact.org/php/i18n/charsets), which makes me 99%  :confused: confident that the problem is some configuration setting in either php.ini or httpd.conf.

 

 

 

 

Many thanks to AlexWD , cags & redarrow for their inputs!

 

As I suspected at first the problem was configuration related  :rtfm:  :P

I've got Multibyte String, aka mbstring, enabled (a great extension for internationalization jobs).

BUT when encoding_translation is enabled (mbstring.encoding_translation = On), it tries to translate all incoming HTTP request data before handing it to the PHP engine, roughly speaking  :D

And because it sometimes can't translate certain characters, it simply removes those characters from the request data. I don't know why.

From the manual: "Enables the transparent character encoding filter for the incoming HTTP queries, which performs detection and conversion of the input encoding to the internal character encoding."

http://www.php.net/manual/en/mbstring.configuration.php#ini.mbstring.encoding-translation

 

Anyway, I really recommend mbstring.encoding_translation = Off.
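
For anyone hitting the same symptom, a minimal sketch for checking the setting from a script. `ini_flag_enabled()` is a hypothetical helper name; it only normalises whatever ini_get() returns, since boolean directives come back as strings such as "1", "0" or "", and ini_get() returns false when the directive does not exist (for example if mbstring is not loaded).

```php
<?php
// Hypothetical helper: normalise an ini_get() result to a real boolean.
function ini_flag_enabled($raw)
{
    return filter_var($raw, FILTER_VALIDATE_BOOLEAN);
}

if (ini_flag_enabled(ini_get('mbstring.encoding_translation'))) {
    echo "mbstring.encoding_translation is On: request input is being converted\n";
} else {
    echo "mbstring.encoding_translation is Off (or mbstring is not loaded)\n";
}

// To turn it off, the php.ini line would be:
//   mbstring.encoding_translation = Off
```

As far as I know this directive is a per-directory setting, so calling ini_set() at runtime should not be expected to change it; edit php.ini (and restart Apache) or set it in the server or per-directory configuration instead.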

 

best 4 all!

 

 

Archived

This topic is now archived and is closed to further replies.
