
ISO-8859-* characters in GET data



Hello,

I'm working on a script that depends on foreign characters being passed in URLs. I think I can best explain the problem with a few examples:

(Note: ISO-8859-9 is the Turkish character set. I don't speak Turkish, but this is where people ran into problems.)

index.php:
[code]<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9" />
</head>
<body>
<?php echo $_GET['word'] ?>
</body>
</html>
[/code]

Now, when accessing the following URL:

(Those characters are part of the Turkish character set)

<url>/index.php?word=[b]×ØÞßåæç[/b]

The URL gets converted to
<url>/index.php?word=[b]%C3%97%C3%98%C3%9E%C3%9F%C3%A5%C3%A6%C3%A7[/b]
which is the percent-encoded UTF-8 form of those characters.
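
To make the byte-level picture concrete, here is a minimal sketch that reproduces the percent-encoded string above. It assumes PHP 7 or later (for the `\u{...}` escapes) and that the browser encoded the characters as UTF-8, which is what the `%C3...` pairs suggest. The variable names are only illustrative.

```php
<?php
// The seven characters from the example URL, written as Unicode escapes
// (PHP 7+) so the sample does not depend on this file's own encoding.
$word = "\u{D7}\u{D8}\u{DE}\u{DF}\u{E5}\u{E6}\u{E7}"; // UTF-8 bytes

// Percent-encode those UTF-8 bytes, as seen in the address bar.
$encoded = rawurlencode($word);
echo $encoded, "\n";

// Decoding is byte for byte, so the decoded value holds the same
// UTF-8 bytes again: two per character, 14 in total.
$decoded = rawurldecode($encoded);
echo strlen($decoded), "\n";
```

Each character takes two bytes here, so a page declared as ISO-8859-9 will show two single-byte characters for every one that was typed.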

However, the output of the actual page becomes this:

[img]http://img175.imageshack.us/img175/1691/iso88599kz4.png[/img]
(image because I can't actually paste the results here)

This only happens for GET requests. If I submit the same string via a POST request, then it arrives in PHP's $_POST['word'] fine, and echoing it will display the entered text. It also works fine if I specify UTF-8 encoding instead of ISO-8859-*. Using UTF-8, entering stuff like index.php?word=Ββεαομρί will work as expected. (It gets turned into "%CE%92%CE%B2%CE%B5%CE%B1%CE%BF%CE%BC%CF%81%CE%AF")

When using UTF-8 and entering the Turkish characters, the percent-encoded string actually displays fine. So one thought I had was that PHP converts GET variables to Unicode no matter what (that, or the browser does), but I don't know how to get from the percent-encoded UTF-8 to ISO-8859-9.

And like I said, it works fine for POST requests, which is the weird part.
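
One possible way to do the UTF-8 to ISO-8859-9 step is sketched below. It assumes the GET value really arrives as UTF-8 and that the iconv extension is available. `to_page_charset()` is a hypothetical helper name, and the sample input uses ç, ş and ğ because all three exist in ISO-8859-9.

```php
<?php
// Hypothetical helper (not from the original post): convert a request value
// from UTF-8 to the page charset before echoing it.
function to_page_charset(string $value, string $charset = 'ISO-8859-9'): string
{
    // preg_match with the /u modifier succeeds only on valid UTF-8 input.
    // Anything else is assumed to be in the page charset already.
    if (preg_match('//u', $value) !== 1) {
        return $value;
    }
    // iconv() returns false when a character has no equivalent in $charset;
    // the @ silences the accompanying notice and the input is returned as-is.
    $converted = @iconv('UTF-8', $charset, $value);
    return $converted === false ? $value : $converted;
}

// Stand-in for $_GET['word'] after PHP has percent-decoded the query string.
$word = rawurldecode('%C3%A7%C5%9F%C4%9F'); // UTF-8 bytes for "çşğ"

$latin5 = to_page_charset($word);
echo bin2hex($latin5), "\n"; // one byte per character after conversion
```

The validity check is only a heuristic: a short ISO-8859-9 string can happen to be valid UTF-8 as well, and a character with no ISO-8859-9 equivalent leaves the value unconverted. Treat this as a starting point, not a complete fix.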

[quote author=r-it link=topic=117393.msg478966#msg478966 date=1165308947]
have u tried urlencode
[/quote]

The problem with urlencode is that it makes the characters unreadable. I'm planning to eventually use mod_rewrite so the URL will be http://url/<special characters>/, which rewrites to script.php?word=<special characters>. Most browsers (with the exception of Firefox) can display Unicode in URLs fine, so it's important to keep the characters as they are so the URL stays human-readable. URL encoding would break that.
