digitalecartoons Posted November 5, 2007

Where does it say on php.net that ISO-8859-1 is the default charset for strings? I can't find anything about it.
btherl Posted November 5, 2007

PHP doesn't have a default character set for strings. A string is just a sequence of 8-bit bytes (256 possible values, including null), and those bytes could represent text in many different character sets. You can even store UTF-8 strings, as long as you remember that functions like strlen() count bytes, not characters.
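A minimal sketch of the bytes-versus-characters point (the variable names are illustrative only). It writes "é" as its explicit UTF-8 bytes `\xC3\xA9` so the result does not depend on how the source file is saved, and it counts characters with PCRE's `/u` modifier rather than assuming the mbstring extension is installed.

```php
<?php
// "\xC3\xA9" is the two-byte UTF-8 encoding of "é" (U+00E9).
$utf8 = "caf\xC3\xA9";

// strlen() counts bytes, so the two-byte "é" is counted twice.
$bytes = strlen($utf8);

// The /u modifier tells PCRE to treat the subject as UTF-8, so "." matches
// one code point; preg_match_all() returns the number of matches.
$chars = preg_match_all('/./su', $utf8);

echo "bytes: $bytes, characters: $chars\n"; // bytes: 5, characters: 4
```

If mbstring is available, `mb_strlen($utf8, 'UTF-8')` gives the same character count.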
digitalecartoons Posted November 5, 2007

OK, so what someone told me, "PHP assumes that strings are ISO-8859-1", isn't completely true?

Can you explain this: when I have a PHP file that only echoes the é character, the browser defaults to ISO-8859-1. When I switch the browser's encoding to Unicode it displays a question mark instead, but when I refresh the page or type in the PHP link again, it switches back to ISO-8859-1, even though I haven't set it that way anywhere.
effigy Posted November 5, 2007

It might be the work of Apache's AddDefaultCharset directive.
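If you would rather not depend on a server-wide default such as AddDefaultCharset, a PHP script can declare its own output charset. This is a minimal sketch that assumes you want UTF-8 output; it uses PHP's `default_charset` setting and the `header()` function, both of which must take effect before any output is sent.

```php
<?php
// Option 1: set PHP's default_charset for this request. When it is
// non-empty, PHP appends "; charset=<value>" to its default Content-Type.
ini_set('default_charset', 'UTF-8');

// Option 2: send the Content-Type header explicitly. header() only works
// before output starts, so guard it; under the CLI no headers are emitted.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// The bytes you echo must really be in the declared encoding:
// "\xC3\xA9" is "é" in UTF-8.
echo "caf\xC3\xA9\n";
```

Either way, the declared charset only labels the output; it does not convert your strings, so they still have to be stored in that encoding.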
btherl Posted November 6, 2007

Well, there's a semantic issue here: strings in PHP don't have a character set, but the final document produced by PHP does. When the browser makes a request, the server (where PHP is running) will usually specify which character set to interpret the document in, and that's affected by what effigy mentioned.

So what I would say is "PHP does not assume that strings are ISO-8859-1", BUT in your setup the output is declared as ISO-8859-1, and the browser is told so. That has nothing to do with strings inside PHP, which can be in any character set; they are simply data. To show what I mean, consider the following code:

```php
<?php
$iso8859 = "\xE9";                               // "é" as a single ISO-8859-1 byte
$utf8 = iconv('ISO-8859-1', 'UTF-8', $iso8859); // the same "é" as two UTF-8 bytes
```

The first string is ISO-8859-1, but the second is UTF-8. Both are in the same script, and PHP neither knows nor cares which encoding each is in; that's up to you to keep track of. But when you produce the final HTML document, the browser needs to know the encoding. That is sent with the response headers (and can be overridden manually by the user, as you described).
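A minimal sketch of the "strings are simply data" point, using one "é" in two encodings. It prints the raw bytes with `bin2hex()` and uses PCRE's `/u` modifier as a UTF-8 validity check (an illustrative choice; `preg_match()` returns false when the subject is not valid UTF-8). This also suggests why the question mark appeared: a lone ISO-8859-1 byte `E9` is not valid UTF-8, so a browser decoding the page as UTF-8 cannot show it as "é".

```php
<?php
// One "é" in two encodings.
$iso8859 = "\xE9";                               // ISO-8859-1: one byte
$utf8 = iconv('ISO-8859-1', 'UTF-8', $iso8859); // UTF-8: two bytes

// PHP stores both as plain byte strings; only the bytes differ.
echo bin2hex($iso8859), "\n"; // e9
echo bin2hex($utf8), "\n";    // c3a9

// With /u, preg_match() validates the subject as UTF-8 first and returns
// false if it is invalid; the empty pattern otherwise matches (returns 1).
var_dump(preg_match('//u', $iso8859)); // bool(false)
var_dump(preg_match('//u', $utf8));    // int(1)
```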