doubledee Posted February 29, 2012

Is it suicide to build a website and back-end database and NOT use UTF-8 for international support?!

Right now I am using the default "latin1_swedish_ci".

My website will be based in the U.S., but that is not to say that it won't grow or have foreign appeal...

Suggestions??

Thanks,
Debbie
requinix Posted February 29, 2012

It's not suicide, but it is inadvisable to use limited character encodings (like Latin1). You might as well start using UTF-8 now so you don't run the risk of having to switch to it later.
doubledee Posted March 1, 2012 Author

Quote:
    It's not suicide, but it is inadvisable to use limited character encodings (like Latin1). You might as well start using it now so you don't run the risk of having to switch to it later.

So which "flavor" should I use?

And how might that impact my current code base, which is limited, but still susceptible to changes...

Debbie
requinix Posted March 1, 2012

UTF-8 is fine. There are others but this one is the most common.

As for how it impacts your code, I don't know. Functions like strlen() and substr() stop working correctly, HTML pages need the encoding specified, and your database tables have to change - for starters.
doubledee Posted March 1, 2012 Author

Quote:
    UTF-8 is fine. There are others but this one is the most common. As for how it impacts your code, I don't know. Functions like strlen() and substr() stop working correctly,

What happens to them? How would I fix things?

Any other PHP functions that would freak out (e.g. Prepared Statements)?

Quote:
    HTML pages need the encoding specified,

What do you mean?

Quote:
    and your database tables have to change - for starters.

Since any data in my database is plain English, how would converting to UTF-8 impact the data in my database?

Thanks,
Debbie
kicken Posted March 1, 2012

Quote:
    What happens to them?

Do you remember your previous post about bits, bytes, characters, and how we said a character could consist of multiple bytes? That is where the problem comes in. Functions like substr, strlen, strpos, and several others assume that a character is just a single byte. Since UTF-8 is a multi-byte encoding, they will cause problems if used on it. What you need to do is use a multi-byte aware alternative. PHP has an extension (mbstring) full of multi-byte aware functions that you can use.

Quote:
    Any other PHP functions that would freak out (e.g. Prepared Statements)?

For MySQL, you need to set up the connection between your server and your script to use UTF-8. You do this by issuing the query SET NAMES utf8 after you connect.

Quote:
    HTML pages need the encoding specified,
    What do you mean?

For your browser to interpret the page as UTF-8, you have to tell it that the page is encoded in UTF-8 by sending the appropriate header. I believe it is:

header('Content-Type: text/html; charset=utf-8');

Quote:
    Since any data in my database is plain English, how would converting to UTF-8 impact the data in my database?

If your data is just plain English (ASCII), then there is no need to do any kind of conversion on the actual data, as those characters are stored the same way in latin1 and utf8. You do need to alter your table, though, so that MySQL knows you're storing utf8 data in the column and not latin1.
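A minimal sketch of the byte-versus-character problem described above, assuming the mbstring extension is loaded. The sample word and the variable names are illustrative only; the string is spelled out as explicit UTF-8 bytes so the result does not depend on how the file is saved.

```php
<?php
// "café" as UTF-8 bytes: c, a, f are one byte each, "é" is two (0xC3 0xA9).
$word = "caf\xC3\xA9";

// Byte-oriented functions count and cut bytes.
$byteLength = strlen($word);       // 5 bytes
$byteCut    = substr($word, 0, 4); // "caf" plus half of "é"

// Multi-byte aware functions count and cut characters.
$charLength = mb_strlen($word, 'UTF-8');       // 4 characters
$charCut    = mb_substr($word, 0, 4, 'UTF-8'); // the whole word

echo "strlen:    {$byteLength}\n";
echo "mb_strlen: {$charLength}\n";

// The byte-based cut is no longer valid UTF-8 because it split "é" in two.
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```

For text that is pure ASCII the two families of functions give the same results, which is why the difference only shows up once accented or non-Latin characters arrive.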
doubledee Posted March 2, 2012 Author

kicken,

Is all of the stuff you mention above worth the extra hassle?! :-\

And if I didn't want to use UTF-8, what would be the best "Single Byte" Character Set??

Thanks,
Debbie
requinix Posted March 2, 2012

In my opinion yes, it is worth it.

Windows-1252 is probably the best single-byte encoding. It supports standard ASCII, naturally, and quite a few accented characters - even more than ISO 8859-1/Latin1.
doubledee Posted March 2, 2012 Author

Quote:
    What happens to them?
    Do you remember your previous post about bits, bytes, characters, and how we said a character could consist of multiple bytes? That is where the problem comes in. Functions like substr, strlen, strpos, several others assume that a character is just a single byte. Since UTF-8 is a multi-byte encoding they will cause problems if used. What you need to do is use a multi-byte aware alternative. PHP has an extension (mbstring) full of multi-byte aware functions that you can use.

How many Bytes for each Character in UTF-8? Is that constant, or can it change?

Ummm, this looks like a pain-in-the-ass...

http://us3.php.net/manual/en/mbstring.installation.php

Debbie
requinix Posted March 2, 2012

Quote:
    How many Bytes for each Character in UTF-8?

One to four. Standard ASCII stuff (like you'd find on an en-us keyboard) is one byte and just about every other "common" character is two or three.

Quote:
    Ummm, this looks like a pain-in-the-ass...
    http://us3.php.net/manual/en/mbstring.installation.php

That's for if you were compiling PHP yourself. Somehow I doubt you are. Just enable the extension if it isn't already.
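To make the "one to four bytes" answer concrete, here is a small sketch that measures one sample character from each size class. It assumes mbstring is loaded; the four sample characters and the array labels are chosen purely for illustration.

```php
<?php
// Each sample is written as explicit UTF-8 bytes so the result does not
// depend on the encoding this file is saved in.
$samples = [
    'U+0041 Latin capital letter A' => "A",
    'U+00E9 e with acute accent'    => "\xC3\xA9",
    'U+20AC euro sign'              => "\xE2\x82\xAC",
    'U+1F600 grinning face emoji'   => "\xF0\x9F\x98\x80",
];

$byteCounts = [];
foreach ($samples as $label => $char) {
    // strlen() reports bytes; mb_strlen() reports characters.
    $byteCounts[$label] = strlen($char);
    printf(
        "%-30s %d byte(s), %d character(s)\n",
        $label,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

One related point worth checking on the database side: MySQL's utf8 character set stores at most three bytes per character, so the four-byte case above needs utf8mb4 instead.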
doubledee Posted March 2, 2012 Author

Quote:
    Ummm, this looks like a pain-in-the-ass...
    http://us3.php.net/manual/en/mbstring.installation.php
    That's for if you were compiling PHP yourself. Somehow I doubt you are. Just enable the extension if it isn't already.

Sorry for being a weeny, but I keep biting off more and more and I'm overwhelmed on my website project?!

So do you think I'll have to pay GoDaddy money to rebuild PHP for me, or can I just write a line of code to use the multi-byte thingy you're talking about?

From what I read, it sounded like I had to "install" that feature...

I'd be willing to switch if I thought it wasn't a lot of work. (I'm all for making my site better, but you guys keep telling me to do more and my simple site is now becoming Amazon.com?!)

Thanks,
Debbie
requinix Posted March 2, 2012

Is the extension not enabled already? It's one of the most common extensions and thus often is.
kicken Posted March 2, 2012

Quote:
    So do you think I'll have to pay GoDaddy money to rebuild PHP for me, or can I just write a line of code to use the multi-byte thingy you're talking about?

Most hosting providers enable a range of extensions, especially popular ones like this. I happen to have a site on GoDaddy, so I can tell you they do have it enabled. You can always check on your own, though, by simply looking at the output of phpinfo or using function_exists on one of the functions provided by the extension in question.

Quote:
    From what I read, it sounded like I had to "install" that feature...

It is a feature that has to be "installed", but since you're not the one doing the PHP install (GoDaddy is), it's not something you really need to worry about. All you have to do is check if it is available. If by chance it were not available, then you'd have to contact GoDaddy about getting it set up (or find a different host that does have it).
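The check described above could look roughly like this. It is only a sketch: multibyte_support_available() is a hypothetical helper name, not part of PHP, while extension_loaded() and function_exists() are the built-in functions doing the work.

```php
<?php
// Hypothetical helper: true when the mbstring extension is loaded and one of
// its functions (mb_strlen) is actually callable.
function multibyte_support_available(): bool
{
    return extension_loaded('mbstring') && function_exists('mb_strlen');
}

if (multibyte_support_available()) {
    echo "mbstring is available\n";
} else {
    echo "mbstring is missing; ask the host to enable it\n";
}
```

Dropping this into a throwaway script on the server gives the same answer as scanning the phpinfo output for the mbstring section.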
doubledee Posted March 3, 2012 Author

Quote:
    Any other PHP functions that would freak out (e.g. Prepared Statements)?
    For Mysql, you need to setup the connection between your server and your script to use UTF-8. You do this by issuing a query SET NAMES UTF8 after you connect.

Here is my MySQL connection script...

    <?php
    // This file contains the database access information.
    // It also establishes a connection to MySQL and selects the database.

    // Make the connection.
    $dbc = @mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
        OR die('Could not connect to database. Contact System Administrator.');

    // Define Character Set.
    mysqli_set_charset($dbc, 'utf8');
    ?>

Will that last line of code make it so my database is now UTF-8 ready?

Debbie
doubledee Posted March 3, 2012 Author

Quote:
    Most hosting providers enable a range of extensions, especially popular ones like this. I happen to have a site on GoDaddy so I can tell you they do have it enabled. You can always check on your own though by simply looking at the output of phpinfo or using function_exists on one of the functions provided by the extension in question.

Here is what I see on my VPS...

mbstring

    Multibyte Support                     enabled
    Multibyte string engine               libmbfl
    Multibyte (japanese) regex support    enabled
    Multibyte regex (oniguruma) version   3.7.1

mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.

    Directive                        Local Value   Master Value
    mbstring.detect_order            no value      no value
    mbstring.encoding_translation    Off           Off
    mbstring.func_overload           0             0
    mbstring.http_input              pass          pass
    mbstring.http_output             pass          pass
    mbstring.internal_encoding       no value      no value
    mbstring.language                neutral       neutral
    mbstring.strict_detection        Off           Off
    mbstring.substitute_character    no value      no value

I'm not sure if that means the Multi-Byte Functions will work or not?! It is "enabled" but some of the things above are turned off?!

Debbie
requinix Posted March 3, 2012

Quote:
    Multibyte Support    enabled

There's your answer.
doubledee Posted March 4, 2012 Author

Quote:
    Multibyte Support    enabled
    There's your answer.

What about...

    mbstring.encoding_translation    Off    Off

Debbie
requinix Posted March 4, 2012

Do you need a "transparent character encoding filter for the incoming HTTP queries, [to perform] detection and conversion of the input encoding to the internal character encoding"? (Quoted from the manual.)
fenway Posted March 5, 2012

UTF-8 is highly non-trivial to support -- if you don't need it, then avoid the headache.
doubledee Posted March 5, 2012 Author

Quote:
    UTF-8 is highly non-trivial to support -- if you don't need it, then avoid the headache.

Yep, that is the conclusion that I've come to this weekend. It just shouldn't be high up on my Feature Priority List.

Thanks,
Debbie