
UTF8


doubledee


Is it suicide to build a website and back-end database and NOT use UTF-8 for International support?!  :confused:

 

Right now I am using the default "latin1_swedish_ci".

 

My website will be based in the U.S., but that is not to say that it won't grow or have foreign appeal...

 

Suggestions??

 

Thanks,

 

 

Debbie

 


It's not suicide, but it is inadvisable to use limited character encodings (like Latin1). You might as well start using it now so you don't run the risk of having to switch to it later.

 

So which "flavor" should I use?

 

And how might that impact my current code base, which is limited, but still susceptible to changes...

 

 

Debbie

 


UTF-8 is fine. There are others but this one is the most common.

 

As for how it impacts your code, I don't know. Functions like strlen() and substr() stop working correctly (they count bytes, not characters), HTML pages need the encoding specified, and your database tables have to change - for starters.


> UTF-8 is fine. There are others but this one is the most common.
>
> As for how it impacts your code, I don't know. Functions like strlen() and substr() stop working correctly,

 

What happens to them?

 

How would I fix things?

 

Any other PHP functions that would freak out (e.g. Prepared Statements)?

 

 

> HTML pages need the encoding specified,

 

What do you mean?

 

 

> and your database tables have to change - for starters.

 

Since any data in my database is plain English, how would converting to UTF-8 impact the data in my database?

 

Thanks,

 

 

Debbie

 


> What happens to them?

 

Do you remember your previous post about bits, bytes, characters, and how we said a character could consist of multiple bytes?  That is where the problem comes in.  Functions like substr(), strlen(), strpos() and several others assume that a character is just a single byte.  Since UTF-8 is a multi-byte encoding, they will give wrong lengths and offsets (and can cut a character in half) on non-ASCII text.  What you need to do is use a multi-byte aware alternative.  PHP has an extension, mbstring, full of multi-byte aware functions (such as mb_strlen(), mb_substr() and mb_strpos()) that you can use.
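A quick way to see the byte-versus-character problem for yourself. This is a minimal sketch, assuming PHP 7 or later (for the \u{...} string escape) and the mbstring extension being enabled; the sample word is just an illustration. Passing 'UTF-8' explicitly to the mb_* functions keeps the result independent of the mbstring.internal_encoding setting.

```php
<?php
// "café" written with an escape so the bytes are UTF-8 regardless of editor settings.
// U+00E9 (é) takes 2 bytes in UTF-8, so the string is 4 characters but 5 bytes.
$word = "caf\u{00E9}";

// Byte-oriented functions count and cut bytes.
$bytes     = strlen($word);           // 5
$brokenCut = substr($word, 0, 4);     // "caf" plus only the first byte of é: invalid UTF-8

// mbstring equivalents count and cut characters.
$chars    = mb_strlen($word, 'UTF-8');          // 4
$cleanCut = mb_substr($word, 0, 4, 'UTF-8');    // "café", intact

echo "strlen: $bytes, mb_strlen: $chars\n";
var_dump(mb_check_encoding($brokenCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($cleanCut, 'UTF-8'));  // bool(true)
```

In practice that means swapping strlen()/substr()/strpos() for their mb_ counterparts anywhere the string might contain non-ASCII text.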

 

> Any other PHP functions that would freak out (e.g. Prepared Statements)?

For MySQL, you need to set up the connection between your server and your script to use UTF-8.  You do this by issuing the query SET NAMES utf8 after you connect.
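With the mysqli extension, mysqli_set_charset() is generally the preferred way to do this, because besides changing the server side it also tells the client library which charset is in use (which matters for mysqli_real_escape_string()). A minimal sketch, assuming PHP 7+ and MySQL 5.5.3 or later; the function name connect_utf8 is hypothetical. Note that MySQL's charset named "utf8" stores at most three bytes per character, so characters needing four bytes (emoji, for example) require "utf8mb4", which is what the sketch uses.

```php
<?php
/**
 * Hypothetical helper: open a mysqli connection and switch it to UTF-8 (utf8mb4).
 * Throws RuntimeException if either step fails.
 */
function connect_utf8(string $host, string $user, string $pass, string $dbName): mysqli
{
    // On PHP < 8.1 a failed connect returns false (with a warning);
    // from PHP 8.1 mysqli throws mysqli_sql_exception by default instead.
    $db = mysqli_connect($host, $user, $pass, $dbName);
    if ($db === false) {
        throw new RuntimeException('Could not connect: ' . mysqli_connect_error());
    }

    // Same effect on the server as running "SET NAMES utf8mb4", but the client
    // library is told as well, so escaping uses the right charset.
    if (!mysqli_set_charset($db, 'utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($db));
    }

    return $db;
}
```

Usage would look something like $dbc = connect_utf8(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME); with the connection constants defined elsewhere. Prepared statements themselves don't need anything extra; they send whatever bytes you bind, over a connection whose charset is now set.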

 

>> HTML pages need the encoding specified,
>
> What do you mean?

 

For your browser to interpret the page as UTF-8 you have to tell it that the page is encoded in UTF-8 by sending the appropriate Content-Type header, before any other output:

header('Content-Type: text/html; charset=utf-8');
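A minimal sketch of the page side, assuming PHP 7+ and that the PHP file itself is saved as UTF-8. The header has to go out before anything is echoed (even a blank line before <?php counts), otherwise PHP complains that headers were already sent. A matching <meta charset> tag is a common extra, since it still applies when the page is saved or viewed without the HTTP header. The page_head() helper is hypothetical.

```php
<?php
// Send the HTTP header first; nothing may be output before this call.
header('Content-Type: text/html; charset=utf-8');

/**
 * Hypothetical helper: build the start of an HTML page declared as UTF-8.
 * The title is escaped with htmlspecialchars() using the same encoding.
 */
function page_head(string $title): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
         . "<meta charset=\"utf-8\">\n"
         . "<title>{$safeTitle}</title>\n"
         . "</head>\n";
}

echo page_head("Caf\u{00E9} & Bar");
```

Giving htmlspecialchars() the same 'UTF-8' encoding keeps output escaping consistent with what the page declares.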

 

 

 

> Since any data in my database is plain English, how would converting to UTF-8 impact the data in my database?

 

If your data is just plain English then there is no need to do any kind of conversion on the actual data, as plain ASCII text is stored as exactly the same bytes in latin1 and utf8.  You do need to alter your table though, so that MySQL knows you're storing utf8 data in the column and not latin1.
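For the table change, one common approach is ALTER TABLE ... CONVERT TO CHARACTER SET, which changes the table's default charset and re-encodes every existing text column (changing only the DEFAULT CHARACTER SET affects columns added later, not the ones you already have). For ASCII-only data the stored bytes come out identical. A minimal sketch, assuming PHP 7+ and MySQL 5.5.3+ for utf8mb4; the table name members and both helper names are hypothetical. Back up first, and be aware that on older MySQL versions an indexed VARCHAR(255) column can exceed the index length limit once each character may take four bytes.

```php
<?php
/**
 * Hypothetical helper: build the statement that converts one table
 * (its default charset and all existing text columns) to utf8mb4.
 */
function utf8_convert_sql(string $table): string
{
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

/**
 * Hypothetical helper: run the conversion on an open mysqli connection.
 * Returns true on success; on failure it returns false, or throws
 * mysqli_sql_exception under PHP 8.1's default error reporting.
 */
function convert_table_to_utf8(mysqli $db, string $table): bool
{
    return mysqli_query($db, utf8_convert_sql($table)) === true;
}

echo utf8_convert_sql('members'), "\n";
```

Building the SQL in its own function makes it easy to print and eyeball the statement before running it against the real database.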

 


>> What happens to them?
>
> Do you remember your previous post about bits, bytes, characters, and how we said a character could consist of multiple bytes?  That is where the problem comes in.  Functions like substr(), strlen(), strpos() and several others assume that a character is just a single byte.  Since UTF-8 is a multi-byte encoding, they will give wrong lengths and offsets (and can cut a character in half) on non-ASCII text.  What you need to do is use a multi-byte aware alternative.  PHP has an extension, mbstring, full of multi-byte aware functions (such as mb_strlen(), mb_substr() and mb_strpos()) that you can use.

 

How many Bytes for each Character in UTF-8?

 

Is that constant, or can it change?

 

 

Ummm, this looks like a pain-in-the-ass...

http://us3.php.net/manual/en/mbstring.installation.php

 

 

 

Debbie

 


> How many Bytes for each Character in UTF-8?

One to four. Standard ASCII stuff (like you'd find on an en-us keyboard) is one byte and just about every other "common" character is two or three.
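It is not constant: the byte count depends on the character's code point. UTF-8 uses 1 byte for U+0000 to U+007F (plain ASCII), 2 bytes up to U+07FF, 3 bytes up to U+FFFF and 4 bytes above that. A small sketch that reports the width of each character, assuming PHP 7+ for the \u{...} escapes; the helper name is made up for illustration.

```php
<?php
/**
 * Split a UTF-8 string into characters and return how many bytes each one uses.
 * preg_split() with the /u modifier splits on character boundaries and does not need mbstring.
 */
function utf8_byte_widths(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() fails with the /u modifier when $s is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return array_map('strlen', $chars);
}

// A (ASCII), é (U+00E9), € (U+20AC), 😀 (U+1F600, outside the Basic Multilingual Plane)
$sample = "A\u{00E9}\u{20AC}\u{1F600}";
echo implode(', ', utf8_byte_widths($sample)), "\n"; // 1, 2, 3, 4
```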

 

> Ummm, this looks like a pain-in-the-ass...
>
> http://us3.php.net/manual/en/mbstring.installation.php

That's for if you were compiling PHP yourself. Somehow I doubt you are. Just enable the extension if it isn't already.


>> Ummm, this looks like a pain-in-the-ass...
>>
>> http://us3.php.net/manual/en/mbstring.installation.php
>
> That's for if you were compiling PHP yourself. Somehow I doubt you are. Just enable the extension if it isn't already.

 

Sorry for being a weeny, but I keep biting off more and more and I'm overwhelmed on my website project?!  :o

 

So do you think I'll have to pay GoDaddy money to rebuild PHP for me, or can I just write a line of code to use the multi-byte thingy you're talking about?

 

From what I read, it sounded like I had to "install" that feature...

 

I'd be willing to switch if I thought it wasn't a lot of work.  (I'm all for making my site better, but you guys keep telling me to do more and my simple site is now becoming Amazon.com?!)

 

Thanks,

 

 

Debbie

 


> So do you think I'll have to pay GoDaddy money to rebuild PHP for me, or can I just write a line of code to use the multi-byte thingy you're talking about?

 

Most hosting providers enable a range of extensions, especially popular ones like this.  I happen to have a site on GoDaddy so I can tell you they do have it enabled.  You can always check on your own though by simply looking at the output of phpinfo or using function_exists on one of the functions provided by the extension in question.
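A short sketch of the function_exists() style of check mentioned above; the helper name is hypothetical, and looking for an "mbstring" section in the phpinfo() output tells you the same thing. Once it is confirmed, setting the internal encoding means mb_* calls default to UTF-8 even when you forget the encoding argument.

```php
<?php
/**
 * Hypothetical helper: report whether the mbstring extension is available.
 * extension_loaded() checks the extension itself; function_exists() checks
 * for one specific function it provides (mb_strlen here).
 */
function mbstring_available(): bool
{
    return extension_loaded('mbstring') && function_exists('mb_strlen');
}

if (mbstring_available()) {
    // Make mb_* calls default to UTF-8 when no encoding argument is given.
    mb_internal_encoding('UTF-8');
    echo "mbstring is enabled\n";
} else {
    echo "mbstring is missing: ask the host to enable it\n";
}
```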

 

 

> From what I read, it sounded like I had to "install" that feature...

It is a feature that has to be "installed", but since you're not the one doing the PHP install (GoDaddy is), it's not something you really need to worry about.  All you have to do is check if it is available.  If by chance it were not available, then you'd have to contact GoDaddy about getting it set up (or find a different host that does have it).

 

 


>> Any other PHP functions that would freak out (e.g. Prepared Statements)?
>
> For MySQL, you need to set up the connection between your server and your script to use UTF-8.  You do this by issuing the query SET NAMES utf8 after you connect.

 

Here is my MySQL connection script...

<?php

// This file contains the database access information.
// It also establishes a connection to MySQL and selects the database.

// Make the connection.
$dbc = @mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
			OR die('Could not connect to database.  Contact System Administrator.');

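// Set the character set used between PHP and MySQL for this connection.
// (This affects the connection only; each table keeps its own charset.)
mysqli_set_charset($dbc, 'utf8')
			OR die('Could not set character set.  Contact System Administrator.');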

?>

 

Will that last line of code make it so my database is now UTF-8 ready?

 

 

Debbie

 


> Most hosting providers enable a range of extensions, especially popular ones like this.  I happen to have a site on GoDaddy so I can tell you they do have it enabled.  You can always check on your own though by simply looking at the output of phpinfo or using function_exists on one of the functions provided by the extension in question.

 

Here is what I see on my VPS...

mbstring
Multibyte Support                      enabled
Multibyte string engine                libmbfl
Multibyte (japanese) regex support     enabled
Multibyte regex (oniguruma) version    3.7.1

mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.

Directive                        Local Value    Master Value
mbstring.detect_order            no value       no value
mbstring.encoding_translation    Off            Off
mbstring.func_overload           0              0
mbstring.http_input              pass           pass
mbstring.http_output             pass           pass
mbstring.internal_encoding       no value       no value
mbstring.language                neutral        neutral
mbstring.strict_detection        Off            Off
mbstring.substitute_character    no value       no value

 

 

I'm not sure if that means the Multi-Byte Functions will work or not?!

 

It is "enabled" but some of the things above are turned off?!  :shrug:

 

 

Debbie

 

 
