Why not reference the MySQL manual?  Each of those settings configures the MySQL server so that it uses the utf8 character set, the matching client connection parameters, and a collation (i.e. the comparison and sort rules) that works well with utf8.

BTW, about the only area where you really need to weigh options in this case is the collation.  This thread on the MySQL forums does a fantastic job of explaining how the utf8_general_ci collation differs from an alternative Unicode collation.

 

http://forums.mysql.com/read.php?103,187048,188748#msg-188748

The manual isn't the easiest thing to read, and if someone could simply explain what each of the settings I listed does, I'll be set.

 

I agree that in many cases the MySQL manual isn't easy to read, but you can also google each of the terms.  In broad strokes I already explained what they relate to, but the topic could fill, and does fill, a chapter in any decent MySQL book.

 

Do you understand:

 

-A character set?

-A collation?

-The mysql client protocol?

-what utf8 is?

 

The "defaults" simply apply those settings as defaults, so if you omit them, for example when creating a table, they are applied to your new table automatically.
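
To make the inheritance concrete, here is a rough sketch rather than anyone's actual setup. The names `appdb` and `notes` are placeholders I made up, and the script only prints the statements so you can review them or pipe them into the mysql client yourself.

```php
<?php
// Hypothetical names: `appdb` and `notes` are placeholders, not from this thread.
$statements = [
    // Database-level default: applies to tables created without their own charset.
    "CREATE DATABASE appdb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci",
    // No charset clause here, so the table should pick up the database default.
    "CREATE TABLE appdb.notes (id INT PRIMARY KEY, body VARCHAR(255))",
    // Lets you confirm which charset the table actually ended up with.
    "SHOW CREATE TABLE appdb.notes",
];

// Print the statements instead of executing them; no server is contacted.
foreach ($statements as $sql) {
    echo $sql, ";\n";
}
```

If the default took effect, the output of the last statement on a real server should mention utf8 for the table even though the CREATE TABLE never asked for it.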

SET NAMES sets the character set of the client connection explicitly.  The last 2 options do this by default, although PHP may override the default, so it may be necessary regardless.  You'd have to test to know for sure.  This is necessary so that MySQL knows the data it receives from the client is already utf8 and needs no transcoding.  Otherwise, if it thinks the data uses a different character set, it may try to convert it on the fly, which often ends up lossy or truncated.  In my experience it's more of an issue getting the data back out again: without it, PHP turns all the utf8 characters into '?'.

You talk to MySQL through client libraries.  That goes for a server running PHP, the mysql command line client, or any application that makes calls to the client library.  The protocol sends queries and data and gets data back, so think of it as a 2 way pipe:
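
The two failure modes can be reproduced without a database. This is only a sketch of the general effect using PHP's mbstring extension (assumed to be installed), not a trace of what MySQL does internally: it shows utf8 bytes being misread as latin1, and a character with no latin1 equivalent being replaced.

```php
<?php
// Requires the mbstring extension (an assumption about your PHP build).
mb_substitute_character(0x3F); // make the replacement character explicit: "?"

// "café" as UTF-8 bytes; the "é" is the two bytes C3 A9.
$utf8 = "caf\xC3\xA9";

// Case 1: the receiver wrongly treats the bytes as latin1 and converts them
// to UTF-8. Each of the two bytes becomes its own character (double encoding).
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // 636166c383c2a9

// Case 2: a character that does not exist in latin1 is converted to latin1.
$kanji = "\xE6\x97\xA5"; // U+65E5 encoded as UTF-8
$lossy = mb_convert_encoding($kanji, 'ISO-8859-1', 'UTF-8');
echo $lossy, "\n";
```

Case 1 is what typically shows up as "Ã©" in a page, and case 2 is the '?' described above. Neither can be reliably undone afterwards, which is why the connection character set matters.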

 

  Client <--> MySQL server

 

 

This pipe has an associated character set/collation. 

 

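utf8 is a Unicode encoding.  The idea is that every character in every language can be represented in Unicode.  UTF-8 uses a variable number of bytes to store a character (anywhere from 1 to 4), with the neat trick that its 1-byte range is identical to 7-bit ASCII, so plain English text is byte-for-byte the same as in latin1, which is in heavy use throughout the English-speaking world.  Accented latin1 characters are not byte compatible: they take 2 bytes in UTF-8.  Also note that MySQL's utf8 character set stores at most 3 bytes per character; the full 4-byte range needs utf8mb4 (MySQL 5.5.3 and later).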
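
A quick way to see the variable width for yourself is the sketch below. It assumes the mbstring extension is available; the sample characters are my own picks and are written as byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// Requires the mbstring extension for mb_strlen().
$samples = [
    "A",                // U+0041, plain ASCII
    "\xC3\xA9",         // U+00E9, e with acute accent
    "\xE2\x82\xAC",     // U+20AC, euro sign
    "\xF0\x9F\x98\x80", // U+1F600, an emoji
];

$byteLengths = [];
foreach ($samples as $char) {
    // strlen() counts bytes; mb_strlen() counts characters in the given encoding.
    $byteLengths[] = strlen($char);
    printf("%s: %d character, %d byte(s)\n", $char, mb_strlen($char, 'UTF-8'), strlen($char));
}
```

Each sample is a single character, but the byte counts come out as 1, 2, 3 and 4. The last one is the kind of character that needs 4 bytes, so it is worth testing with your own tables if you expect such input.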

 

It is generally considered these days that if your application needs to be multilingual out of the box, utf8 is a good character set to use, although, like anything, flexibility often comes with additional overhead.

 

1. Set up all databases, tables, and columns as utf8

2. Save all PHP scripts in UTF-8 without BOM.

3. Declare the encoding of all HTML pages as utf8

4. Set up the PHP to MySQL connection as utf8

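- for mysql_connect (the old mysql extension, removed in PHP 7.0)

mysql_connect('','','');
// mysql_set_charset() is preferred over mysql_query("SET NAMES 'utf8'")
// because it also tells the client library which charset is in use.
mysql_set_charset('utf8');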

- for mysqli

$db = new mysqli('','','','');
$db->set_charset('utf8');

 

Is anything on that list unneeded or missing?
