Archived

This topic is now archived and is closed to further replies.

Firestorm ZERO

Question about collation

Can anyone give a brief rundown of this? I kind of understand what it is for, but I don't know how it would affect my PHP scripts. I plan on writing my own scripts and would like them to handle multiple languages, so do I have to use utf8_general_ci? Also, are there any pros and cons to a given collation, such as security risks?

Which collation you use depends on the languages you plan to support. If you only want to support Western European languages written in the Latin alphabet (English, French, German, Spanish, etc.), latin1_general_ci should be fine. If you want to support languages like Chinese that use a completely different script, then yes, utf8 is a good choice.
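
To make the "_ci" part a bit more concrete, here is a rough PHP analogy of the difference between a case-insensitive collation and a binary one. This is only a sketch: MySQL applies its own rules internally and real collations cover much more than case. It assumes PHP 7+ with the mbstring extension enabled, and the function names are made up for illustration.

```php
<?php
// Illustrative only: a loose PHP analogy for collation behaviour.
// Assumes the mbstring extension; function names are hypothetical.

// Compare roughly the way a case-insensitive (_ci) collation would.
function equalsCaseInsensitive(string $a, string $b): bool
{
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

// Compare the way a binary (_bin) collation would: byte for byte.
function equalsBinary(string $a, string $b): bool
{
    return $a === $b;
}

// Sort ignoring case, similar in spirit to ORDER BY on a _ci column.
function sortCaseInsensitive(array $words): array
{
    usort($words, function (string $a, string $b): int {
        return strcmp(mb_strtolower($a, 'UTF-8'), mb_strtolower($b, 'UTF-8'));
    });
    return $words;
}
```

With these helpers, 'Apple' and 'apple' compare as equal under the case-insensitive version but not under the binary one, which is the kind of difference you would see in a WHERE clause or a UNIQUE index depending on the column's collation.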

The collation takes care of comparing strings (including sorting them), but you still need appropriate character set support in two other places. In PHP, set the collation so that comparisons work and use the multi-byte string functions such as mb_substr(). In the client's browser, you can usually assume it is configured to support the user's native language, but you have to tell it which encoding you're sending.
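
A small sketch of why the multi-byte functions matter, assuming PHP 7+ with mbstring enabled and UTF-8 strings. The variable names and the sendUtf8Header() helper are made up for illustration.

```php
<?php
// Assumes PHP 7+ with the mbstring extension; all strings are UTF-8.

$word = "na\u{00EF}ve"; // "naïve": 5 characters, but 6 bytes in UTF-8

// Byte-oriented functions count and cut bytes.
$byteLength = strlen($word);
$firstThreeBytes = substr($word, 0, 3); // "na" plus half of the "ï"

// Multi-byte functions count and cut characters.
$charLength = mb_strlen($word, 'UTF-8');
$firstThreeChars = mb_substr($word, 0, 3, 'UTF-8'); // "naï"

// Check whether each cut left valid UTF-8 behind.
$bytesValid = mb_check_encoding($firstThreeBytes, 'UTF-8');
$charsValid = mb_check_encoding($firstThreeChars, 'UTF-8');

// Hypothetical helper: tell the browser which encoding the page uses.
// It has to be called before any output is sent.
function sendUtf8Header(): void
{
    header('Content-Type: text/html; charset=utf-8');
}
```

Here strlen() reports 6 while mb_strlen() reports 5, and the substr() cut ends in the middle of a character, so it is no longer valid UTF-8. The header() call is how you "tell the browser what you're sending it"; a matching meta charset tag in the HTML does the same job.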

There are several more considerations, but I'm not too familiar with them in practice; I've never done this myself.

I noticed that the WordPress and phpBB databases are set to latin1. I can type in multibyte characters, and they are saved to the database and displayed properly when viewed, because the charset of the HTML page is set to UTF-8. But in the database itself, they show up as scrambled letters.
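
The scrambling can be reproduced in a few lines, which may help show what is going on: the UTF-8 bytes are stored untouched, but anything that reads them as latin1 sees two or more characters per multibyte character. This is a sketch assuming PHP 7+ with mbstring; ISO-8859-1 is used as a stand-in for MySQL's latin1 (which is closer to Windows-1252), and the function names are made up.

```php
<?php
// Assumes the mbstring extension. ISO-8859-1 approximates MySQL's latin1 here.

// Show what UTF-8 text looks like when its bytes are read as latin1.
function viewAsLatin1(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'UTF-8', 'ISO-8859-1');
}

// Undo the misreading and get the original UTF-8 bytes back.
function recoverFromLatin1View(string $scrambled): string
{
    return mb_convert_encoding($scrambled, 'ISO-8859-1', 'UTF-8');
}

$original  = "caf\u{00E9}";                      // "café"
$scrambled = viewAsLatin1($original);            // "cafÃ©"
$recovered = recoverFromLatin1View($scrambled);  // "café" again
```

So the data is not lost as long as every read and write goes through the same misreading, but sorting, string lengths, and comparisons inside the database operate on the scrambled form.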

I'm wondering about this because I am planning to program my own CMS (to learn more about PHP and MySQL). I just don't want to have to go back and mess with it or start from scratch again.

I also did some thinking, and there could be a problem if I use UTF-8 in the database. For usernames, for example, there is the ASCII letter 'a', but there are also other Unicode letters that look just like 'a', which would be a problem.
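
To illustrate the look-alike problem, here is a minimal sketch assuming PHP 7+. The Cyrillic letter U+0430 is one example of a character that renders like a Latin 'a'. The isAsciiUsername() helper and its 3 to 20 character limit are made up for illustration; an ASCII-only allowlist is just one possible policy, not the only one.

```php
<?php
// Two usernames that look the same on screen but are different strings.
$latin     = "admin";          // all ASCII
$lookalike = "\u{0430}dmin";   // first letter is Cyrillic U+0430, not Latin 'a'

$sameString = ($latin === $lookalike);

// Hypothetical policy: accept only ASCII letters, digits and underscore.
function isAsciiUsername(string $name): bool
{
    return preg_match('/^[A-Za-z0-9_]{3,20}$/', $name) === 1;
}
```

An allowlist like this avoids the impersonation problem but also rules out usernames in other scripts, so it is a trade-off. Post content can still be full UTF-8 either way.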
