
collations: utf-8 or latin1?


HeaDmiLe


I don't think Latin1 (ISO8859-1) is faster than UTF-8.

 

ISO8859-1 always uses exactly 1 byte per character. That can save space in the database, and strlen() will return the real character count. It covers most of the languages of the Americas, Western Europe and parts of Africa.
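To illustrate the one-byte-per-character point, here is a minimal sketch in Python rather than PHP. The sample string is my own, not from the thread. Taking `len()` of the encoded bytes plays roughly the role strlen() plays in PHP, which counts bytes.

```python
# Sketch: in ISO-8859-1 every character is stored as exactly one byte,
# so a byte-counting length function gives the character count.
text = "déjà vu à l'été"  # hypothetical sample text

latin1_bytes = text.encode("iso-8859-1")

char_count = len(text)          # number of characters
byte_count = len(latin1_bytes)  # number of bytes once stored as latin1

print(char_count, byte_count)   # the two numbers are equal
```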

You can see the character list here:

http://en.wikipedia.org/wiki/ISO_8859-1

 

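UTF-8 is variable length: it uses 1 byte for ASCII characters and 2 to 4 bytes for everything else. Accented Latin characters take 2 bytes (twice what latin1 needs), and Japanese/Chinese characters, which latin1 cannot store at all, usually take 3 bytes each, so a database full of them gets noticeably bigger. In exchange, you can store almost any language in the same database. strlen() counts bytes, so it returns the wrong length as soon as the string contains a non-ASCII character, unless you utf8_decode() it first or use mb_strlen().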
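The variable-length behaviour is easy to check. The sketch below is in Python and uses a hypothetical helper, `utf8_width`, plus sample characters I picked myself. The byte count corresponds to what strlen() would report in PHP, and the character count to what mb_strlen() would report with the encoding set to UTF-8.

```python
def utf8_width(ch: str) -> int:
    """Return how many bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))

# ASCII letter, accented Latin letter, euro sign, CJK character, emoji
for ch in ["a", "é", "€", "日", "😀"]:
    print(repr(ch), utf8_width(ch))

word = "日本語"
char_count = len(word)                  # characters
byte_count = len(word.encode("utf-8"))  # bytes actually stored
print(char_count, byte_count)
```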

 

The first 128 characters (codes 0 to 127) of both charsets are plain ASCII and are encoded identically, so if you escape everything else with htmlentities() or only use those 128 characters, you don't need any special function to convert from one charset to the other. If you don't use any special characters, the two encodings produce the same bytes (and you can't tell the difference between them).
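Here is a small Python sketch of that overlap. Both helper names are hypothetical. The second helper only approximates the htmlentities() idea: it emits numeric character references, whereas PHP's function emits named entities where they exist.

```python
def encodes_identically(text: str) -> bool:
    """True when the text has the same bytes in ISO-8859-1 and UTF-8."""
    # errors="replace" turns characters latin1 cannot hold into "?",
    # which can never match the multi-byte UTF-8 form.
    latin1 = text.encode("iso-8859-1", errors="replace")
    return latin1 == text.encode("utf-8")

def to_ascii_entities(text: str) -> bytes:
    """Escape every non-ASCII character as a numeric HTML reference."""
    return text.encode("ascii", errors="xmlcharrefreplace")

print(encodes_identically("plain ASCII text"))  # pure ASCII input
print(encodes_identically("café"))              # contains a non-ASCII character
print(to_ascii_entities("café"))                # result is pure ASCII bytes
```

Once the text is reduced to ASCII this way, the stored bytes are valid in either charset.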

 

If you are sure you will never need more than the languages of the Americas, Western Europe and Africa (English, French, Spanish, ...), go for ISO8859-1.

For anything else, use UTF-8.

 

I don't know Croatian, but from what I have read (Wikipedia) it doesn't seem to fit in ISO8859-1.
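One way to check this kind of thing yourself is to try encoding a sample of the text. This is a minimal Python sketch. The helper names and sample words are my own assumptions, and it relies on Python's iso-8859-1 codec, which covers code points U+0000 to U+00FF.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of the text can be stored in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

def missing_from_latin1(text: str) -> list:
    """List the distinct characters that ISO-8859-1 cannot represent."""
    return sorted({c for c in text if ord(c) > 0xFF})

print(fits_latin1("français"))      # French sample
print(fits_latin1("čaša"))          # Croatian sample
print(missing_from_latin1("čaša"))  # the letters that do not fit
```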

 

latin1 handles the standard ASCII character set... nothing more.

I think you are wrong here: it is a single-byte charset with 256 possible values, and the upper half includes most of the special characters used in Latin-script languages like French.
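As a quick check of the 256-value claim, here is a Python sketch with sample French letters I chose myself. Note that Python's iso-8859-1 codec maps bytes 0x80 to 0x9F to control characters, so not all 256 values are printable.

```python
# Every one of the 256 byte values decodes to a distinct character.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
distinct_count = len(set(decoded))

# Common French accented letters: each is one byte above the ASCII range.
french = "àâçéèêëîïôùûü"
french_bytes = french.encode("iso-8859-1")
one_byte_each = len(french_bytes) == len(french)
all_above_ascii = all(b > 127 for b in french_bytes)

print(distinct_count, one_byte_each, all_above_ascii)
```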
