
Collations, Charsets, Encoding and everything in between



Hello fellow PHP programmers,

I have a problem with MySQL collations and encoding involving UTF-8 and Hebrew text.

I'm working on a live site that has a table containing a CHAR(60) field.

The fields and the table use the collation latin1_swedish_ci for some reason. I say "for some reason" because the site runs on UTF-8 and the text values are Hebrew.

Because of that, when someone enters text that comes close to 60 characters, the UTF-8 encoding breaks and I get the weird ? symbol (a question mark in a black diamond).
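Why the symbol appears can be shown in a few lines of Python (the thread is about PHP, but the bytes behave the same way). This is a sketch under two assumptions: a latin1 CHAR(60) column holds at most 60 bytes, because latin1 uses one byte per character, and the server truncates longer values instead of rejecting them, which is what the symptom suggests. The sample title is made up.

```python
# Hypothetical topic title: a 6-character prefix plus 15 repeated words.
title = "נושא: " + "שלום " * 15

raw = title.encode("utf-8")
# Each Hebrew letter takes 2 bytes in UTF-8, so bytes > characters.
print(len(title), "characters,", len(raw), "bytes")  # 81 characters, 145 bytes

# A latin1 CHAR(60) column counts one byte as one character, so it keeps
# only the first 60 bytes. Here the cut falls inside the 2-byte letter ו.
stored = raw[:60]

# Decoding as UTF-8 (as a UTF-8 page does) turns the orphaned lead byte
# into U+FFFD, the "?" in a black diamond.
shown = stored.decode("utf-8", errors="replace")
print(shown.endswith("\ufffd"))  # True
```

In other words, CHAR(60) in a latin1 column means roughly 30 Hebrew letters, and any cut that lands between the two bytes of a letter leaves half a character behind.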

 

I need to change the collations to UTF-8 Unicode and somehow recode the values. I thought of writing a PHP script that:

1. runs over all the tables

2. changes the collations

3. selects the field

4. converts the encoding and updates the row.

Is this the correct way of solving this problem?
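One alternative often suggested for this situation (assuming the columns really contain UTF-8 bytes under a latin1 label) is to skip row-by-row conversion and change the column definitions in two steps: first to a binary type, so MySQL keeps the bytes untouched, then back to the original type declared as UTF-8. Changing a latin1 column straight to UTF-8 would make MySQL convert every byte as if it were a latin1 character, double-encoding the Hebrew. The Python sketch below only builds the SQL strings; the tbl_posts column names are hypothetical, `utf8mb4` assumes MySQL 5.5.3 or later (older servers only have `utf8`), and since MODIFY redefines the whole column, any NOT NULL or DEFAULT attributes would have to be added by hand. Back up the tables before running anything like this, and note that rows already cut mid-character may need cleaning first.

```python
# Intermediate binary type for each original type. VARBINARY is used for CHAR
# too, because BINARY(n) right-pads stored values with 0x00 bytes.
BINARY_FOR = {"CHAR": "VARBINARY", "VARCHAR": "VARBINARY", "TEXT": "BLOB"}


def conversion_statements(columns, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build ALTER TABLE statements that relabel latin1 columns holding UTF-8 bytes.

    Step 1 turns each column into a binary type, so MySQL keeps the bytes as-is.
    Step 2 turns it back into the original type, now declared as UTF-8.
    """
    statements = []
    for table, cols in columns.items():
        for name, col_type in cols:
            # "CHAR(60)" -> ("CHAR", "(", "60)"); "TEXT" -> ("TEXT", "", "")
            base, paren, rest = col_type.partition("(")
            binary_type = BINARY_FOR[base.upper()] + paren + rest
            statements.append(f"ALTER TABLE `{table}` MODIFY `{name}` {binary_type};")
            statements.append(
                f"ALTER TABLE `{table}` MODIFY `{name}` {col_type} "
                f"CHARACTER SET {charset} COLLATE {collation};"
            )
    return statements


# Hypothetical column list: topic_title comes from the thread, the two
# tbl_posts column names are made up for illustration.
COLUMNS = {
    "tbl_topics": [("topic_title", "CHAR(60)")],
    "tbl_posts": [("post_title", "VARCHAR(120)"), ("post_text", "TEXT")],
}

for statement in conversion_statements(COLUMNS):
    print(statement)
```

After the change, CHAR(60) counts characters rather than bytes, so 60 Hebrew letters fit.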

 

All my attempts have failed :-( (I tried both iconv and mb_convert_encoding). PHP already recognizes the text I select from the table as UTF-8...
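A likely reason iconv and mb_convert_encoding did not help: if the column already holds valid UTF-8 bytes, converting them "from latin1 to UTF-8" treats each byte as a separate ISO-8859-1 character and encodes it again, so the text gets longer and turns into gibberish. A minimal Python illustration of that double encoding, assuming the stored bytes are plain UTF-8:

```python
hebrew = "שלום"
stored = hebrew.encode("utf-8")  # the bytes as they sit in the latin1 column

# A latin1 -> UTF-8 conversion reads every byte as its own Latin-1
# character and re-encodes it, which doubles the Hebrew bytes.
double = stored.decode("latin-1").encode("utf-8")

print(len(stored), len(double))  # 8 16
print(stored.decode("utf-8"))    # שלום (the original bytes were already fine)
print(double == stored)          # False
```

So the values themselves probably need no conversion; only the column's declared character set is wrong.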

 

What can/should I do to fix this? Any ideas?

Thanks,

Ken.

I know which fields I want to change (or rather, which fields have text in them; the rest are numbers).

I have two tables I need to deal with: tbl_topics and tbl_posts. In the _topics table only the topic_title field is problematic, with a definition of CHAR(60), which, like you said, isn't really 60 characters...

And _posts is a bit more problematic because it contains one VARCHAR(120) field and one TEXT field.

I don't really understand all this charset and encoding business, and I wish I did...

But why is the text breaking up?

I guess my question is: how can I fix the current rows, and how can I change the table to avoid this in future rows?

Thanks for your patience,

Ken.
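For rows that were already cut off, one option is to drop the half character left at the end, so only the last letter is lost. The helper below is a hypothetical sketch: it assumes you can read each value as raw bytes (for example after the column has been switched to a binary type) and write the trimmed bytes back. Because a UTF-8 character is at most 4 bytes long, at most 3 trailing bytes can belong to an unfinished character.

```python
def trim_partial_utf8(raw: bytes) -> bytes:
    """Remove an incomplete UTF-8 sequence left at the end by truncation.

    Tries dropping 0, 1, 2 or 3 trailing bytes and returns the first prefix
    that is valid UTF-8. Raises ValueError if the damage is not at the end.
    """
    for cut in range(4):
        candidate = raw[: max(0, len(raw) - cut)]
        try:
            candidate.decode("utf-8")
            return candidate
        except UnicodeDecodeError:
            continue
    raise ValueError("value is not valid UTF-8 apart from its last bytes")


# "שלום" is 8 bytes; keeping 7 leaves half of the final letter ם.
broken = "שלום".encode("utf-8")[:7]
print(trim_partial_utf8(broken).decode("utf-8"))  # שלו
```

Values that fail even after trimming are damaged somewhere other than the end and would have to be checked by hand.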

You already have a problem that you can't reverse. When the text was originally entered, the database kindly converted it into the default character set you were using at the time (latin1, with the collation latin1_swedish_ci), which keeps only one byte per character.

What has happened is that the conversion looked at characters and NOT at their binary representations. So it wiped out every character that UTF-8 stores in more than one byte (which includes all your Hebrew characters), and now there is no way to retrieve that information.

Or at least, that is what I think happened. I don't think you can reverse this, and as such you will have to have all the information re-entered.
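Whether the data is really gone depends on what the server did on insert, and the two cases look very different at the byte level. If the text was converted to latin1, every Hebrew letter became a literal "?" (byte 0x3F) and cannot be recovered. If the UTF-8 bytes were passed through unconverted, which fits "PHP already recognizes the text as UTF-8", they are still intact. A quick way to tell, assuming access to the database, is to look at the raw bytes with something like `SELECT HEX(topic_title) FROM tbl_topics`: Hebrew letters in UTF-8 show up as pairs starting with D7, while lost text shows up as runs of 3F. The Python sketch below mimics both cases.

```python
hebrew = "שלום"

# Case described above: the text is converted to latin1. Hebrew letters
# have no latin1 equivalent, so each one is replaced by "?".
converted = hebrew.encode("latin-1", errors="replace")
print(converted)  # b'????'

# Other case: the UTF-8 bytes are stored untouched, so they still decode.
passed_through = hebrew.encode("utf-8")
print(passed_through.hex())             # d7a9d79cd795d79d
print(passed_through.decode("utf-8"))   # שלום
```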
