kensaggy Posted October 31, 2007 Share Posted October 31, 2007 Hello fello php programmers, I have a problem with mysql's collations and encoding regarding utf-8 and hebrew texts. I'm now working on a running site that has a table which contain a char(60) field. the fields and the table and collation latin1_swedish_ci for some reason - and i say "for some reason" because the site is running utf-8 and the text value is hebrew. now because of that, when someone enters text that comes close to 60 chars the utf-8 encoding breaks and i get the wierd ? symbol (? in a black dimond box). i need to change the collations to utf-8 unicode and somehow recode the values... i thought of building a php scripts that: 1. runs over all the tables 2. changes the collations 3. selected the field 4. converts the encoding and updates' the row. is this the correct way of solving this problem? all my attempts were a failure :-( (tried using both iconv, and mbstring_convert_encoding). PHP already recognizes the text i select from the table as UTF-8... what can/should i do to fix this? any idea's? Thanks, Ken. Quote Link to comment Share on other sites More sharing options...
fenway Posted October 31, 2007

First, you need to use SHOW TABLE STATUS to see which tables are not utf8 (SHOW FULL COLUMNS on a table shows the collation of each field). The truncation occurs because you're dealing with multi-byte characters, so 60 isn't really 60.
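To make "60 isn't really 60" concrete, here is a minimal Python sketch. It assumes the UTF-8 bytes reach a latin1 char(60) field over a latin1 connection, so each byte is counted as one character; the sample titles and both helper functions are made up for illustration and are not taken from the site.

```python
def store_in_latin1_char(text: str, width: int = 60) -> bytes:
    """Mimic a latin1 CHAR(width) field fed raw UTF-8 bytes:
    every byte counts as one character, so only `width` bytes are kept."""
    return text.encode("utf-8")[:width]


def read_back_as_utf8(stored: bytes) -> str:
    """Decode the stored bytes the way a UTF-8 page would display them."""
    return stored.decode("utf-8", errors="replace")


# Hypothetical titles: 40 Hebrew letters, with and without a leading digit.
hebrew_only = "\u05d0" * 40
mixed = "1" + "\u05d0" * 40

# Each Hebrew letter takes two bytes in UTF-8.
print(len(hebrew_only), len(hebrew_only.encode("utf-8")))   # 40 80

# Only 60 bytes fit, which is 30 Hebrew letters.
print(len(read_back_as_utf8(store_in_latin1_char(hebrew_only))))   # 30

# With one ASCII byte in front, the cut lands in the middle of a letter
# and the tail shows up as U+FFFD, the "? in a black diamond".
print(read_back_as_utf8(store_in_latin1_char(mixed)).endswith("\ufffd"))   # True
```

Under that assumption a title made of Hebrew letters fills the field after about 30 letters, and whether the last one shows up broken depends on how many single-byte characters (spaces, digits) come before the cut.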
kensaggy Posted November 1, 2007

I know which fields I want to change (or rather, which fields have text in them; the rest are numbers). I have two tables to deal with: tbl_topics and tbl_posts. In the _topics table only the topic_title field is problematic, with a definition of char(60), which, like you said, isn't really 60. The _posts table is a bit more problematic because it contains one field which is varchar(120) and one text field.

I don't really understand all this charset and encoding business, and I wish I did... but why is the text breaking up? I guess my question is: how can I fix the current rows, and how can I change the table to avoid this in future rows?

Thanks for your patience,
Ken.
aschk Posted November 1, 2007

You already have a problem that you can't reverse. When the text was originally entered, the database converted it into the default character set and collation you were using at the time (latin1, latin1_swedish_ci), wiping out everything that does not fit into latin1's single byte per character. The conversion works on characters and NOT on their binary representations, so anything outside the latin1 range (which includes all your Hebrew characters) was dropped, and now there is no way to retrieve that information.

Or at least that is what I perceive to have happened. I don't think you can reverse this, so you will have to have all the information re-entered.
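Whether the text is really gone can be checked on a few rows before anyone starts re-entering data. The Python sketch below is only an illustration under one assumption: that the UTF-8 bytes were stored untouched and merely labelled latin1, which would fit the earlier remark that PHP already recognizes the selected text as UTF-8. All function names are invented for the example, and Python's latin-1 codec is close to, but not identical with, MySQL's latin1.

```python
def as_seen_through_latin1(utf8_text: str) -> str:
    """What UTF-8 text looks like when every byte is read as a latin1 character."""
    return utf8_text.encode("utf-8").decode("latin-1")


def recover(garbled: str) -> str:
    """Undo the mislabelling: back to the raw bytes, then decode them as UTF-8.
    Raises UnicodeDecodeError if a value was cut in the middle of a character."""
    return garbled.encode("latin-1").decode("utf-8")


def contains_hebrew(text: str) -> bool:
    """True if any character lies in the Unicode Hebrew block (U+0590 to U+05FF)."""
    return any("\u0590" <= ch <= "\u05ff" for ch in text)


original = "\u05e9\u05dc\u05d5\u05dd"   # the word "shalom"
garbled = as_seen_through_latin1(original)

# Bytes stored as-is survive the round trip.
print(recover(garbled) == original)

# A value that really went through a lossy conversion holds only "?" marks,
# and nothing can bring the letters back.
print(contains_hebrew(recover("????")))
```

If a sample row behaves like `garbled`, the bytes are intact and only the column label is wrong; if it behaves like the `"????"` case, the pessimistic reading above applies to that row.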
fenway Posted November 1, 2007

You should convert everything to UTF8, not just some of the fields. MySQL lets you mix character sets per field for good reason, but in the general case it's not what you want.
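One possible way to convert the text fields is to pass each one through a binary type, so the stored bytes are relabelled rather than transcoded. This is a sketch under assumptions, not the method anyone in this thread spelled out: it only makes sense if the stored bytes already are UTF-8, and the two tbl_posts column names are hypothetical because the thread never names them. The Python helper below only prints the statements; it does not touch a database.

```python
from typing import List, NamedTuple


class TextColumn(NamedTuple):
    table: str
    name: str
    text_type: str     # final type, e.g. "CHAR(60)"
    binary_type: str   # byte-for-byte intermediate type, e.g. "VARBINARY(60)"


def conversion_statements(col: TextColumn,
                          charset: str = "utf8",
                          collation: str = "utf8_unicode_ci") -> List[str]:
    """Build the two ALTER TABLE statements for one column:
    first to a binary type (bytes untouched), then to the target charset."""
    return [
        f"ALTER TABLE `{col.table}` MODIFY `{col.name}` {col.binary_type};",
        f"ALTER TABLE `{col.table}` MODIFY `{col.name}` {col.text_type} "
        f"CHARACTER SET {charset} COLLATE {collation};",
    ]


columns = [
    TextColumn("tbl_topics", "topic_title", "CHAR(60)", "VARBINARY(60)"),
    # Hypothetical column names: the thread only gives the types.
    TextColumn("tbl_posts", "post_subject", "VARCHAR(120)", "VARBINARY(120)"),
    TextColumn("tbl_posts", "post_body", "TEXT", "BLOB"),
]

for col in columns:
    for statement in conversion_statements(col):
        print(statement)
```

MODIFY redefines the whole column, so any NOT NULL or DEFAULT attributes would have to be restated in both statements. Rows that were already cut off mid-character may be rejected or truncated again in the second step, so trying this on a copy of the tables first seems prudent. New rows only stay intact if the PHP connection also talks utf8 to the server, for example via SET NAMES utf8.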