jamina1 Posted December 9, 2008

Hi guys, I have a problem with our website and character encodings. We have a Chinese and an English website. We use a program called Zen-Cart for the English site, and instead of duplicating our entire product database we just copied over the pertinent "info" pages and translated them into Chinese. The Chinese pages are in UTF-8. The English pages are in ISO-8859-1.

The problem is that whenever someone enters Chinese into a form, it is processed by the Zen-Cart scripts, so the encoding is swapped and the characters get mangled. The easy solution is to convert the English pages to UTF-8 (which I want to do!). The trouble is that &nbsp; and other characters like the R, C and TM symbols then start showing up wrong on the now-UTF-8 English pages.

Is there any way to do a SELECT to find these weird entries and fix them BEFORE we change our pages to Unicode, so that my boss doesn't freak out that half our pages are messed up? I know Unicode will fix all our problems; I just need to figure out how to ensure our database is COMPLETELY Unicode compliant (the data was entered in English/Latin/ISO-whatever encoding).

So after all this rambling, I need to know:

1) Is there a way to find the entries in the database that are not UTF-8 compliant so we can fix them?
2) Is there a way to convert the database and its contents (not just the collation and charset) to UTF-8?

I've tried

UPDATE $table SET $column=CONVERT(CONVERT(CONVERT($column USING latin1) USING binary) USING utf8)

and

ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci

to no effect.

Link to comment: https://forums.phpfreaks.com/topic/136253-mysql-unicode-problems/
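For question 1, one way to read "not UTF-8 compliant" is at the byte level: a value is suspect if its raw bytes fail a strict UTF-8 decode. Below is a minimal Python sketch of that check, not Zen-Cart or MySQL code. The function name `first_invalid_utf8` and the sample values are made up for illustration, and the sketch assumes the column's raw bytes can be fetched without any charset conversion on the way out.

```python
from typing import Optional, Tuple


def first_invalid_utf8(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the whole value is valid UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


# Hypothetical sample values, not taken from the real database.
samples = {
    "ascii": b"Digital multimeter",
    "utf8": "\u4e2d\u6587".encode("utf-8"),  # Chinese text stored as UTF-8
    "legacy": b"Fluke\xa0 87",               # 0xA0 is a Latin-1 non-breaking space
}

for name, raw in samples.items():
    print(name, first_invalid_utf8(raw))
```

Pure ASCII passes this check too, which is expected: ASCII bytes mean the same thing in Latin-1 and UTF-8, so only rows containing bytes above 0x7F need attention.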
fenway Posted December 9, 2008

I'm not sure I understand... you have to specify the encoding of the connection, too.
jamina1 (Author) Posted December 9, 2008

fenway wrote: I'm not sure I understand... you have to specify the encoding of the connection, too.

Here's what we have. The data in our database was entered in Latin encoding. The database now has a UTF-8 charset and a UTF-8 collation, but the data in the fields is the same (when we changed the charset and collation, the data was not converted). The database connection is in UTF-8, but some special characters, like non-breaking spaces and other weird HTML entities, show up as garbled text, diamond ?'s or plain ?'s once we switch the pages they appear on from ISO-8859-1 to UTF-8.

I need to know: is there a way to find these entries so they can be fixed, or is there a way to convert the data entirely? It isn't a problem if I *leave* the English pages ISO-8859-1 encoded, but that leaves my other, more pertinent email problem outstanding, and the only solution I've found for that is to convert the English pages to UTF-8.
fenway Posted December 9, 2008

Is the database correct and the output wrong? Or are both wrong?
jamina1 (Author) Posted December 9, 2008

fenway wrote: Is the database correct and the output wrong? Or are both wrong?

It's in plain HTML in the database. It's when it's read out that it becomes wrong. Say you have an &nbsp; in there. In the database it says &nbsp; or © or ® or whatever. When it reads out in ISO-8859-1 encoding on an HTML page, it's a big black square with a ? in it. If I change the page to UTF-8, it's just a ?.

We need to rectify the problems with the data in the database (&nbsp;, registered symbols, TM symbols and R symbols weren't encoded properly when they were inserted... or something) so that I can make our pages uniformly UTF-8 without the gibberish showing up and my boss freaking out.

If I go in and edit an entry using UTF-8, it sorts itself out. I just need to know if there's a way to single these entries out, or to convert all the rows wholesale (in a database of about 30k rows), without just stumbling across them as I browse our products.
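The symptoms described above resemble two common failure modes, though the thread does not confirm which one applies here. A small Python sketch, assuming the old rows hold single-byte Latin-1 data: a Latin-1 byte read as UTF-8 turns into the replacement character (usually drawn as a diamond with a question mark), while UTF-8 bytes read as Latin-1 turn into two-character gibberish. The variable names are illustrative only.

```python
COPYRIGHT = "\u00a9"  # the (C) symbol

# Case 1: stored as one Latin-1 byte, then interpreted as UTF-8.
legacy_bytes = COPYRIGHT.encode("latin-1")
shown_as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# Case 2: stored as two UTF-8 bytes, then interpreted as Latin-1.
utf8_bytes = COPYRIGHT.encode("utf-8")
shown_as_latin1 = utf8_bytes.decode("latin-1")

# ascii() keeps the output readable on any console encoding.
print(legacy_bytes, ascii(shown_as_utf8))
print(utf8_bytes, ascii(shown_as_latin1))
```

Comparing the bytes actually stored for one broken row against these two patterns would help answer whether the database or only the output is wrong.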
fenway Posted December 9, 2008

And you've used "SET NAMES" in your MySQL connection from PHP, and defined everything else correctly w.r.t. encoding?
jamina1 (Author) Posted December 9, 2008

fenway wrote: And you've used "SET NAMES" in your MySQL connection from PHP, and defined everything else correctly w.r.t. encoding?

Yes, it just sort of changes what goes on with the improperly entered data. See the 2nd bullet point on this page: http://www.testequipmentconnection.com/products/36938 And compare this page, which is UTF-8 encoded: http://70.86.88.202/~tec/products/36938

This is what has happened to the database. It was created and filled with data entirely in English, much of it copied and pasted from other sources. Quite a bit of it contains HTML code. About 6 months ago, they decided it needed to support Chinese characters, which it couldn't do while it was in Latin encoding. We changed the charset and collation on the database, but I don't think that converted the data. New data goes in as UTF-8, so it looks right. Old data is a toss-up when it contains special characters.
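With old Latin-encoded rows and new UTF-8 rows mixed in the same table, a blanket conversion risks double-encoding the rows that are already correct. Here is a hedged Python sketch of a per-value repair: keep bytes that already decode as UTF-8, and re-encode the rest from cp1252 (MySQL's latin1 is cp1252, which, unlike ISO-8859-1, includes the TM sign). `repair` is a hypothetical helper and the sample bytes are invented. A short Latin-1 string can occasionally also be valid UTF-8, so the results would still need spot-checking.

```python
def repair(raw: bytes) -> bytes:
    """Return the value as UTF-8 bytes, converting only legacy rows."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8: leave untouched
    except UnicodeDecodeError:
        pass
    try:
        text = raw.decode("cp1252")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined; Latin-1 maps every byte.
        text = raw.decode("latin-1")
    return text.encode("utf-8")


new_row = "\u4e2d\u6587".encode("utf-8")  # entered after the switch to UTF-8
old_row = b"Acme\xae 100\x99"             # hypothetical legacy bytes: (R) and TM

print(repair(new_row) == new_row)
print(ascii(repair(old_row).decode("utf-8")))
```

Because already-valid values pass through unchanged, running the repair a second time is harmless, which matters when it is unclear which rows have already been fixed by hand.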
fenway Posted December 11, 2008

Hmmm... what does the database field contain for such values with HTML entities?