Fluoresce Posted August 29, 2014

I submit text to my MySQL database using a form on my site. When I look at the text in my database, sometimes, there are strange characters. The strange characters are caused by quotation marks, em dashes, apostrophes and foreign letters of the alphabet. I think that this only happens when the source of the text is a Windows program.

I understand that this is a character encoding issue, but I don't fully understand the subject. I've spent the last few hours researching it, but it's only confused me.

My site uses UTF-8 encoding:

    <meta http-equiv="content-type" content="text/xml; charset=utf-8" />

The collation of my database is utf8_general_ci.

My form looks like this:

    <form action="" method="post"> </form>

As you can see, an accept-charset="utf-8" attribute has not been specified.

Questions

1) I am guessing that my problem is that the Windows characters are being misinterpreted by my UTF-8 setup. Is that correct?
2) If so, is there a way that I can safely convert the Windows characters to UTF-8 during the submission process?
3) Should I also specify an accept-charset="utf-8" attribute on the form?
4) When I paste the Windows text directly into my database without using the form, the characters save without turning into the strange characters. But they don't render properly on my site. Can't browsers identify Windows characters?

Link to comment https://forums.phpfreaks.com/topic/290726-character-encoding-problem-got-strange-characters-in-my-database/
Jacques1 Posted August 29, 2014

Since you didn't say anything about the character encoding of the database connection, my guess is that this is your problem.

When you send data from the browser to some database, there are at least 3 encodings involved:

1) The encoding of the user input in the HTTP request.
2) The encoding of the database connection (MySQL must know how to interpret the incoming data).
3) The encoding of the MySQL table.

You've apparently covered the first and the last stage. No, you don't need an accept-charset attribute; it's enough to declare the encoding of the entire document.

If you've forgotten the second stage, then MySQL might misunderstand the incoming UTF-8 data (the default encoding is Latin-1) and store nonsense in your table.

How to declare the encoding of the connection depends on the database interface you use. If you're still using the old mysql_* functions, then it's mysql_set_charset():

    mysql_set_charset('utf8');

Do not use a SET NAMES query. While this also changes the encoding, it doesn't update the encoding information in the MySQL API and can break important functions like mysql_real_escape_string() entirely.
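Editor's sketch: the "nonsense" described above can be reproduced without a database. The snippet below is only an illustration, not code from the thread. It assumes PHP 7 or later with the mbstring extension, and it uses Windows-1252 as a stand-in for the connection's Latin-1 default (MySQL's latin1 is based on that code page). The variable names are made up for the example.

```php
<?php
// "It's" with a curly apostrophe (U+2019). In UTF-8 the apostrophe is
// the three bytes E2 80 99.
$original = "It\u{2019}s";

// Simulate the misreading: treat every byte as one Windows-1252 character
// and store the result as UTF-8. This is roughly what happens when UTF-8
// input arrives over a connection declared as Latin-1.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// The damage is reversible in this case, because each of the three bytes
// mapped to a defined Windows-1252 character.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

// The opposite mistake: a Windows-1252 curly apostrophe is the single byte
// 0x92, which is not valid UTF-8. A page declared as UTF-8 shows such bytes
// as a replacement character (the diamond with a question mark).
$isValidUtf8 = mb_check_encoding("It\x92s", 'UTF-8');

echo bin2hex($original), "\n";      // 4974e2809973
echo $garbled, "\n";                // Itâ€™s
var_dump($restored === $original);  // bool(true)
var_dump($isValidUtf8);             // bool(false)
```

The three-character sequence in the garbled output is typical of UTF-8 text that was misread one byte at a time, which is consistent with the connection encoding being the missing piece here.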
Fluoresce Posted August 30, 2014 (Author)

Thanks, Jacques1! You were right. I added:

    mysql_set_charset("utf8", $connection);

and now the curly quotes, em dashes, etc., are submitted from my form to my database without being changed into strange characters. Curly quotes, em dashes, etc., can now also be displayed on my web pages (before, they would render as small diamonds with question marks inside).

I used:

    echo mysql_client_encoding($connection);

to check the character set of my MySQL connection. It said latin1. Now it says utf8.

I appreciate your help.
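Editor's sketch: the mysql_* functions used in this thread were removed in PHP 7.0. With the mysqli extension, the same two steps (declare the connection encoding, then check it) might look like the following. The helper name openUtf8Connection() and the credentials in the usage lines are hypothetical placeholders, not anything from the thread; the charset defaults to 'utf8' only to match the utf8_general_ci setup described above.

```php
<?php
// Hypothetical helper (not from the thread): opens a mysqli connection and
// declares its character set before any query is sent.
function openUtf8Connection(
    string $host,
    string $user,
    string $pass,
    string $name,
    string $charset = 'utf8'
): mysqli {
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $pass, $name);

    // mysqli counterpart of mysql_set_charset(). Unlike a SET NAMES query,
    // it also informs the client library, so real_escape_string() escapes
    // with the correct encoding.
    $db->set_charset($charset);

    return $db;
}

// Usage with placeholder credentials (commented out: it needs a reachable
// MySQL server):
// $db = openUtf8Connection('localhost', 'user', 'secret', 'mydb');
// echo $db->character_set_name(); // counterpart of mysql_client_encoding()
```

Setting the charset inside the helper that creates the connection keeps the second of the three encodings from being forgotten when a new script opens its own connection.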
CroNiX Posted August 30, 2014

You should convert your tables to utf8 as well.
Jacques1 Posted August 30, 2014

He did: "The collation of my database is utf8_general_ci."