Jump to content

Character encoding problem. Got strange characters in my database


Go to solution Solved by Jacques1,

Recommended Posts

I submit text to my MySQL database using a form on my site. When I look at the text in my database, sometimes, there are strange characters. The strange characters are caused by quotation marks, em dashes, apostrophes and foreign letters of the alphabet. I think that this only happens when the source of the text is a Windows program. I understand that this is a character encoding issue, but I don't fully understand the subject. I've spent the last few hours researching it, but it's only confused me.

 

My site uses UTF-8 encoding:

<meta http-equiv="content-type" content="text/xml; charset=utf-8" />

The collation of my database is utf8_general_ci.

 

My form looks like this:

<form action="" method="post">
</form>

As you can see, an accept-charset="utf-8" attribute has not been specified.

 

 

Questions

 

1) I am guessing that my problem is that the Windows characters are being misinterpreted by my UTF-8 setup. Is that correct?

 

2) If so, is there a way that I can safely convert the Windows characters to UTF-8 during the submission process?

 

3) Should I also specify an accept-charset="utf-8" attribute on the form?

 

4) When I paste the Windows text directly into my database without using the form, the characters save without turning into the strange characters. But they don't render properly on my site. Can't browsers identify Windows characters?

  • Solution

Since you didn't say anything about the character encoding of the database connection, my guess is that this is your problem.

 

When you send data from the browser to some database, there are at least 3 encodings involved:

  • The encoding of the user input in the HTTP request.
  • The encoding of the database connection (MySQL must know how to interpret the incoming data).
  • The encoding of the MySQL table.

You've appearently covered the first and the last stage. No, you don't need an accept-charset attribute; it's enough to declare the encoding of the entire document.

 

If you've forgotten the second stage, then MySQL might misunderstand the incoming UTF-8 data (the default encoding is Latin-1) and store nonsense in your table. How to declare the encoding of the connection depends on the database interface you use. If you're still using the old mysql_* functions, then it's mysql_set_charset():

mysql_set_charset('utf8');

Do not use a SET NAMES query. While this also changes the encoding, it doesn't update the encoding information in the MySQL API and can break important functions like mysql_real_escape_string() entirely.

  • Like 1

Thanks, Jacques1!

 

You were right.

 

I added:

mysql_set_charset("utf8", $connection);

and now the curly quotes, em dashes, etc., are submitted from my form to my database without being changed into strange characters.

 

Curly quotes, em dashes, etc., can now also be displayed on my web pages (before, they would render as small diamonds with question marks inside).

 

I used:

echo mysql_client_encoding($connection);

to check the character set of my MySQL connection. It said latin 1. Now it says utf 8.

 

I appreciate your help. ;D

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.