
Special language characters don't display correctly


erikla
Solved by Jacques1


I have a problem with the three special characters in the Danish language: æ, ø and å. I created a database with a simple table manually in phpMyAdmin and added content, which is text containing the letters mentioned. Everything looks correct there. But when I print the content of the database via a PHP document, these three letters are replaced with a box sign. In phpMyAdmin I chose "utf8_danish_ci" as the collation. Here is the code of the PHP document that prints the content of the database table, named "lille_tabel", with the password replaced by stars.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>

<?php
$server = "localhost";
$brugernavn = "root";
$kodeord = "************";
$db = "lille";

mysql_connect($server,$brugernavn,$kodeord) or die(mysql_error());

echo "Forbundet til MySQL server<br/>";

mysql_select_db($db) or die(mysql_error());

echo "Forbundet til Databasen<br/>";

$data = mysql_query("SELECT * FROM lilletabel") or die(mysql_error());

while ($info = mysql_fetch_array($data))
  {
echo "ID: ".$info['id']."<br/>";
echo "Navn: ".$info['navn']."<br/>";
echo "Tekst: ".$info['tekst']."<br/>"."<br/>";
  }

?>


</body>
</html>

By the way: I use Dreamweaver. I hope someone has suggestions to pinpoint the problem ...

 

Regards,

 

Erik

Link to comment
Share on other sites

1. Make sure the character set on the table is UTF-8: CREATE TABLE ... CHARACTER SET utf8

 

2. Make sure the HTML document is UTF-8 - You've got that: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

 

3. Make sure the database connection is using UTF-8: mysql_query("Set names 'utf8'");

 

NOTE: The (PHP) mysql library has been deprecated. All new code should be using the mysqli library (which sets the character set differently).

Link to comment
Share on other sites

Stop right there, Erik.

 

That code is hopelessly outdated and uses many of the anti-patterns from the early days of PHP. I don't know if Dreamweaver has shitty code templates, or if you've learned this from some very bad tutorial, but this is far from proper PHP and HTML as we use it today. It's (bad) 90s code.

 

You've declared the document as XHTML, but you're serving it as plain HTML. This is wrong and makes no sense. Why do you even want XHTML? It may have been all the rage 10 years ago, but that trend is long over. Most people never really understood how to use it, it requires a lot of discipline (no markup errors allowed), and it doesn't have all the great new elements of modern HTML. Unless you have special reasons for why you need XHTML, you shouldn't use it. Go with plain HTML. The current revision is HTML5.

 

As David already said, the mysql_* functions have been obsolete for more than 10 years and will be removed in one of the next PHP releases. Nowadays, we use PDO or MySQLi. I don't recommend MySQLi, because it can only be used for the MySQL database system and tends to be very cumbersome. PDO, on the other hand, is a universal interface for all mainstream database systems and is much more user-friendly.

 

You're using die(mysql_error()) to print all errors on the website for the whole world to see. This, again, may have been acceptable back in the 90s when security was less important and users didn't mind being greeted with strange PHP errors. Today, security is absolutely crucial, and if people see error messages on your site, they will be very irritated. In a modern application, errors need to be handled properly.
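One way to handle errors "properly" can be sketched like this: the technical details go to a log, and the visitor only sees a neutral message. The helper name publicErrorMessage() and the message text are made up for illustration, and the failed connection is simulated with a plain RuntimeException so the snippet runs without a database.

```php
<?php
declare(strict_types=1);

/**
 * Log the full error for the developer and return a neutral message
 * that is safe to show to visitors.
 * Hypothetical helper, not part of PHP or of the code in this thread.
 */
function publicErrorMessage(Throwable $error, callable $log): string
{
    // Technical details (user names, hosts, SQL state) go to the log only.
    $log(get_class($error) . ': ' . $error->getMessage());

    // Visitors get a generic text without any internals.
    return 'Sorry, something went wrong. Please try again later.';
}

// Usage: simulate a failed connection and collect the log lines in an array.
$logLines = [];
$message = publicErrorMessage(
    new RuntimeException("Access denied for user 'root'@'localhost'"),
    function (string $line) use (&$logLines): void {
        $logLines[] = $line;
    }
);

echo $message, "\n";
```

In a real page you would probably pass 'error_log' as the second argument and call the helper from a catch block around the database code.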

 

And this SELECT * stuff really must die. Always select the specific columns you need.

 

I'm not blaming you for not knowing this. But you definitely need to throw away Dreamweaver and use proper tools and proper resources. It's usually a good idea to start with plain HTML before you jump into PHP. An excellent resource is the Mozilla Developer Network. Make sure to keep away from “w3schools” and other fishy websites.

 

When you start with PHP, get used to the manual as early as possible. It contains a lot of important information. For example, it would have warned you that the mysql_* functions are obsolete.

 

 

 

3. Make sure the database connection is using UTF-8: mysql_query("Set names 'utf8'");

 

Do not use SET NAMES. This is a major security vulnerability, because it silently changes the character encoding without notifying PHP. As a result, critical functions like mysql_real_escape_string() may no longer work, leaving the application wide open to SQL injection attacks.

 

The proper way to change the connection encoding is to call mysql_set_charset().

 

Fortunately, most modern encodings are ASCII-compatible, so you usually get away with this bug. But I wouldn't try it.

Link to comment
Share on other sites

Do not use SET NAMES. This is a major security vulnerability, because it silently changes the character encoding without notifying PHP. As a result, critical functions like mysql_real_escape_string() may no longer work, leaving the application wide open to SQL injection attacks.

 

The proper way to change the connection encoding is to call mysql_set_charset().

I stand corrected. I swear I don't remember seeing that function before. For some reason, I thought the mysql library didn't have a specific charset function, but I was switching to mysqli when I first had the need for it, so I guess I missed it.

Link to comment
Share on other sites

Superb explanations from both of you (David and Jacques)! David: your line mysql_query("Set names 'utf8'"); did the trick! Now the Danish letters display correctly.

I am perfectly aware that the code is not appropriate. Actually, I got the advice to use PDO on this site about a month ago. The reason I used the above implementation is that I found it in a tutorial elsewhere. I need to take one step at a time in my learning process. First I spent quite some time finding XAMPP to get phpMyAdmin installed and making the appropriate settings. I need to make things work locally before I upload anything. Now that I know my local PHP/MySQL setup is working properly, my next step will be to change the PHP code to PDO. I have already read the article about "prepared statements".

Besides, I have just begun using Dreamweaver, so this is new too. I have a license for an Adobe bundle that includes Dreamweaver, so I had better learn to use it now. So far I have been using a not so well-known web editor named Namo. I have mostly created my web pages via the WYSIWYG part, with some knowledge of the HTML code itself, but not too much. I never considered the HTML vs. XHTML issue you write about, Jacques. I just used an almost empty page template from Dreamweaver. At least it included the meta tag with the charset utf-8 part.

Now that I know how to handle things in phpMyAdmin and Dreamweaver, I will take the next step and change to PDO. Maybe the charset=utf-8 is handled differently in this environment? Anyway, I will probably return with more questions later. Again, big thanks for the advice. I appreciate it! Great site!!

 

Erik

Edited by erikla
Link to comment
Share on other sites

Defining the character encoding with PDO is a breeze, because you can do it directly when establishing the connection:

<?php

$database = new PDO('mysql:host=YOURHOST;dbname=YOURDB;charset=utf8mb4', 'YOURUSER', 'YOURPASSWORD', array(
    // use actual prepared statements instead of client-side escaping
    PDO::ATTR_EMULATE_PREPARES => false,
    // throw an exception in case of an error
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    // fetch associative arrays by default
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
));

Note that what MySQL calls “utf8” is not UTF-8. It only covers a subset of Unicode, namely the Basic Multilingual Plane. For full Unicode, you need utf8mb4.
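To make the difference concrete, here is a minimal sketch that checks whether a string stays within the Basic Multilingual Plane, which is the part of Unicode the old three-byte “utf8” can store. The helper name fitsInThreeByteUtf8() is made up for illustration, and the input is assumed to be valid UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * True if every character of $text lies in the Basic Multilingual Plane,
 * i.e. needs at most three bytes in UTF-8.
 * Hypothetical helper for illustration; $text must be valid UTF-8.
 */
function fitsInThreeByteUtf8(string $text): bool
{
    // Code points from U+10000 upwards need four bytes in UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 0;
}

$danish = "\u{00E6}\u{00F8}\u{00E5}"; // the letters æøå
$emoji  = "\u{1F600}";                // an emoji outside the BMP

var_dump(strlen($danish));               // int(6): three two-byte characters
var_dump(strlen($emoji));                // int(4): one four-byte character
var_dump(fitsInThreeByteUtf8($danish));  // bool(true)
var_dump(fitsInThreeByteUtf8($emoji));   // bool(false)
```

So the Danish letters would fit into either character set, but anything beyond the BMP (emoji, some rarer scripts) needs utf8mb4.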

 

Also note that it's not recommended to rely solely on a meta element to declare the character encoding of an HTML document. This is actually somewhat absurd: You ask the browser to parse the document in order to find out how to parse the document. This does work, but only under certain circumstances. For example, the element must be present within the first 1024 bytes of the document.

 

A much more robust solution is to declare the encoding in the HTTP headers so that the browser knows it beforehand. You should still add a meta element in case the document gets stored offline.

 

So the proper way would look something like this:

<?php

// This should be done by the webserver, not PHP.
header('Content-Type: text/html; charset=utf-8');

?>
<!DOCTYPE html>
<html lang="da">
    <head>
        <meta charset="utf-8">
        <title>My site</title>
    </head>
    <body>
        <p>Welcome</p>
    </body>
</html>
Link to comment
Share on other sites

Thanks Jacques! I will take notice of both points: utf8mb4 and including utf-8 in the header. Regarding the first, I guess you mean that utf8mb4_danish_ci is a better choice than utf8_danish_ci as charset when defining the tables manually within phpMyAdmin, right? Also thanks for the details about the PDO code.
 
Erik

Link to comment
Share on other sites

  • Solution

Not quite, those are several different things.

 

First of all, a string column in MySQL has a character set/encoding and a collation. The character set/encoding defines which characters are available and how they're encoded (mapped to bytes). In your case, you would use utf8mb4 which is Unicode with the UTF-8 encoding. The collation is a set of rules for how two strings should be compared. If you want a case-insensitive comparison with Danish-specific rules, you would indeed use utf8mb4_danish_ci. In any case, it's important to understand the difference between collation and character set/encoding.

 

On the other hand, there's the character set/encoding of the connection. This doesn't have to match the one of the column. For example, you can access a utf8mb4 column via a latin1 connection. MySQL will automatically convert the data if possible.

 

The connection encoding is very important and a cause of many errors. When you had issues with Danish characters, I'm pretty sure you had a latin1 connection, because that's the default. The problem is that latin1 only supports 256 characters, so all Unicode characters outside of this small subset cannot be represented. And while æ, ø and å do exist in latin1, they arrive as single bytes which are not valid UTF-8, so a page declared as UTF-8 shows box signs instead.
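Here is a small sketch of what happens at byte level when latin1 data ends up on a UTF-8 page. It assumes the mbstring extension is available, and ISO-8859-1 stands in for MySQL's latin1 (the two agree for æ, ø and å).

```php
<?php
declare(strict_types=1);

$utf8 = "\u{00E6}\u{00F8}\u{00E5}"; // æøå encoded as UTF-8

// Roughly what a latin1 connection hands to PHP: one byte per character.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // c3a6c3b8c3a5
echo bin2hex($latin1), "\n"; // e6f8e5

// The latin1 bytes are not a valid UTF-8 sequence, so a browser that was
// told "charset=utf-8" cannot decode them and shows placeholder signs.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```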

 

Long story short:

  • You have to define the character set/encoding for your stored data. This is done with the CHARACTER SET keyword at database-level, at table-level or at column-level.
  • You need to specify the collation for your stored data. This is done with the COLLATE keyword.
  • You have to set the character set/encoding of your database connection. This depends on the database extension you're using: In PDO, you use the charset attribute in the constructor. In MySQLi, you call the set_charset() method. And in the old MySQL extension, you call mysql_set_charset(). Like I already said above, you must not execute a SET NAMES query. While this does change the encoding as well, it breaks the inner workings of the PHP database functions and can lead to security vulnerabilities. Never use SET NAMES except when you're sitting in front of a MySQL console and manually enter the queries.
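The three points above can be sketched in one place. This is only an illustration under assumptions: the table name demo_tabel and the helper buildDsn() are made up, and the part that needs a running MySQL server is left as comments so the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Points 1 and 2: character set and collation of the stored data.
// The table and column names are made up for this example.
$createTable = <<<'SQL'
CREATE TABLE demo_tabel (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    navn VARCHAR(100) NOT NULL
) CHARACTER SET utf8mb4 COLLATE utf8mb4_danish_ci
SQL;

// Point 3: character set of the connection, set through the PDO DSN.
function buildDsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

$dsn = buildDsn('localhost', 'lille');
echo $dsn, "\n"; // mysql:host=localhost;dbname=lille;charset=utf8mb4

// With a running MySQL server, the pieces would be used roughly like this:
// $pdo = new PDO($dsn, $user, $password, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $pdo->exec($createTable);
```

With MySQLi, the third step would be a call to $mysqli->set_charset('utf8mb4') right after connecting instead of the charset part of the DSN.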
Link to comment
Share on other sites

Thanks a lot! I see that one needs to take care of character issues on at least three levels. My initial problems occurred because I had created the database manually within phpMyAdmin. I wanted to upload a screenshot of the structure of my table in phpMyAdmin with the proper settings, but apparently users are not allowed to upload images in this forum.

 

I have now rewritten my bad code above using PDO. This page connects to the database "haka" and displays the content of the table "gaestebog". I hope my character settings are fine now, and the MySQL code too?

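<?php

// Headers must be sent before any output, so this block comes first.
header('Content-Type: text/html; charset=utf-8');

$server = 'localhost';
$brugernavn = 'root';
$kodeord = '***************';
$database = 'haka';

try {
  // utf8mb4 is MySQL's name for the connection encoding.
  $db = new PDO('mysql:host='.$server.';dbname='.$database.';charset=utf8mb4', $brugernavn, $kodeord);
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  $db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
} catch(PDOException $ex) {
	echo "Der forekom en fejl";
	exit; // without a connection, the query below cannot run
}

?>
<!DOCTYPE html>
<html lang="da">
<head>
<meta charset="utf-8">
<title>Untitled Document</title>
</head>

<body>

<?php

// Escape every value, since guestbook entries come from visitors.
foreach ($db->query('SELECT id, tid, ip, navn, epost, indlaeg FROM gaestebog') as $row) {
	echo "ID: ".htmlspecialchars((string) $row['id'], ENT_QUOTES, 'UTF-8')."<br/>";
	echo "Tid: ".htmlspecialchars((string) $row['tid'], ENT_QUOTES, 'UTF-8')."<br/>";
	echo "IP adresse: ".htmlspecialchars((string) $row['ip'], ENT_QUOTES, 'UTF-8')."<br/>";
	echo "Navn: ".htmlspecialchars((string) $row['navn'], ENT_QUOTES, 'UTF-8')."<br/>";
	echo "Email: ".htmlspecialchars((string) $row['epost'], ENT_QUOTES, 'UTF-8')."<br/>";
	echo "Indlæg: ".htmlspecialchars((string) $row['indlaeg'], ENT_QUOTES, 'UTF-8')."<br/>"."<br/>";
}

?>

</body>
</html>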

NB! I should probably place the foreach block within the try section ...

 

Regards,

 

Erik

Edited by erikla
Link to comment
Share on other sites
