Jump to content

Correctly store and retrieve records from DB table that may contain unicode


Recommended Posts

I have been using an outdated method for sometime to store and retrieve records from my database.

 

I am wanting to know the best way to convert or translate, whatever, the text to a format that I store in the database and then what I should be doing to get it back before it is displayed on the webpage.

 

I tried htmlentities and stuff like that but not realy understanding the flags and stuff and not getting it.

 

Please help me or at least guide me to the right commands that I should be using for what.

 

 

I do not really want someone to do it, just tel me what commands and how to use them.  The adding to the database and getting back is something I know.

I think it is best to store the data in its raw format, but escape when presenting the content.  That way whether inputted via the application or directly into the database, the viewer is protected.  Of course, escape for SQL injection using PDO prepared statements.  To escape, use the following (or better yet, start using a template engine such as Twig or Smarty).

htmlentities($yourContentFromTheDB, ENT_QUOTES, 'UTF-8');

I currently use MySQLi and $mysqli->real_escape_string($string) to escape the string before adding to the database.

 

So I just add the text as they entered it without altering it in anyway but escape it using $mysqli->real_escape_string($string)

 

Then when grabbing it from the table to display on the webpage I just unescape it using htmlentities($yourContentFromTheDB, ENT_QUOTES, 'UTF-8');

There are two types of escaping going on, and they are not related.

 

real_escape_string() protects against SQL injection where the bad guy sends content which can compromise your database.  See https://www.owasp.org/index.php/SQL_Injection for more.  After you escaped it, the data is not changed (unless it was a bad guy attack in which it will store the escaped content, but don't worry about that), and there is nothing to escape back.

 

htmlentities() escapes against cross-site scripting and is to protect the viewer of your site.  See https://www.owasp.org/index.php/Cross-site_Scripting_(XSS) for more.

So if I have understood this the htmlentities will convert all special caracters to their ACSII numbers and still show the text as say...

<script>...</script>

to the viewer but in the source code it shows as the ASCII version.

I was just checking the PHP.net site for the htmlentities it seems I may have found out why I was not able to convert/remove the special characters before as I was getting a blank string back.  I may have forgotten to use the  ENT_IGNORE part.  I am going to find the code I used to test this out.

 

 

 
<?php
$str 
"\x8F!!!";

// Outputs an empty string
echo htmlentities($strENT_QUOTES"UTF-8");

// Outputs "!!!"
echo htmlentities($strENT_QUOTES ENT_IGNORE"UTF-8");
?>

I just entered the text as is in the table using my script.  escaping the string first..

 

❤French style nest of table , oh la la £40❤

 

and it shows as this in the database...

 

â¤French style nest of table , oh la la £40â¤

 

But the £ and the heart do not show up even when I set the page as UTF-8

 

I used echo htmlentities($strENT_QUOTES ENT_IGNORE"UTF-8");

 

to output it on the page, but the page is blank still and nothing in the source code.

 

in the script I placed this at the start...

 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>

<body>

 

 

and the following at the end...

 

</body>
</html>

Edited by jasonc

ok I use the following to add the entry as it and escape the text in case it contains something it should not...

 

$query="INSERT INTO `tablename` (`id`, `title`) VALUES ('1','".$mysqli->real_escape_string('<script>alert("1");< /script>')."')";    db_query($mysqli, $query);
 

 

Then to get it back I use...

 

echo(htmlentities($title, ENT_QUOTES | ENT_IGNORE, "UTF-8");

 

But I get a blank page, not even any source code...

 

so I change it to this...

 

echo($title);

 

and sure enough the alert box shows up.

 

 

Am I missing something here ?

 

How do I stop code like this from working as the bad poster expected...

But I get a blank page, not even any source code...

 

 

because, for the example code you are posting, it has a php syntax error in it, and the code never runs.

 

do you have php's error_reporting set to E_ALL and display_errors set to ON in the php.ini on your development system so that php will help you by reporting and displaying all the errors it detects? note: for a php syntax error in your main file. you cannot put the error_reporting/display_errors settings into your code since your code never runs in this case to make use of those settings.

since there was a php syntax error, that means the php.ini where you have placed the error_reporting/display_errors settings likely isn't the one that php is using. you need to make sure that your development system is set up to report and display all errors before you waste any more time on problems.

ok all seem ok, apart from the £ sign, i get the diamond with a question mark in it.

 

❤French style nest of table , oh la la �40❤

 

it should be

❤French style nest of table , oh la la £40❤

 

ok what I have done is echod out all the ord() codes...

 

226 �
157 �
164 �
70 F
114 r
101 e
110 n
99 c
104 h
32
115 s
116 t
121 y
108 l
101 e
32
110 n
101 e
115 s
116 t
32
111 o
102 f
32
116 t
97 a
98 b
108 l
101 e
32
44 ,
32
111 o
104 h
32
108 l
97 a
32
108 l
97 a
32
163 �
52 4
48 0
226 �
157 �
164 �

Edited by jasonc

I'm afraid this isn't the only issue. The malformed output you posted earlier indicates that there are other encoding-related problems.

It's also time to switch to HTML5. The days of HTML 4.01 are over.

Yes my page is html5, but some of the code was outdated.

 

It seems that so far all information that I add as new is being stored as is, no change to the text apart from escaping before being added and when displayed all looks ok and of the scripting text I tried this also shows as html entities in the source code.  All good so far.  But still there is the issue of all the entries in the database that was incorrectly stored.

 

How would I start to convert it to normal text as it would be if I had entered it as of now.

 

£ instead of the chr(163) which shows on the page as a diamond with question mark in it. and the other unicode characters.

The fact that “£” is still ISO-8859-1 encoded means you have not fully converted your application to UTF-8 yet.

 

Re-read my reply #15 and go through each of the four steps. We cannot do that for you.

All tables and databases are set to utf-8.

 

Adding/editing new entries is working ok.

 

The actual data in the database which was added ages ago is still in the old format. Is there a way to convert the previous existing data before I changed to the correct method of storing the data.

Edited by jasonc

ALTER TABLE `tablename` CONVERT TO CHARACTER SET utf8 [COLLATE utf8_general_ci]

 

I used this but still the text in the data is not showing on the webpage as that of a new entry that is added via my CMS area.

Edited by jasonc
This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.