
Sanitizing, how's the best way of doing it?


Matt Ridge

I'm not really sure whose post you're referring to here, but if you sanitize data *before* validating it, the value you validate might not be the same as the one the user entered.

 

For example:

 

// 15 character length string
$str = "15 'chars' long";

// Sanitize the string
$sanitized = mysql_real_escape_string($str);

// Now validate the data is no more than 15 characters long
if (strlen($sanitized) <= 15)
{
    // ...
}

 

This check would fail because mysql_real_escape_string() inserts a backslash before each single quote, making the string 17 characters long. If you swap the order, validating first and sanitizing second, it works fine.
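To make the ordering concrete, here is a minimal sketch of validating first and escaping second. It uses addslashes() purely as a stand-in so it runs without a database connection; that substitution is an assumption made for illustration and is not a recommendation for building real queries.

```php
<?php
// Illustration only: addslashes() stands in for a database escaping
// function so the example runs without a database connection.
$str = "15 'chars' long";

$rawLength     = strlen($str);              // length of what the user typed
$escapedLength = strlen(addslashes($str));  // length after the quotes are escaped

// Validate the raw input first ...
$isValid = ($rawLength <= 15);

// ... and escape only once validation has passed.
$escaped = $isValid ? addslashes($str) : null;

echo $rawLength, ' ', $escapedLength, "\n";
```

The raw string passes the 15-character check, while the escaped copy is two characters longer because of the two added backslashes.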

Or you could take that into account and make it if (strlen($sanitized) <= 17), and then it would work, correct?


Not really, because in a realistic situation $str would be user input. You have no idea how many slashes will be needed.

 

But you've already made it so the initial input can't be more than 15 characters, correct? The sanitation would add two slashes to it, and the final check, as far as I read it, is less than or equal to 17. So that would mean it couldn't go beyond 17: no matter how many slashes are there, if the length is greater than 17 it wouldn't be accepted...

 

Or am I reading that wrong?

Are you for real? I'm starting to suspect that you are answering the way you are just to see how far you can take this thread.

 

The user inputs data into a field that should only be 15 characters long. The PHP code accepts that value and then VALIDATES that it does not exceed 15 characters. If it does exceed 15 characters, the input is rejected. If the value passes validation (and all other validations pass for the submission), then the value is sanitized before being used in a query (the sanitation method depends on the storage method). In this case we would use mysql_real_escape_string(). That function will, among other things, precede quote marks with a backslash.

The backslash doesn't get added to the stored value in the database. It only ensures that the quote mark is interpreted as a literal quote character instead of as a control character that delimits a string in the query. So a value of 15 quote marks would be 30 characters long if you tried to validate the length after sanitation. But since that sanitized value won't actually be stored, it would be incorrect to use it for that validation.

 

User Value  | Sanitized Value  | Stored Value
------------+------------------+-------------
abcd        | abcd             | abcd
abc ' 123   | abc \' 123       | abc ' 123
1'2'3'4'5'6 | 1\'2\'3\'4\'5\'6 | 1'2'3'4'5'6
''''''      | \'\'\'\'\'\'     | ''''''
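The table can be reproduced with a small sketch. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database through PDO rather than MySQL, so the escaped form looks different (SQLite doubles the quote instead of adding a backslash). The table name demo is made up for the example. The point it illustrates is the same: the escaped text exists only inside the query, and the stored value is the original.

```php
<?php
// Assumptions: pdo_sqlite is installed; "demo" is a hypothetical table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE demo (val TEXT)');

$userValue = "abc ' 123";

// The escaped form exists only inside the query text.
$quoted = $pdo->quote($userValue);
$pdo->exec("INSERT INTO demo (val) VALUES ($quoted)");

// What comes back out is the original value, with no escape characters.
$stored = $pdo->query('SELECT val FROM demo')->fetchColumn();

var_dump($quoted !== $userValue); // the escaped text differs from the input
var_dump($stored === $userValue); // the stored value matches the input
```

This is why a length check belongs on the user value, not on the escaped one.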

 

 

Actually, I am asking because people here say this is how to sanitize, not how sanitation works in a manner I could understand, which is why this has gone to four pages. As far as I knew, the \ was only entered at the beginning and end of an input, and everything before and after said \ mark was ignored, or sanitized, so malicious code was not entered into the database or was ignored by the web browser and not interpreted as actionable code.

I'm sorry you think I am wasting your time, but when I originally posted some code a while back I was told to enter $phpcode = '\statement\' ; so I took that to be sanitizing the code.

 

Now I started this post asking how sanitation works, you are the first person to actually respond in a manner that shows what I was asking for.

 

Now with the code you showed, I noticed it didn't add any \ when you typed abcd... is it because it doesn't have a , or ' in it, so it doesn't need to be sanitized at all?

 

Sorry if I don't seem to grasp things quickly. Unlike you I am not a guru, and the only book I have read has one paragraph talking about sanitation. Otherwise everything I've learned is from php.net and here.


Now with the code you showed, I noticed it didn't add any \ when you typed abcd... is it because it doesn't have a , or ' in it, so it doesn't need to be sanitized at all?

 

There's nothing in the string abcd that would cause any problems. You only need to escape characters that may cause problems, such as quotation marks. Which characters cause problems, and how you escape them, depends entirely on what you are doing with the data. For example:

If you're putting data in a SQL query: quotation marks will cause problems. How you escape them depends on the DB engine in use. MySQL escapes them using a backslash; SQL Server escapes them by doubling them up.

If you're putting data into HTML: <, >, &, and possibly quote characters will cause problems. You escape them by converting them to the entity values &lt; &gt; &amp; and &quot;

If you're putting data into a CSV file: commas and quotes will cause problems. You escape commas by ensuring the field is enclosed in quotes, and you escape quotes inside a quoted field by doubling them.

and the list could go on....
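As a rough illustration of how the target context decides the escaping, the sketch below takes one made-up value and prepares it for HTML with htmlspecialchars() and for CSV with fputcsv(). The sample string and variable names are assumptions for the example only.

```php
<?php
$value = 'He said "1 < 2", ok';

// HTML context: convert the characters that are special in markup.
$forHtml = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// CSV context: let fputcsv() add the enclosing quotes and handle embedded ones.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, [$value], ',', '"', '\\');
rewind($fh);
$forCsv = rtrim(stream_get_contents($fh), "\n");
fclose($fh);

echo $forHtml, "\n", $forCsv, "\n";
```

The same input ends up as two different strings, and neither one would be correct in the other context.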

 

To properly sanitize something you just have to consider how your data is being used and ensure it is not misinterpreted by escaping any problem characters or sequences. You may need multiple types of sanitation on the same data, but you may not need to apply them at the same time. For example, if you let users type something into a form and save it into your database, and then have a page that displays that data, you need to do three different things at different times.

 

On input, when they submit to the database

1) Validate the data to ensure it meets all constraints (length, format, whatever you need)

2) Escape the data before putting it in the SQL query to prevent injection (eg mysql_real_escape_string or equivalent)

 

On output, when the data is displayed on a page

1) Run the data through htmlentities() to convert any special html characters (<, >, & ...) into their entity values so that they are not interpreted by the browser as html tags

 

 

Isn't a query pulling data from a database, not entering it?

 

Why would you need to sanitize a query pulling information from a database? Unless you mean that it is protecting itself from malicious code that may have been entered into the database by someone else?

 

That being said, if I use htmlentities, would I be able to retrieve the data from the database and it will show the correct information on output?


Isn't a query pulling data from a database, not entering it?

 

That depends on the query you're using. A SELECT query pulls data out of the database. An INSERT or UPDATE query puts data into the database. Which query type you use is irrelevant, though. Regardless of the type of query, if you are going to use a variable in the query text, you have to sanitize it to ensure it will not cause problems.

//$name must be sanitized so it does not cause problems because it is being used in a query.
$name = mysql_real_escape_string($name);
$sql = "SELECT blah FROM table WHERE Username='".$name."'";
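To show what "cause problems" can look like in a SELECT, here is a hedged sketch. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database with PDO::quote() in place of MySQL and mysql_real_escape_string(); the users table, its rows, and the crafted input are all made up for the example.

```php
<?php
// Assumptions: pdo_sqlite is installed; the "users" table and its rows are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (username TEXT)');
$pdo->exec("INSERT INTO users (username) VALUES ('alice'), ('bob')");

// Input crafted to change the meaning of the query.
$name = "nobody' OR '1'='1";

// Unsafe: the quotes in $name end the string literal early.
$unsafeSql  = "SELECT username FROM users WHERE username = '" . $name . "'";
$unsafeRows = $pdo->query($unsafeSql)->fetchAll(PDO::FETCH_COLUMN);

// Safe: the value is escaped for this database before it joins the query text.
$safeSql  = 'SELECT username FROM users WHERE username = ' . $pdo->quote($name);
$safeRows = $pdo->query($safeSql)->fetchAll(PDO::FETCH_COLUMN);

echo count($unsafeRows), ' ', count($safeRows), "\n";
```

The unescaped query matches every row because the injected OR clause is always true, while the escaped query looks for a user literally named with that odd string and finds none.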

 

Why would you need to sanitize a query pulling information from a database?

 

You don't sanitize 'the query', you sanitize the data. Data you pull out of a database via a SELECT, for instance, only needs to be sanitized if you use it in a context that requires it, such as when you're going to output it in a web page. For example, take these forums. We can enter whatever we want in our posts. Say I enter this:

 

<script type="text/javascript">alert('Hi'); </script>

 

If you do not run that through a function such as mysql_real_escape_string before putting it into the INSERT query that saves the post to the database, the query will fail because of the quotes.

 

If you do not run it through htmlentities() before outputting it to the web page, then that script block will be executed by the browser and visitors would get an alert saying 'Hi' shown to them.  Imagine if it did more than just alert hi?  Such as steal cookies or use XHR to submit spam posts using that user's account?
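A minimal sketch of the output side, using the same script block as the sample post. It uses htmlspecialchars() with explicit flags and encoding; those arguments are a choice made for this example.

```php
<?php
$post = "<script type=\"text/javascript\">alert('Hi'); </script>";

// Escape at output time so the browser shows the text instead of running it.
$safe = htmlspecialchars($post, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

After escaping, the output contains no literal < or > characters, so the browser displays the script as text rather than executing it.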

 

 

 


This is what I am getting confused at, mysql_real_escape_string and now htmlentities... each time I think I have a handle on things people throw something else into the mix.  I thought the sanitation script shown a few pages back would not need either of those. 

 

Since that seems to not be the case, how do you use the sanitation script a few pages back and still use mysql_real_escape_string and htmlentities effectively, or is this a one or the other thing?

 

Also, I thought mysql_real_escape_string and htmlentities was sanitizing data.

 

This is what I mean by there has to be a standard... if doing standard data inputting like the post, there has to be a string of sanitization steps to take... if doing a date, another...

 

I'm not really asking for an opinion... everyone here says it has to be done, but it seems there is more than one way to do it. So what do I do: use the script a few pages back? Or mysql_real_escape_string and htmlentities? Or both? If so, how?


If by "the script a few pages back" you mean the sanitize() and sanitize_array() functions that were posted, then no. Do not use them for the reasons listed in the last few posts. That implies that all data can be sanitized in the same way, which it can't.

 

Also, I thought mysql_real_escape_string and htmlentities was sanitizing data.

 

They do sanitize data, but for use in different situations. mysql_real_escape_string() is used for sanitizing data before it's used in a database query, and htmlentities() is used before outputting data into a web page.

 

This is what I mean by there has to be a standard... if doing standard data inputting like the post, there has to be a string of sanitization steps to take

 

mysql_real_escape_string() IS the standard function to use when escaping data for use in a MySQL query. htmlentities() is one of two similar functions; there's also htmlspecialchars(). The difference is that htmlentities() will convert any character that has an HTML entity equivalent, whereas htmlspecialchars() is a bit more reserved, converting only the characters that are special in HTML markup (&, <, >, and quotes), and is generally all you will need.

Read the manual for a clear explanation of any function and, if there are a few that are similar, to decide which is the right one to use.
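The difference between the two functions is easy to see on a string that contains both an accented letter and markup characters. This is a small sketch; the sample text is made up, and the accented letter is written as an escape sequence so the result does not depend on how the file is saved.

```php
<?php
// "café <b>" written with an escape so the file encoding does not matter.
$text = "caf\u{00E9} <b>";

$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');     // converts the accented letter as well
$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8'); // leaves the accented letter alone

echo $entities, "\n", $special, "\n";
```

Both results are safe to place in HTML; htmlspecialchars() simply touches fewer characters.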

So in English if we have both of those in a script you'll protect yourself from the majority of all attacks?


Kind of. You will likely use both for the same piece of data, but at different times. If you use htmlentities/htmlspecialchars both when you write the data to the database and when you read it back, this can lead to issues. For example, if it's a blog or forum post, every time it's edited by the user the existing encoding gets encoded again. Gradually you'll get an encoded ampersand for an encoded ampersand for, say, a space:

 

When it's inserted:

&nbsp;

The first time it's modified:

&amp;nbsp;

The second time it's modified:

&amp;amp;nbsp;

 

etc.

 

Do not use a generic function to handle all types of sanitization.
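The layering effect can be reproduced in a few lines. This sketch applies htmlspecialchars() twice to a made-up string, which is what happens when already-encoded text from the database is encoded again on the next save.

```php
<?php
$original = 'Tom & Jerry';

// First pass: what you would get by encoding before saving.
$once = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

// Second pass: the same text encoded again on the next edit.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

echo $once, "\n", $twice, "\n";
```

Each pass encodes the ampersand that the previous pass produced, so the stored text drifts further from what the user typed.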

Then how would you sanitize something like a blog so that kind of issue doesn't occur?


.... Exactly how we've been describing for 5 pages.

 

So then in that case you'd use the sanitation script?

 

I hate to sound dense, but when people say don't use some sanitation for some situations, but to use it in others, and not really say which is the best for which, I tend to get confused.

 

Blogs, don't use htmlentities/htmlspecialchars, but at the same time use mysql_real_escape_string with the script a few pages back? Or no?

 

This is what I mean by a standard.  There has to be a way to sanitize script depending on what you are doing...

THERE IS NO STANDARD BECAUSE THE APPROPRIATE METHOD OF SANITIZING A VALUE IS COMPLETELY DEPENDENT UPON HOW THE VALUE WILL BE USED.

 

That is the reason you were advised not to use that sanitation script previously posted. Many people do create sanitation scripts such as that, but they are being short-sighted and possibly (probably) have bugs they aren't even aware of. You need to know "how" your variables are being used, "what" values could pose a problem and "implement" the necessary validation and sanitation processes to prevent those problems. This doesn't apply just to storing values in the database or displaying values in the webpage. It can apply to many different scenarios. For example, if you are using a value as the denominator in a division you need to ensure the value is not 0.
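The division case might be guarded along these lines. The helper name safe_divide() and the choice to return null are assumptions made for this sketch, not something from the thread.

```php
<?php
// Hypothetical helper: returns null instead of dividing by zero.
function safe_divide(float $numerator, float $denominator): ?float
{
    if ($denominator == 0.0) {
        return null; // the caller decides how to report the invalid input
    }
    return $numerator / $denominator;
}

var_dump(safe_divide(10, 4));
var_dump(safe_divide(10, 0));
```

No escaping function would help here; the check that matters depends on how the value is used.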

 

I will state it again - there is no one size fits all solution. You need to understand the type of application you are building and have a grasp of the language you are working with. Then you need to use critical thinking skills to identify the potential problems and then use the correct process to guard against them.

 

There are "best practices" that you need to learn, such as using mysql_real_escape_string() for any "string" values that you will use in a DB query.

 

As an analogy, think about making dinner. There are no absolutes, but there are best practices. Do you always use a 3qt pot over high heat? No. Sometimes you need a bigger pot or maybe you need a pan. Hell, sometimes you don't even cook the meal. Programming is no different.


Blogs, don't use htmlentities/htmlspecialchars, but at the same time use mysql_real_escape_string with the script a few pages back? Or no?

 

It's not that you do not use htmlentities; you just use it at a different time. As I said before, it all depends on where you're going to be using your data.

 

For example, with the forum posts:

Adding a new post:

  - mysql_real_escape_string  before the data is put into a SQL query

//Adding
$postContent = mysql_real_escape_string($_POST['content']);
$sql = 'INSERT INTO posts VALUES (\''.$postContent.'\')';

 

Modifying a post:

  - htmlentities before the data is put into the html for viewing.

  - mysql_real_escape_string when you put the data into the query to update the database

//Modifying
if (isset($_POST['save'])){
    // Saving: escape the text for the query; intval() forces the id to be an integer
    $postContent = mysql_real_escape_string($_POST['content']);
    $postId = intval($_POST['id']);
    $sql = 'UPDATE posts SET content=\''.$postContent.'\' WHERE postid='.$postId;
}
else {
    // Showing the edit form: fetch the stored text, then escape it for HTML
    $postId = intval($_GET['id']);
    $sql = 'SELECT content FROM posts WHERE postid='.$postId;
    $res=mysql_query($sql);
    $row=mysql_fetch_array($res);

    $content = htmlentities($row['content']);
    echo '<textarea name="content" rows="10" cols="10">'.$content.'</textarea>';
}

 

Viewing a post:

  - htmlentities before the data is put into the HTML for viewing

//Viewing
$postId = intval($_GET['id']);
$sql = 'SELECT content FROM posts WHERE postid='.$postId;
$res=mysql_query($sql);
$row=mysql_fetch_array($res);

// Escape for HTML only at the point of output
$content = htmlentities($row['content']);
echo $content;

 

 

What you do not want to do is run the content through htmlentities prior to inserting it into the database (either when adding or when editing). The reason is that when you want to edit the post you would have to undo this escaping, or else you will end up with multiple layers of it, as demonstrated above. Also, if you ever wanted to use the data elsewhere (say, in a PDF version of the topic) you would have to undo this escaping because it is not necessary when putting the data into a PDF.

You always want to do whatever sanitation/escaping you need only just before you actually need it, not earlier. You want to preserve the original data as much as possible.

As for this standard practice you seem to be desiring, as has been stated, there is no one-size-fits-all or always-do-this method that will protect you from everything. It all depends on how the data is being used and processed. For each individual use there are generalizations that can be made, but that is the extent of it. For example, when outputting data to HTML, you generally want to run htmlentities. It is not always necessary (and sometimes not desired), but in general it is what you want. It's up to you to recognize when you do or do not want or need it. The same goes for using data in a SQL query. Generally you want to run it through mysql_real_escape_string, though it is not always necessary. Using it when it is not necessary will not hurt anything. In my examples above, I did not use it for the post id because I ran it through intval() instead, which guarantees it is going to be an integer.
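For readers on PHP 7 or later, where the mysql_* functions no longer exist, the same add/view flow might look roughly like the sketch below. It swaps in PDO prepared statements for mysql_real_escape_string(), which is a different technique from the one in the examples above, and it runs against an in-memory SQLite database so it needs no server. The pdo_sqlite extension, the posts table layout, and the sample content are all assumptions for illustration.

```php
<?php
// Assumptions: pdo_sqlite is installed; table, column, and variable names are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (postid INTEGER PRIMARY KEY, content TEXT)');

// Adding: the raw text is bound as a parameter, never pasted into the SQL.
$rawContent = "<b>It's</b> a post";   // stands in for $_POST['content']
$insert = $pdo->prepare('INSERT INTO posts (content) VALUES (:content)');
$insert->execute([':content' => $rawContent]);
$postId = (int) $pdo->lastInsertId();

// Viewing or editing: fetch the original text, escape only when writing HTML.
$select = $pdo->prepare('SELECT content FROM posts WHERE postid = :id');
$select->execute([':id' => $postId]);
$content = $select->fetchColumn();

echo '<textarea name="content">' . htmlspecialchars($content, ENT_QUOTES, 'UTF-8') . '</textarea>', "\n";
```

The stored text stays exactly as the user typed it, and the HTML escaping happens only at the moment the text is written into the page.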

 

Sorry for the long delay, but as kicken proved there are standards for specific items, which I was asking for originally...

 

As for your cooking example, if you want the same results in cooking you use the same items all the time. If you wish to increase how much you cook then you adjust it accordingly.

 

The thing though is that you don't use salt when you are meant to use sugar.  I am asking how to use the salt in the recipe, and what you are telling me is that it doesn't matter what you use as long as you make it taste good in the long run.

 

I have over 20 years of cooking experience, 3 in a professional kitchen. Don't tell me that a chef that creates something will tell you to just "throw stuff into a pot" and see if it works... he expects you to follow his recipe to the letter, and in many cases he has motions, hardware, and utensils that he expects to be used to create his recipe.

 

When creating a form that can be edited, htmlspecialchars(), htmlentities() and real_escape_string() are used in a specific way in that case. I was asking how they work. You and others said it depends on what it is used on. Instead of giving examples you fell back on that excuse.

 

I'm sorry, but that is a poor excuse.

 

kicken showed exactly what I was looking for: an example that is understandable.

 

Sometimes you need to show how things work in code for them to be understood better. And sometimes you need to burn butter to make it taste good on some items as well.


Any good recipe instruction, while giving you exact measurements, utensils, etc., will also indicate that, depending upon certain conditions, there are substitutes or other options available, the most common one being "Salt and pepper to taste". Depending upon a particular situation, the EXACT recipe may not be the best, most appropriate or desired. New recipes often begin with modifications to old ones. Standards are NOT hard-and-fast rules; they are guidelines.

True, but they are also something where examples are given for said conditions. You're not just told to use them without any example of why or how.


Matt, you may want to re-read through this entire thread.  All of us have said to use certain functions before doing certain things.  That's why I mentioned escaping strings before using them in a db query.  That's why several others discussed using htmlentities before outputting user-supplied data.  There are 5 pages discussing not only what to use, not only how to use them, but also why.

 

There's no "Do the following things for EVERY SINGLE THING you come across" standard.  Instead, it's all "If this is what you're doing, do X.  If this other thing is what you're doing, do Y."

 

What you should do is brush up on the basics.  You have a very rudimentary, and partially flawed, view of PHP and web security in general.  You're arguing from ignorance, and there are only so many ways for us to (re)explain things.

All I was asking for was an example, not someone saying it goes at X... if a few pages of that type of explaining didn't show people I didn't understand what you were talking about, then perhaps what happened on page 5 should have been used sooner?

 

I don't know...I asked for examples, no one gave them till page 5... and that is all I wanted. Examples of what they meant...by saying it is used in different situations. There is a standard... people here just don't see it as that.
