
Sanitizing: what's the best way to do it?


Matt Ridge


Well usually you'll write a function and include it in your php.

Like this for instance:

 

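function sanitize($var)
{
    $var = strip_tags($var);
    $var = htmlentities($var);
    $var = stripslashes($var);
    // mysql_real_escape_string() needs an open mysql_* connection and was removed in PHP 7;
    // with mysqli the equivalent is mysqli_real_escape_string($link, $var)
    return mysql_real_escape_string($var);
}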

 

...so when you call sanitize(), PHP performs all those steps on the variable you pass in.

I.e., sanitize($userInput) means $userInput goes through all the steps in the function.

You can add or take away all you want from it depending on what you need to do.

My version above is useless if you want the input to be HTML so you would modify that to leave the HTML intact.

Ok, just curious... you have $var = a lot of different things; how can it equal multiple things without losing its original value?


@Matt:

 

You have a lot of enthusiasm and you are actually trying to learn (which is more than most people who visit these forums). But, the problem is that we cannot answer your questions in a way that you can understand because you apparently don't understand some of the basics yet. That's not meant to be mean, it's just an observation.

 

We've tried to answer some of the questions in generalizations, but you are wanting specific details. But if we gave you the details, you wouldn't understand those either. Your last question is about one of the most basic things, but you don't understand it. You can redefine a variable based on the current value of the variable:

$var = 1;           // $var = 1
$var = $var + 2;    // $var = (1) + 2 = 3
$var = $var * 3;    // $var = (3) * 3 = 9
$var = $var - 4;    // $var = (9) - 4 = 5
echo $var;          // Output: 5

 

As stated before "sanitizing" means many things:

1. Validating that the value is appropriate for what it is being submitted for. If you expect the value to be an integer, make sure it is an integer. If it should be a date, make sure it is a date.

 

2. Escaping the value to prevent errors. This can mean using mysql_real_escape_string() on data before it is used in a query, or running it through htmlentities() before echoing it to an HTML page.
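A minimal sketch of the two steps kept separate, assuming a hypothetical quantity field that must be a whole number and a date field expected as `Y-m-d`. The helper names `validate_qty()` and `validate_ymd_date()` are illustrative, not from the thread, and the output step uses htmlspecialchars(), the narrower sibling of the htmlentities() mentioned above:

```php
<?php
// Step 1 (validation), hypothetical helper: whole numbers >= 0 only.
function validate_qty(string $input): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
    return $value === false ? null : $value;
}

// Step 1 (validation), hypothetical helper: real calendar dates written as Y-m-d.
function validate_ymd_date(string $input): ?string
{
    $date = DateTime::createFromFormat('!Y-m-d', $input);
    // The round-trip check rejects overflowed dates such as 2012-02-30.
    if ($date === false || $date->format('Y-m-d') !== $input) {
        return null;
    }
    return $input;
}

// Step 2 (escaping) for the place the value is used, here an HTML page.
$comment = '<script>alert("hi")</script>';
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8'), "\n";
// prints &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Validation decides whether to accept the value at all; escaping only changes how an accepted value is written into a particular context.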

 

The appropriate validation and escaping are completely dependent upon what the particular value is, what it should be, and how you are going to use it. I have seen many people use an all-in-one sanitize process, which is the wrong thing to do, although you could create a process that takes two parameters (the value and the type) and then performs the processing appropriate for that type of value.
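The two-parameter idea could look something like the sketch below. The function name `clean_value()` and the type names `'int'`, `'date'` and `'text'` are hypothetical; the point is only that each type gets its own rule instead of one pipeline for everything. It returns null for invalid input so the caller can reject the form:

```php
<?php
// Hypothetical two-parameter helper: the type decides which checks run.
function clean_value(string $value, string $type): ?string
{
    $value = trim($value);
    switch ($type) {
        case 'int':
            // Whole numbers only, e.g. a quantity.
            return filter_var($value, FILTER_VALIDATE_INT) === false ? null : $value;
        case 'date':
            // Accept Y-m-d only; reject impossible dates.
            $d = DateTime::createFromFormat('!Y-m-d', $value);
            return ($d && $d->format('Y-m-d') === $value) ? $value : null;
        case 'text':
            // Plain text: drop tags, keep everything else as typed.
            return strip_tags($value);
        default:
            throw new InvalidArgumentException("Unknown type: $type");
    }
}
```

Escaping for the query or for the page would still happen at the point where the value is used, not inside this helper.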

 

YOU need to analyze the data you are capturing and determine what processes need to be done based upon how YOU plan to use that data. If you were to ask how to validate the input for a monetary field before saving to the database we could give specific code examples. But, there is no ONE answer that fits every situation.
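As an example of such a specific answer, here is a sketch for a monetary field. It assumes the form should accept a non-negative amount with at most two decimal places and no currency symbol or thousands separators; those rules and the helper name `validate_money()` are assumptions, and your own field may need different ones:

```php
<?php
// Hypothetical rule: digits, optionally '.' followed by one or two digits.
function validate_money(string $input): ?string
{
    $input = trim($input);
    if (!preg_match('/^\d+(\.\d{1,2})?$/', $input)) {
        return null; // reject anything that is not a plain amount like 12 or 12.50
    }
    // Normalise to two decimals without going through a float.
    [$whole, $cents] = array_pad(explode('.', $input, 2), 2, '');
    return $whole . '.' . str_pad($cents, 2, '0');
}
```

Working on the string rather than casting to float avoids rounding surprises before the value reaches a DECIMAL column.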


But, there is no ONE answer that fits every situation.

 

Too true.  Programming is hardly ever yes/no, right/wrong, black/white.  It's usually about finding a balance between:

 

What works

What's easy to read/understand (in a professional environment, you're all but guaranteed to be working with others on the same code)

What's efficient

So in the simplest terms, If $var = 1 and then in the next line $var= 2, var actually equals 3, because it compounds instead of replaces.  That means the same with all variables, hence why you can add strings of information as code goes down the line. 

 

Now if you set $var = "" near the end, that should clear out all the variables that were attached to it, correct?

 

 

 

Thanks for being understanding. I am self-taught, and reading books can only go so far... they are good, but they tend to assume you understand what they are saying. So I know enough to be dangerous, but not enough to be proficient. At least not yet.


if $var = 1 and then in the next line $var= 2, var actually equals 3

 

No, you are still skimming over and then not thinking about what you actually see in front of you. $var will only be equal to 3, if you do something that assigns the value three to it. Setting $var = 1; followed by a line that sets $var=2;, will only result in $var being equal to 2. In the example that mjdamato posted, the only 3 result was due to a mathematical expression involving $var +2, where the starting value in $var was 1, i.e. 1 + 2 = 3.

 

Numbers in variables and equations in a programming language are NO different from what you learned in algebra in school.
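Running the cases side by side makes the difference visible. This small sketch contrasts plain reassignment with an assignment whose right-hand side reads the old value; `+=` is PHP's shorthand for the latter:

```php
<?php
$var = 1;
$var = 2;          // plain '=' throws the old value away
echo $var, "\n";   // 2

$var = 1;
$var = $var + 2;   // the right-hand side reads the old value first
echo $var, "\n";   // 3

$var += 2;         // same idea in shorthand: 3 + 2
echo $var, "\n";   // 5
```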


You should spend a day or two and read the PHP manual. For the most part, it has very good documentation for all of PHP.

 

http://www.php.net/manual/en/language.operators.assignment.php Maybe this will help you understand.

 

When you use the assignment operator "=" only, the value to the right overwrites the previous value in a variable.

$var = 'foo';
$var = 'bar';
// $var is now "bar"; "foo" was overwritten

 

Now if you wanted to append "bar" to "foo", you would use the assignment operator ".=".

$var = 'foo';
$var .= 'bar';
// $var is "foobar"

 

These are the most basic fundamentals of programming. I'm not sure what book you are reading, but it may be an advanced book rather than a beginner book, so it doesn't spend much time on these sorts of things. They are very easy, and with a proper book you should be able to pick them up right away.

 

My advice is to learn all (or most) of the PHP operators: what they do, how to use them, when to use X over Y, etc. The PHP manual has a wealth of knowledge, and you can find more by Googling if you need to. Also, don't be afraid to make a .php file and just try stuff out. In my opinion, that is the best way to learn. Once you know a handful of the operators, just try stuff out and see if it works the way you think it should. If not, look it up again and figure out what went wrong.

 

http://php.net/manual/en/language.operators.php


Sorry, I should have been clearer by using the math in between the vars... Basically, $var plus the math computation takes the prior value of $var and, on that line, becomes the new $var value until line three, if there is one. The way I had it showed one replacing the other.

 

That being said, when something is being used like this:

 

function sanitize($var)
{
   $var = strip_tags($var);
   $var = htmlentities($var);
   $var = stripslashes($var);
   return mysql_real_escape_string($var);
}

 

Now this means that because it has the prior $var in its new line, it takes on the properties of the current definition plus the prior one? And the last $var equals all the prior $var values? Hence why you only need to add this in once?

 

Now, that being said, the code below has the variables up front. If I used the code above, would I have to do this with each and every variable?

 

I don't mean to sound lazy, but I'm curious: is there no one line of "magic code" that could be used to blanket-sanitize a variable instead of doing the above for each variable? I don't mind doing it if I have to, but you can see why the question comes up: if a form has a lot of input fields, the sanitation code can end up almost as long as the form itself.

 


<?php

require_once('connectvars.php');
?>

<!DOCTYPE html 

     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>PDI Non-Conforming Materials Report</title>
<link rel="stylesheet" type="text/css" href="CSS/ie.css" />

</head>

<body>
<div id="logo">
<img src="images/PDI_Logo_2.1.gif" alt="PDI Logo" />
</div>

<div id="title">
<h2 id="NCMR2">Non-Conforming Materials Report (NCMR)</h2>
</div>

<?php

//Post Data
if (isset($_POST['submit'])) {
$Added_By = $_POST['Added_By'];
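// Keep blank dates blank: date('Y-m-d', strtotime('')) would produce the Unix epoch date (1970-01-01 UTC) and slip past empty()
$Added_By_Date = empty($_POST['Added_By_Date']) ? '' : date('Y-m-d',strtotime($_POST['Added_By_Date']));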
$Nexx_Part = $_POST['Nexx_Part'];
$Nexx_Rev = $_POST['Nexx_Rev'];
$Nexx_Part_Description = $_POST['Nexx_Part_Description'];
$NCMR_Qty = $_POST['NCMR_Qty'];
$JO = $_POST['JO'];
$SN = $_POST['SN'];
$INV = $_POST['INV'];
$Nexx_Inventory_On_Hand = $_POST['Nexx_Inventory_On_Hand'];
$Nexx_Inventory_Chk = $_POST['Nexx_Inventory_Chk'];
$Supplier_Name = $_POST['Supplier_Name'];
$Supplier_Number = $_POST['Supplier_Number'];
$Manufacturer_Part_Number = $_POST['Manufacturer_Part_Number'];
$Manufacturer_Serial_Number = $_POST['Manufacturer_Serial_Number'];
$NCMR_ID = $_POST['NCMR_ID'];
$Nonconformity = $_POST['Nonconformity'];
$Disposition = $_POST['Disposition'];
$Comments = $_POST['Comments'];
$CommentsAdditional_Details = $_POST['CommentsAdditional_Details'];
$PO = $_POST['PO'];
$PO_Date = empty($_POST['PO_Date']) ? '' : date('Y-m-d',strtotime($_POST['PO_Date']));
$Date_Received = empty($_POST['Date_Received']) ? '' : date('Y-m-d',strtotime($_POST['Date_Received']));
$output_form = 'no';

if (empty($Added_By) || empty($Added_By_Date) || empty($Nexx_Part) || empty($Nexx_Part_Description) || empty($Supplier_Name) || empty($NCMR_ID) || empty($Nonconformity) || empty($Disposition) || empty($PO) || empty($PO_Date) || empty($Date_Received)) {

// We know at least one of the input fields is blank 
echo 'Please fill out all of the required NCMR information.<br />';
$output_form = 'yes';
	}
}
  else {
$output_form = 'yes';
}
//Access the Database, only after a submit that passed the required-field check
if (isset($_POST['submit']) && $output_form == 'no') {
	$dbc = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
	or die('Error connecting to MySQL server.');

$query = "INSERT INTO ncmr (Added_By, Added_By_Date, Nexx_Part, Nexx_Rev, Nexx_Part_Description, NCMR_Qty, JO, SN, INV, Nexx_Inventory_On_Hand, Nexx_Inventory_Chk, Supplier_Name, Supplier_Number, Manufacturer_Part_Number, Manufacturer_Serial_Number, NCMR_ID, Nonconformity, Disposition, Comments, CommentsAdditional_Details, PO, PO_Date, Date_Received)

VALUES ('$Added_By', '$Added_By_Date', '$Nexx_Part', '$Nexx_Rev', '$Nexx_Part_Description', '$NCMR_Qty', '$JO', '$SN', '$INV', '$Nexx_Inventory_On_Hand', '$Nexx_Inventory_Chk', '$Supplier_Name', '$Supplier_Number', '$Manufacturer_Part_Number', '$Manufacturer_Serial_Number', '$NCMR_ID', '$Nonconformity', '$Disposition', '$Comments', '$CommentsAdditional_Details', '$PO', '$PO_Date', '$Date_Received')";

    mysqli_query($dbc, $query)
      or die ('Data not inserted.');

      // Confirm success with the user
  echo '<tr><td class="thank">';
      echo '<p>Thank you for adding the NCMR, the correct person will be informed.</p>';
      echo '<p><a href="post.php">&lt;&lt; Back to the form</a></p>';
  echo '</td></tr>';
  
mysqli_close($dbc);
  }
  if ($output_form == 'yes') {
	echo '<form method="post">';
		echo '<fieldset>';
				echo '<div id="ab"><span class="b">Added By:  </span><input type="text" name="Added_By" value="" /></div>';
				echo '<div id="abd"><span class="b">On:  </span><input type="text" name="Added_By_Date" value="" /></div>';

	//Nexx Part, Nexx Rev, Nexx Part Description, NCMR Qty, JO, SN and INV
	echo '<div id="box">';
		echo '<div id="box1">';
				echo '<div id="np"><span class="b">Nexx Part:  </span><input type="text" name="Nexx_Part" value="" /></div>';
				echo '<div id="nr"><span class="b">Nexx Rev:  </span><input type="text" name="Nexx_Rev" value="" /></div>';
				echo '<div id="npd"><span class="b">Nexx Part Description:  </span><textarea name="Nexx_Part_Description" rows="3" cols="22" ></textarea></div>';
				echo '<div id="ncqt"><span class="b">NCMR Qty:  </span><input type="text" name="NCMR_Qty" value="" /></div>';

		//JO, SN and INV
		echo '<div id ="JSI2">';
				echo '<div id="JO"><span class="b">JO:  </span><br /><input type="text" name="JO" size="3" value="" /></div>';
				echo '<div id="SN"><span class="b">SN:  </span><br /><input type="text" name="SN" size="3" value="" /></div>';
				echo '<div id="INV"><span class="b">INV:  </span><br /><input type="text" name="INV" size="3" value="" /></div>';
		echo '</div>';
	echo '</div>';

		//Nexx Inventory On Hand, Nexx Inventory Check, Supplier Name, Supplier Number, Manufacturer Part Number, Manufacturer Serial Number and NCMR ID
		echo '<div id="box2">';
				echo '<div id="nioh"><span class="b">Nexx Inventory On Hand:  </span><input type="text" name="Nexx_Inventory_On_Hand" value="" /></div>';
				echo '<div id="nic"><span class="b">Nexx Inventory Chk:  </span><input type="text" name="Nexx_Inventory_Chk" value="" /></div>';
				echo '<div id="sun"><span class="b">Supplier Name:  </span><textarea name="Supplier_Name" rows="3" cols="22"></textarea></div>';
				echo '<div id="supn"><span class="b">Supplier Number:  </span><input type="text" name="Supplier_Number" value="" /></div>';
				echo '<div id="mpn"><span class="b">Manufacturer Part Number:  </span><input type="text" name="Manufacturer_Part_Number" value="" /></div>';
				echo '<div id="msn"><span class="b">Manufacturer Serial Number:  </span><input type="text" name="Manufacturer_Serial_Number" value="" /></div>';
				echo '<div id="cnno"><span class="b">NCMR ID:  </span><input type="text" name="NCMR_ID" value="" /></div>';
		echo '</div>';

		 //Nonconformity, Disposition, Comments and Comments & Additional Details
            echo '<div id="box3">';
				echo '<div id="non"><span class="b">Nonconformity:  </span><br /><textarea name="Nonconformity" rows="3" cols="120" ></textarea><br /></div>';
				echo '<div id="dis"><span class="b">Disposition:  </span><br /><textarea name="Disposition" rows="3" cols="120" ></textarea></div>';
				echo '<div id="comm3"><span class="b">Comments:  </span><br /><textarea name="Comments" rows="3" cols="120" ></textarea></div>';
				echo '<div id="caad3"><span class="b">Comments and/or Additional Details:  </span><br /><textarea name="CommentsAdditional_Details" rows="3" cols="120" ></textarea></div>';
		echo '</div>';

		//PO, PO Date, and Date Received
		echo '<div id="poinfo">';
				echo '<div id="po"><span class="b">PO:  </span><input type="text" name="PO"  size="7" value="" /></div>';
				echo '<div id="pod"><span class="b">PO Date:  </span><input type="text" name="PO_Date"  size="7" value="" /></div>';
				echo '<div id="dri"><span class="b">Date Received:  </span><input type="text" name="Date_Received"  size="7" value="" /></div>';
				echo '<div id="button2"><input type="submit" value="Submit NCMR" name="submit" /></div>';
		echo '</div>';
	echo '</div>';
echo '</fieldset>';
echo '</form>';
}
?>
</body>
</html>


Now this means that because it has the prior $var in its new line, it takes on the properties of the current definition plus the prior one? And the last $var equals all the prior $var values? Hence why you only need to add this in once?

 

When you use those functions, $var is assigned a completely new value. Each function returns a modified copy of the string, and you assign that return value back to $var, which replaces what was there before. Not sure if that makes any sense or not...
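Printing after each step makes the replacement visible. This sketch runs the first three lines of the `sanitize()` function on a made-up input; the database step is left out because mysql_real_escape_string() needs an open connection:

```php
<?php
$var = '<b>Tom</b> & Jerry';   // made-up input
$var = strip_tags($var);       // new value: 'Tom & Jerry' (the old one is gone)
echo $var, "\n";
$var = htmlentities($var);     // new value built from the line above: 'Tom &amp; Jerry'
echo $var, "\n";
$var = stripslashes($var);     // unchanged here: there are no backslashes to remove
echo $var, "\n";
// mysql_real_escape_string($var) would run last; omitted because it needs a DB connection.
```

Each line reads the current value, hands it to a function, and stores the result back in the same variable, so nothing accumulates except the effect of the transformations.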

 

I don't mean to sound lazy, but I'm curious: is there no one line of "magic code" that could be used to blanket-sanitize a variable instead of doing the above for each variable?

 

Well, since $_POST is an array, you could just sanitize the entire array.

 

function sanitize($var)
{
   $var = strip_tags($var);
   $var = htmlentities($var);
   $var = stripslashes($var);
   return mysql_real_escape_string($var);
}

function sanitize_array($array)
{
    $sanitized = array();
    if (is_array($array) && !empty($array)) {
        foreach ($array as $key => $val) {
            $sanitized[$key] = sanitize($val);
        }
    }

    return $sanitized;
}

 

$_POST = sanitize_array($_POST);


Makes sense to me. I think I was trying to say that while fumbling around for the right words.

 


You know, I never thought of it that way. Thanks :)


I should have mentioned this in my post. Remember that because there is no always-true solution to sanitation, sanitizing the entire $_POST array in this manner may yield unexpected results later on down the road. So I recommend that you instead sanitize each variable individually. Or, if you want, you can sanitize groups of variables if they all require the same sanitation.

 

So using my previous function, something like this:

$sanitized = sanitize_array(array(
     'var1' => $var1,
     'var2' => $var2,
     'var3' => $var3
));


What do you mean it would yield unexpected results?


Well, that function offers no control. It simply sanitizes everything in the same way. If, for example, you ever wanted to save data to the database without losing any HTML then you couldn't use this function because of its HTML sanitation.
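To make that concrete, suppose one field is a forum post that is meant to keep simple formatting (a made-up example). The HTML steps of the `sanitize()` function above would damage it no matter which one runs:

```php
<?php
$post = 'Use <b>bold</b> for emphasis';
$afterStrip = strip_tags($post);       // formatting is gone for good
$afterEntities = htmlentities($post);  // tags survive only as text: the page would show "<b>bold</b>" literally
echo $afterStrip, "\n";     // Use bold for emphasis
echo $afterEntities, "\n";  // Use &lt;b&gt;bold&lt;/b&gt; for emphasis
```

For a username either result might be fine; for this field neither is what you wanted, which is why the rules have to be chosen per field.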


I thought the code was meant to keep code from being injected, to stop hackers from destroying the database or server.


This is SQL injection. It is covered by mysql_real_escape_string() (or if you use prepared statements then you don't have to worry about it).
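For comparison, a prepared statement keeps the value out of the SQL text entirely. This sketch uses PDO with an in-memory SQLite database only so that it runs without a MySQL server (it assumes the pdo_sqlite extension is enabled); the one-column table is a simplified stand-in for the `ncmr` table. With MySQL you would change the DSN, and mysqli offers the same idea through mysqli_prepare():

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE ncmr (supplier_name TEXT)');

// Hostile-looking input is stored as plain data; the query text never changes.
$supplier = "O'Brien'); DROP TABLE ncmr; --";
$stmt = $pdo->prepare('INSERT INTO ncmr (supplier_name) VALUES (:name)');
$stmt->execute([':name' => $supplier]);

$stored = $pdo->query('SELECT supplier_name FROM ncmr')->fetchColumn();
echo $stored, "\n"; // prints the input exactly as typed
```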

 

HTML is a different matter. It isn't going to harm the database. What it may harm is individual users who view it after it is output, which is why you may want to sanitize on output if you want to show HTML.
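Sanitizing on output usually means escaping at the moment you echo. A minimal sketch, assuming the stored text came back from the database unchanged; the helper name `e()` is a common shorthand, not a PHP built-in:

```php
<?php
// Hypothetical helper: escape text for an HTML body or attribute value.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$stored = '<img src=x onerror="alert(1)">';  // harmless while it sits in the database
echo '<p>', e($stored), '</p>', "\n";        // shown to the user as text, not executed
```

Escaping at output time means the same stored value can still be used safely in other contexts, such as a plain-text email.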


There is no single right way to sanitize data. The "right" way depends on what will be done with that data, and what type of data it is. You seem to be trying to find a magic formula where none exists.

 

No, but what I am attempting to figure out is why there are multiple ways to sanitize data and HTML. As people say here, it is a form of programming, and as far as I can tell, there should be a standard.

 

I'm just wondering why there isn't. Wouldn't it make defending against hacking and injections easier?


There isn't because there can't be. It's entirely dependent upon how the data will be used. A control character that may break a database query string and allow execution of arbitrary SQL has no meaning whatsoever when rendered as HTML; likewise, a chunk of JavaScript that may cause an end user problems when processed by the browser can be inserted into a database and sit there forever without any ill effects at all.

 

And it may help to stop trying to make a distinction between 'data' and 'html'; within the context of PHP, it's all just data.


You're looking for a magical "do this and forget about it" solution for sanitizing. One simply does not exist. You need to sanitize on a per-use basis. That's just the way it is. There are libraries and such that make this process easier, but you still need to do it on a per-use basis.

 

Every specific piece of data that comes into your app needs to be checked that it matches what it should be. For example, if someone is putting in a username, it probably doesn't need to contain HTML, so strip any of that away. Then if that same someone puts in a forum post, you may want it to contain HTML, so now you can't strip it away. If you used the exact same sanitation routine, then the HTML would also be stripped from the forum post, which isn't what you wanted.

 

This is why there can't be a standard, or a magical one-use function. It's just too specific.

What I am saying is there should be a standard for a specific way data is used.

 

In English:

 

 

If alphanumeric data is input into a database, x is the way to deal with it.

If dates are entered, y is the way to deal with it.

If email addresses, z...

 

Again, I am speculating about the way it is being used, but the fact is that data can be entered by $_POST, UPDATE, REPLACE, and I am sure I am forgetting one other.

 

But there has to be a universal way to protect each type of situation.

 

The same can be said about viewing data...if it is even needed.

 

I can't imagine someone writing code for a program without a standard way of making it work, nor can I imagine not having a standard way of protecting it either.

 

You all know better than I do the ways people may attempt to enter data into a database or take over a site maliciously. I can't imagine that people developing PHP forms don't have a set of rules to follow to protect their users' data.

 

There is a such thing as HTML strict, I can't imagine there isn't a PHP strict.

 

If you think about it, $_GET only does one thing and $_POST only does one thing... logic would dictate that sanitizing against the majority of attacks for those actions should be done one way too...

 

Yes people have different ways of going about it, but it doesn't mean that their way is correct.

 

I have a proposal for people here, at least the ones that are practical teachers of the PHP world...

 

Hash out the best way to sanitize an action as stated above using strict methods.

 

So in other words, make it so the sanitization allows PHP to be used the way it is meant to be, to allow a level of uniformity so that when people ask a question there really is one way of doing it.

 

For example, with HTML and XHTML, tables are no longer looked on favorably, but that is an opinion... HTML and XHTML Strict will still allow tables.

 

So in English, take opinions, throw them out the window with the bathwater, and create a uniform sanitization script for a PHP action that will protect that action from everything...

 

If that means we no longer can post www.kaboomlabs.com into a database, then find a way to make it work...

 

Yes, it is a lot of work, and yes, I am asking a lot, but think of it this way... if you can create a PHP standard that everyone can follow and understand, this place would be recognized for making something that everyone can use... and perhaps PHP would be more manageable then. Honestly, when I ask someone for help on one thing and there are 12 ways of doing it, to me there seems to be an issue... for even though all 12 are right, only one is truly done correctly.


As people keep saying, the reason there's no one solution to this is because different data needs to be sanitised in different ways. Having a read through just page 3 of this thread, three or four people must have said that already.

 

Like in the most common example, when outputting HTML you're not sanitising against the same problems as when you insert data, because you're not dealing with the same security issues. Having such a complex function to do everything would just overcomplicate its use, or waste resources protecting against things you don't need to. Therefore, you have several functions for different types of sanitation.

 

Instead of fighting back against what experienced developers are telling you, you should start learning the vast amount of security issues there are and how to protect against them. You'll also notice amongst the more experienced developers there are fairly standard (read: the most efficient / correct) ways of doing things.


Not only that, but the validation side of it is entirely dependent on the context of what the site considers to be valid data.  There's no way to standardize that as the needs for each site varies.  Site A allows user names to contain special characters.  Site B allows user names to contain special characters except ' and `.  Site C doesn't allow user names to contain any special characters.


There are a few standard things to do, depending on circumstance, such as always escaping strings before using them in a db query and always running text through htmlentities before outputting it to the screen. But there's no good way to take every possibility into account (are you going to handle uploaded files too? If so, good luck...), and certainly not in a way that would be useful for the web, where response time is critical. It's far more efficient to simply write validation and sanitation for the specific needs of a particular site.
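
A minimal sketch of the "run text through htmlentities before outputting it" habit. The helper name e() is hypothetical; ENT_QUOTES and the 'UTF-8' charset argument are real htmlentities parameters, and UTF-8 is assumed to be the page's encoding.

```php
<?php
// Escape at the moment text is written into the page, not when it is stored.
// ENT_QUOTES also converts ' and " (safe inside quoted attributes); the
// charset should match the page's encoding (UTF-8 is assumed here).
// The helper name e() is hypothetical.
function e(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("hi")</script> café';

// The same escaped value works in an attribute and in element content.
echo '<p title="' . e($comment) . '">' . e($comment) . "</p>\n";
```

htmlentities converts every character that has a named entity (é becomes &eacute;), while htmlspecialchars converts only the characters that are special in HTML; either one stops the comment from being interpreted as markup.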


Ok, amuse me here. When you say different kinds of data are validated differently, what do you mean?


There are really only four forms of data that I know of:

1. Alpha-numerical

2. Dates

3. Alpha

4. Numerical. 


What am I missing?

Also, why does this actually matter when sanitizing?


Matt, validation and sanitation are two different things.


Validation: Is the data valid?  Does it fit what I'm trying to do?

Sanitation: Is the data harmful?

That said, validation and sanitation go hand in hand. You check whether data is valid before running it through whatever processing security requires.

Example:


A hypothetical site requires that passwords be at least 8 characters long, and not contain any special characters except !@#$.  Is that a universal condition?  Of course not.  How could you standardize it?  You can't.


Using the example site condition above, if an incoming password didn't match the condition, you'd abort the login process immediately.  You wouldn't continue to the part where you escape it for the query.
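
A minimal sketch of "validate first, abort immediately" for the hypothetical password rule above. The function names password_is_valid and attempt_login, and the status strings they return, are illustrative assumptions; the actual credential query is deliberately left out.

```php
<?php
// The hypothetical password rule from the example: at least 8 characters,
// only letters, digits and the specials ! @ # $. \z anchors at the true
// end of the string ($ would also accept a trailing newline).
function password_is_valid(string $password): bool
{
    return preg_match('/^[A-Za-z0-9!@#$]{8,}\z/', $password) === 1;
}

// Validate first and stop early; only input that passed would ever reach
// the (escaped or prepared) credential query, which is not shown here.
function attempt_login(string $username, string $password): string
{
    if (!password_is_valid($password)) {
        return 'rejected before any database work';
    }
    // ... look up $username and verify $password here ...
    return 'passed validation';
}

echo attempt_login('matt', "x' OR '1'='1"), "\n";
```

An injection attempt like x' OR '1'='1 contains spaces, quotes and =, none of which this rule allows, so it never gets as far as the query.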


Dates can be tricky.  How are they entered?  American style?  European style?  Allow for both?  One textbox for the entire thing?  Individual textboxes for each component?  Drop downs?
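
A minimal sketch of one answer to "American or European style?": the site picks a convention up front and parses strictly with DateTimeImmutable::createFromFormat. The function name parse_date and the 'US'/'EU' labels are hypothetical, and the formats (two-digit day and month, four-digit year) are assumptions.

```php
<?php
// Parse a date typed into a single text box. The site must decide which
// convention it expects. Assumed formats: US = mm/dd/yyyy, EU = dd/mm/yyyy,
// with leading zeros required.
function parse_date(string $input, string $style): ?DateTimeImmutable
{
    $formats = ['US' => 'm/d/Y', 'EU' => 'd/m/Y'];
    if (!isset($formats[$style])) {
        throw new InvalidArgumentException("Unknown style: $style");
    }
    $format = $formats[$style];

    $date = DateTimeImmutable::createFromFormat($format, $input);
    // createFromFormat() silently rolls impossible dates over (02/30
    // becomes 03/02), so format the result again and require an exact
    // match with the input to reject them.
    if ($date === false || $date->format($format) !== $input) {
        return null;
    }
    return $date;
}

// The same string means two different days depending on the convention.
echo parse_date('03/04/2011', 'US')->format('Y-m-d'), "\n";
echo parse_date('03/04/2011', 'EU')->format('Y-m-d'), "\n";
```

Separate boxes or drop-downs for day, month and year avoid the ambiguity entirely; checkdate() can then validate the three numbers directly.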


What about phone numbers?  Do you force the user to supply the area code?  What about international numbers?
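
A minimal sketch of two different answers to the phone-number questions, written as separate validators. Both policies and all function names are illustrative assumptions: one requires a North American number with area code, the other accepts international numbers in E.164 form (a '+', then at most 15 digits).

```php
<?php
// Strip common formatting characters (whitespace, parentheses, dots,
// dashes) before checking the digits.
function normalize_phone(string $raw): string
{
    return preg_replace('/[\s().-]/', '', $raw);
}

// Policy 1 (assumed): North American number, area code required.
// Area code and exchange each start with 2-9; an optional leading 1
// (the country code) is allowed.
function valid_us_phone(string $raw): bool
{
    return preg_match('/^1?[2-9]\d{2}[2-9]\d{6}\z/', normalize_phone($raw)) === 1;
}

// Policy 2 (assumed): international number in E.164 form, i.e. '+',
// a country code not starting with 0, and at most 15 digits in total.
function valid_international_phone(string $raw): bool
{
    return preg_match('/^\+[1-9]\d{1,14}\z/', normalize_phone($raw)) === 1;
}
```

Which policy is "right" depends entirely on the site's audience, which is the point: a US-only shop might reject +44 numbers on purpose.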


How about files?  Are you just going to check their extensions?  MIME types?  What if they don't have an extension?  Are you going to do any other kind of integrity check?  Are you going to blindly accept any kind of file, or does your web app have a particular focus?
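
A minimal sketch of checking more than the extension, assuming a hypothetical policy that accepts only PNG and JPEG images. It compares the user-supplied extension against the file's first bytes (the standard PNG and JPEG signatures). The constant SIGNATURES and the function upload_looks_valid are hypothetical names; a real handler would also check the $_FILES error code and use move_uploaded_file().

```php
<?php
// Hypothetical upload policy: accept only PNG and JPEG images.
// The extension is supplied by the user, so it is checked AND the file's
// first bytes are compared with the known signature for that format.
const SIGNATURES = [
    'png'  => "\x89PNG\r\n\x1a\n",
    'jpg'  => "\xFF\xD8\xFF",
    'jpeg' => "\xFF\xD8\xFF",
];

function upload_looks_valid(string $originalName, string $tmpPath): bool
{
    // pathinfo() returns '' when there is no extension; '' is not a key
    // in SIGNATURES, so extension-less files are rejected too.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!array_key_exists($ext, SIGNATURES)) {
        return false;
    }

    $fh = fopen($tmpPath, 'rb');
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 8);
    fclose($fh);

    $sig = SIGNATURES[$ext];
    return is_string($head) && strncmp($head, $sig, strlen($sig)) === 0;
}
```

A matching signature is only a sanity check, not a full integrity check: a file can start with PNG bytes and contain anything after them. Stronger options include the fileinfo extension's finfo class or re-encoding the image.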


Sanitization is not about the type or form of data; it's about protecting the user and the server from the data. When you use data within a SQL query, for example, people can use SQL injection to modify what the query does. When outputting HTML, users can use XSS to inject potentially harmful JavaScript, or just markup that breaks the HTML and renders the page incorrectly. There's no need to protect against XSS when you're inserting data into a database; likewise, there's no need to protect against SQL injection when you're displaying data in the browser. That's why we use different sanitization techniques depending on what we're doing.
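
A minimal sketch of the SQL side: concatenating input into a query lets the input rewrite the query, while a PDO prepared statement keeps the value separate from the SQL. The users table, the name column, and both function names are hypothetical. The prepared-statement function needs a live database connection, so it is defined but not called here.

```php
<?php
// Why building SQL by string concatenation is dangerous: the user's input
// becomes part of the SQL text itself. This only builds strings; nothing
// is executed. The `users` table and `name` column are hypothetical.
function naive_query(string $name): string
{
    return "SELECT * FROM users WHERE name = '" . $name . "'";
}

echo naive_query("Matt"), "\n";
// SELECT * FROM users WHERE name = 'Matt'
echo naive_query("x' OR '1'='1"), "\n";
// SELECT * FROM users WHERE name = 'x' OR '1'='1'   (true for every row)

// The usual fix: a prepared statement sends the value separately from the
// SQL, so a quote in $name is treated as data. It needs a live PDO
// connection (new PDO($dsn, $user, $pass)), so it is not called here.
function find_user(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT * FROM users WHERE name = ?');
    $stmt->execute([$name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

None of this involves HTML escaping; that happens separately, at output time.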


Validating (ensuring the integrity of) data is a slightly different but related subject. For example, if you're storing an email address, you validate that the string matches the form of an email address. If it's currency or a decimal number, you cast the data to a float; if it's a whole number, to an integer. After validation you know that numeric input cannot contain SQL injection, because it matches a pattern that makes it impossible. Other validated strings are not automatically safe: an email address may legitimately contain an apostrophe, so it still needs escaping. Certain data cannot be validated beyond, say, a minimum or maximum length, like a forum post. In that case you simply have to sanitize the data to ensure SQL injection is prevented.
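
A minimal sketch of type-based validation with PHP's filter_var (FILTER_VALIDATE_EMAIL, FILTER_VALIDATE_INT, FILTER_VALIDATE_FLOAT), plus a length-only check for free text. The wrapper names and the 1 to 10000 character limit for posts are hypothetical. filter_var rejects bad input outright, whereas a plain cast coerces it: (float) '1 OR 1=1' quietly becomes 1.0.

```php
<?php
// Type-based validation. Each wrapper returns the validated value, or null
// if the input does not match. Names are hypothetical.
function valid_email(string $s): ?string
{
    $r = filter_var($s, FILTER_VALIDATE_EMAIL);
    // Note: valid addresses may contain ' (o'brien@example.com), so an
    // email still needs escaping or a prepared statement.
    return $r === false ? null : $r;
}

function valid_int(string $s): ?int
{
    $r = filter_var($s, FILTER_VALIDATE_INT);
    return $r === false ? null : $r;
}

function valid_price(string $s): ?float
{
    $r = filter_var($s, FILTER_VALIDATE_FLOAT);
    return $r === false ? null : $r;
}

// Free text such as a forum post: only the length can be checked.
// preg_match_all('/./us') counts UTF-8 characters (newlines included) and
// returns false for invalid UTF-8. The 1-10000 limit is an assumption.
function valid_post(string $s, int $min = 1, int $max = 10000): bool
{
    $chars = preg_match_all('/./us', $s);
    return $chars !== false && $chars >= $min && $chars <= $max;
}

var_dump(valid_int('42'));        // int(42)
var_dump(valid_int('1 OR 1=1'));  // NULL
```

A post that passes valid_post() can still contain any SQL or HTML, so it still goes through a prepared statement on the way in and htmlspecialchars()/htmlentities() on the way out.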


This topic is now closed to further replies.
