
Streamlining HTMLENTITIES


doubledee


Security is very important to me, so I have been making sure that all PHP Variables that are being output by my scripts are run through the escape function like this...

 

htmlentities($someVariable, ENT_QUOTES)

 

 

The problem that I am having, however, is that I find that to be a real PITA because it takes already complex code and makes it even harder to read.

 

So, I am wondering if I could do something like this to streamline things...

 

1.) Create a function

 

function escapeOutput($someVariable){
    $safeOutput = htmlentities($someVariable, ENT_QUOTES);

    return $safeOutput;
}

 

 

2.) And then as I create PHP Variables, run them through my new function like this...

$username = escapeOutput($username);

$address = escapeOutput($address);  

$subject = escapeOutput($subject);

 

 

3.) Then instead of this mess...

echo "<dl>
		<dt>FROM:</dt>
		<dd>" . htmlentities($fromData, ENT_QUOTES) . "</dd>
		<dt>TO:</dt>
		<dd>" . htmlentities($toData, ENT_QUOTES) . "</dd>
		<dt>DATE:</dt>
		<dd>" . htmlentities($msgDate, ENT_QUOTES)
		. "</dd>
		<dt>SUBJECT:</dt>
		<dd><b>" . htmlentities($subject, ENT_QUOTES) . "</b></dd>\n\n
		<dt></dt>";

 

 

I could have the streamlined...

echo "<dl>
		<dt>FROM:</dt>
		<dd>$fromData</dd>
		<dt>TO:</dt>
		<dd>$toData</dd>
		<dt>DATE:</dt>
		<dd>$msgDate</dd>
		<dt>SUBJECT:</dt>
		<dd>$subject</dd>
		<dt></dt>";

 

What do you gurus think?    :shrug:

 

 

Debbie

 

Link to comment
Share on other sites

Looks fine to me. You might want to also specify an encoding as a third parameter for the htmlentities() function (probably UTF8).

 

If it were me, I would add support for arrays - so that you can pass an associative array to the function and get back a sanitized associative array. This way you don't have to run each key manually.

 

Also, if I may be nitpicky, your function name doesn't accurately describe what the function does - since you're technically not escaping anything but rather converting it.

Link to comment
Share on other sites

Scootstah,

 

You can be nitpicky if I can follow up with more questions!  :P

 

 

Looks fine to me. You might want to also specify an encoding as a third parameter for the htmlentities() function (probably UTF8).

 

Encoding is a scary topic, and one I started a thread on here before with no good recommendations.

 

I'm not sure what I am using, to be honest, and I know from what I have read, that supporting International Character Sets between PHP and MySQL can be a real b*tch...

 

Think I'll pass on that one until I understand the topic better.

 

 

If it were me, I would add support for arrays - so that you can pass an associative array to the function and get back a sanitized associative array. This way you don't have to run each key manually.

 

That is an excellent idea!!

 

Care to share how you'd do that?

 

 

Also, if I may be nitpicky, your function name doesn't accurately describe what the function does - since you're technically not escaping anything but rather converting it.

 

Okay, so what would be a better name?

 

I'm open to more accurately describing what I am doing.

 

To be honest, maybe I don't totally get what HTMLENTITIES is really doing for me...  :shy:

 

 

Debbie

 

 

Link to comment
Share on other sites

Okay, don't say I don't try things myself!!!

 

What do you think about this?!

function str2htmlentities($var){
	/**
	 * Convert all applicable characters to HTML entities.
	 *
	 * To display reserved characters (e.g. < >) we need to use
	 * the function htmlentities to convert text to the appropriate HTML Entity.
	 *
	 * This will also help prevent against Cross-Site Scripting (XSS) attacks.
	 *
	 * Returns either a scalar variable or an array
	 *
	 *
	 * @param		{String, Array}		$var
	 * @return	String
	 */

	// Check Data-Type.
	if (is_scalar($var)){
		// Variable is Scalar.
		$converted = htmlentities($var, ENT_QUOTES);

	}elseif (is_array($var)){
		// Variable is Array.
		$converted = array_map('htmlentities', $var);

	}else{
		// Invalid Data-Type.
		$_SESSION['resultsCode'] = 'FUNCTION_HTMLENTITIES_INVALID_TYPE_5004';

		// Set Error Source.
		$_SESSION['errorPage'] = $_SERVER['SCRIPT_NAME'];

		// Redirect to Display Outcome.
		header("Location: " . BASE_URL . "/account/results.php");

		// End script.
		exit();
	}//End of CHECK DATA-TYPE

	return $converted;
}//End of str2htmlentities

 

 

<?php

// Access Constants.
require_once('config/config.inc.php');

// Access Functions.
require_once('utilities/functions.php');

// Set Variables.
$username = 'DoubleDee';

$ages['John'] = 32;
$ages['Mary'] = 25;
$ages['Sally'] = 41;

$favoriteTags=array("<b>", "<p>", "<html>");


// Convert Variables.
$username = str2htmlentities($username);

$ages = str2htmlentities($ages);

$favoriteTags = str2htmlentities($favoriteTags);


// Output Variables.
echo '<p>$username = ' . $username . '</p><br />';

foreach($ages as $key => $value){
	echo $key . ' is ' . $value . ' years old.<br />';
}

echo "<br /><p>My favorite HTML tags include:</p>";
foreach($favoriteTags as $value){
	echo "$value<br/>";
}

?>

 

 

Debbie

 

 

Link to comment
Share on other sites

Encoding is a scary topic, and one I started a thread on here before with no good recommendations.

 

I'm not sure what I am using, to be honest, and I know from what I have read, that supporting International Character Sets between PHP and MySQL can be a real b*tch...

 

Think I'll pass on that one until I understand the topic better.

 

Well, you still ought to pick an encoding (UTF8 is pretty standard) and make sure everything is the one you choose. It's not particularly difficult and is beneficial in the long run.

 

 

If it were me, I would add support for arrays - so that you can pass an associative array to the function and get back a sanitized associative array. This way you don't have to run each key manually.

 

That is an excellent idea!!

 

Care to share how you'd do that?

 

Probably with recursion. Check if your input is an array and then loop through it and re-call the function from within itself. This way you can easily traverse through multi-dimensional arrays without any extra effort.

 

 

Okay, so what would be a better name?

 

Other developers have used some variation of the "html entities" name to solve the same problem. Ultimately you are still using the htmlentities() function, but just wrapping it up first.

 

 

To be honest, maybe I don't totally get what HTMLENTITIES is really doing for me...  :shy:

 

It is converting symbols and such to their HTML entities. You can see a list of them here (excuse the w3schools link, their HTML entities charts happen to be good references).

 

Essentially you are preventing XSS attacks by removing your users' ability to add markup to any dynamic content. The < and > characters will be changed to their entities, &lt; and &gt; respectively.
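
To make that concrete, here is a minimal sketch of what happens to a hostile value. The $comment payload is made up for illustration; only htmlentities() itself comes from the discussion above.

```php
<?php
// Hypothetical user input containing markup (assumed for illustration).
$comment = '<script>alert("xss")</script>';

// Convert the special characters to entities before output.
$safe = htmlentities($comment, ENT_QUOTES, 'UTF-8');

// The browser now receives inert text instead of a script element.
echo $safe, "\n";
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The angle brackets and quotes are the only characters that change here, which is enough to stop the browser from treating the value as a tag.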

Link to comment
Share on other sites


Hmm. The first problem I see is that array_map isn't going to work for multi-dimensional arrays. Also, the error handling is unnecessary here - if it doesn't match expected data type just ignore it.

 

Here is my take:

function entities($input)
{
if (is_array($input)) {	
	$clean = array();		
	foreach($input as $key => $val)
	{
		$clean[$key] = entities($val);
	}
	return $clean;

}

return htmlentities($input, ENT_QUOTES);	
}

Link to comment
Share on other sites

Hmm. The first problem I see is that array_map isn't going to work for multi-dimensional arrays. Also, the error handling is unnecessary here - if it doesn't match expected data type just ignore it.

 

Way to burst my bubble!  (Just when I thought I figured something out on my own...)  :(

 

 

 

Here is my take:
function entities($input)
{
if (is_array($input)) {	
	$clean = array();		
	foreach($input as $key => $val)
	{
		$clean[$key] = entities($val);
	}
	return $clean;

}

return htmlentities($input, ENT_QUOTES);	
}

 

 

So that is all I need?

 

And that will handle all multi-dimensional arrays?

 

 

BTW, how do I do what you are saying here...

Well, you still ought to pick an encoding (UTF8 is pretty standard)

 

...in your above function?

 

Thanks,

 

 

Debbie

 

Link to comment
Share on other sites

I would still use array_map(), but in such a way that it will work for multidimensional arrays. For the encoding you can just define it inside the function.

function entities($input)
{
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

 

EDIT: Fixed a typo in code

Link to comment
Share on other sites

I just created a wrapper for htmlentities with my defaults and a shorter name:

 

function hent($str, $type=ENT_QUOTES, $char='UTF-8'){
return htmlentities($str, $type, $char);
}

 

To add your array support one could do:

function hent($str, $type=ENT_QUOTES, $char='UTF-8'){
if (is_array($str)){
	foreach ($str as &$v){
		$v=hent($v, $type, $char);
	}

	return $str;
}
else {
	return htmlentities($str, $type, $char);
}
}

 

Or if you're on 5.3 or better:

function hent($str, $type=ENT_QUOTES, $char='UTF-8'){
if (is_array($str)){
	return array_map(function($s) use ($type,$char){ return hent($s, $type, $char); }, $str);
}
else {
	return htmlentities($str, $type, $char);
}
}

 

Link to comment
Share on other sites

I would still use array_map(), but in such a way that it will work for multidimensional arrays. For the encoding you can just define it inside the function.

function entities($input)
{
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

 

EDIT: Fixed a typo in code

 

 

1.) Shouldn't it be...

        return array_map('htmlentities', $input);

 

 

2.) How is your code different from mine?

 

 

3.) How does your code handle a multi-dimensional array?

 

I don't see how it is recursive like scootstah's code.

 

 

Debbie

 

Link to comment
Share on other sites


1. Nope, that's how it's recursive, it "calls itself", although this can be achieved another way


function entities($input, $quotes = ENT_QUOTES, $charset = 'UTF-8')
{
   if ( is_array($input) )
   {
      array_walk_recursive($input, function(&$v, $k, $params) { $v = htmlentities($v, $params[0], $params[1]); }, array($quotes, $charset));
      return $input;
   }
   return htmlentities($input, $quotes, $charset);
}

2. I can't work out whose code is whose anymore lol, probably that it uses a built-in PHP function rather than foreach.

 

 

3. It's recursive, so:

 

 

$arr = array('on>', array('tw>'), '"thr');
// first iteration
function entities($input)
{
    // [ 'on>', [ 'tw>' ], '"thr' ] is array, map it with entities function (run entities on each value)
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}
// $arr = array();

// second iteration
function entities($input)
{
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
    // 'on>' is not array, return htmlentities 
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}
// $arr = array('on>'); 

// third iteration
function entities($input)
{
    // [ 'tw>' ] is array, map it with entities function (run entities on each value)
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}
// $arr = array('on>', array()); 

// fourth iteration
function entities($input)
{
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
     // 'tw>'  is not array, return htmlentities   
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}
// $arr = array('on>', array('tw>')); 

// fifth iteration
function entities($input)
{
    if (is_array($input))
    {
        return array_map('entities', $input);
    }
     // '"thr'  is not array, return htmlentities   
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}
// $arr = array('on>', array('tw>'), '"thr'); 

Hope that helps. For the record, I wouldn't do it this way at all; I'd stick with what you were already doing. That way you only need to store one variable, and you can escape it as appropriate for each context (MySQL, display, etc.).
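
One way to see why keeping the raw value matters: a value that was converted when it was created gets converted again if anything else runs it through htmlentities() at output time. The sketch below uses a made-up $raw string; the fourth argument (double_encode) is a standard htmlentities() parameter, but whether you want to rely on it is a judgment call.

```php
<?php
// Hypothetical raw value (assumed for illustration).
$raw = 'Tom & "Jerry" <tj@example.com>';

// Converting early and then again at output double-encodes the value.
$early = htmlentities($raw, ENT_QUOTES, 'UTF-8');
$twice = htmlentities($early, ENT_QUOTES, 'UTF-8');

echo $early, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;tj@example.com&gt;
echo $twice, "\n"; // Tom &amp;amp; &amp;quot;Jerry&amp;quot; &amp;lt;tj@example.com&amp;gt;

// Passing false as the fourth argument leaves existing entities alone.
$guarded = htmlentities($early, ENT_QUOTES, 'UTF-8', false);
echo $guarded, "\n"; // same as $early
```

Keeping $raw untouched and converting only in the echo statement avoids the problem without needing the fourth argument at all.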

 

 

If you wish to make it more bearable the "ENT_QUOTES" constant resolves to (int)3 so you could use:

htmlentities($in, 3, 'UTF-8');

 

 

As far as character encoding goes, I always use UTF-8. It's the go-to encoding if you don't know anything about character encoding (like myself). Just make sure everything is in sync, i.e.:

 

 

HTML pages have:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" >

PHP (untrusted variables) are output using:

htmlentities($var, 3, 'UTF-8');

MySQL charset (the default collation is 'latin1_swedish_ci') is set to utf8_general_ci or utf8_unicode_ci (see attachment), and you run:

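mysql_query("SET NAMES 'UTF8'");
// or PDO: MYSQL_ATTR_INIT_COMMAND only takes effect when passed as a
// constructor option, because it runs at connection time
$dbh = new \PDO('mysql:dbname='. DB .';host='. DB_HOST, DB_USER, DB_PASS, array(
    \PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'",
));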

 

 

And you should be OK.
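
As a rough illustration of why the charset argument has to match the data, the sketch below feeds the same UTF-8 bytes to htmlentities() with a matching and a mismatched charset. The $name value is invented for the example.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9 (hypothetical sample value).
$name = "Andr\xC3\xA9";

// Matching charset: the two bytes are read as one character.
echo htmlentities($name, ENT_QUOTES, 'UTF-8'), "\n";      // Andr&eacute;

// Mismatched charset: each byte is treated as a separate Latin-1 character.
echo htmlentities($name, ENT_QUOTES, 'ISO-8859-1'), "\n"; // Andr&Atilde;&copy;
```

The second line is the kind of garbage that tends to show up when the page, PHP and the database disagree about the encoding.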

post-67458-13482403554837_thumb.png

post-67458-1348240355515_thumb.png

Link to comment
Share on other sites

I much prefer Psycho's version. It's simple really: if an array is passed to entities() it will array_map() over it and pass back the escaped array. If any item in that array happens to also be an array, the same happens one level deeper; the escaped array is passed back, which is then passed back as part of its parent. That can repeat for as many levels as the array has.

 

If you wish to make it more bearable the "ENT_QUOTES" constant resolves to (int)3 so you could use:

htmlentities($in, 3, 'UTF-8');

 

Eek! There's a reason constants are used -- to prevent future change breaking things! Also for readability, for anyone else who doesn't happen to know that ENT_QUOTES == 3. Given the manual doesn't generally document constant values either, it can be a pain in the arse to work out these kind of things.
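
For what it's worth, the named flags also make the difference between the quote modes visible at the call site. A small sketch, using a made-up $text value:

```php
<?php
// Hypothetical sample containing both quote styles.
$text = "It's a \"test\"";

// The named flags document intent; the integers behind them do not.
echo htmlentities($text, ENT_NOQUOTES, 'UTF-8'), "\n"; // It's a "test"
echo htmlentities($text, ENT_COMPAT, 'UTF-8'), "\n";   // It's a &quot;test&quot;
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n";   // It&#039;s a &quot;test&quot;
```

Anyone reading the third call can tell that both quote styles are converted without looking anything up.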

Link to comment
Share on other sites

Kicken,

 

Can you please help me understand your code?  (I'm still pretty shaky on arrays...)  :(

 

 

I just created a wrapper for htmlentities with my defaults and a shorter name:

 

function hent($str, $type=ENT_QUOTES, $char='UTF-8'){
return htmlentities($str, $type, $char);
}

 

To add your array support one could do:

function hent($str, $type=ENT_QUOTES, $char='UTF-8'){
if (is_array($str)){
	foreach ($str as &$v){
		$v=hent($v, $type, $char);
	}

	return $str;
}
else {
	return htmlentities($str, $type, $char);
}
}

 

What is going on here...

 

foreach ($str as &$v){

 

 

Debbie

 

Link to comment
Share on other sites

The key was left out because it isn't needed. "&" means loop through by reference. Instead of putting a copy of each array item into the variable, the variable is just a reference. That means you can modify the original array by just changing $value. The following would produce the same result:

 

foreach ($array as $key => $value) {
$array[$key] = htmlentities($value);
}

foreach ($array as &$value) {
$value = htmlentities($value);
}

 

Worth noting though that after the second method, $value would still exist as a reference. In this situation it's not a problem because nothing happens afterwards (except the array is returned to the parent caller), but if anything else did happen after within that function you should always unset the reference.
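
A minimal sketch of that leftover-reference pitfall, with made-up array contents, in case it helps to see it happen:

```php
<?php
$items = array('a', 'b', 'c');

foreach ($items as &$value) {
    $value = strtoupper($value);
}
// $value is still a reference to the last element, $items[2].

// Reusing $value without unset() silently overwrites that element.
foreach (array('x', 'y') as $value) {
    // each assignment to $value writes into $items[2]
}
print_r($items); // last element is now 'y', not 'C'

// Safer pattern: break the reference as soon as the loop ends.
$safe = array('a', 'b', 'c');
foreach ($safe as &$v) {
    $v = strtoupper($v);
}
unset($v);
foreach (array('x', 'y') as $v) {
    // $v is an ordinary variable again
}
print_r($safe); // A, B, C
```

The key => value form of the loop sidesteps the issue entirely, which may be a reason to prefer it when in doubt.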

Link to comment
Share on other sites

I tweaked the function like this...

function str2htmlentities($input, $type=ENT_QUOTES, $char='UTF-8'){
	if (is_array($input)){
		foreach ($input as $key => $value){
			$v = str2htmlentities($value, $type, $char);
		}

		return $input;
	}else{
		return htmlentities($input, $type, $char);
	}
}

 

 

Is that way okay??

 

It seems to work, but I am still a little shaky on what is going on even as I step through the code in NetBeans.  (NetBeans takes a few strange hops as you cycle through everything?!)

 

 

Debbie

 

 

Link to comment
Share on other sites


$v doesn't exist there, should be:

 

 

   function str2htmlentities($input, $type=ENT_QUOTES, $char='UTF-8'){
      if (is_array($input)){
         foreach ($input as $key => $value){
            $input[$key] = str2htmlentities($value, $type, $char);
         }

         return $input;
      }else{
         return htmlentities($input, $type, $char);
      }
   }

Link to comment
Share on other sites

$v doesn't exist there, should be:

 

            $input[$key] = str2htmlentities($value, $type, $char);

 

Wow!  Good catch!!

 

Okay, so to be sure it is...

function str2htmlentities($input, $type=ENT_QUOTES, $char='UTF-8'){
	if (is_array($input)){
		foreach ($input as $key => $value){
			$input[$key] = str2htmlentities($value, $type, $char);
		}

		return $input;
	}else{
		return htmlentities($input, $type, $char);
	}
}

 

Right?  (BTW, how do I know this function is actually working?!  I mean had it not been for you and that last catch, I would have never known that things weren't coded/working properly, because neither stepping through NetBeans, nor looking at the output gave anything away...)  :o

 

 

--------------

And how does that compare to Psycho's...

function entities($input){
    if (is_array($input)){
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

 

 

Is there any reason why I would want to use one versus the other?

 

My main goal of this exercise - in addition to obviously streamlining how I use HTMLEntities - was to be able to handle Multi-Dimensional Arrays should they come up.

 

Thanks,

 

 

Debbie

 

 

Link to comment
Share on other sites

Both can handle multi-dimensional arrays, but I don't see much reason to manually loop through the array, when you can use a native function to do it for you.

 

 

Although her code allows you to set the ENT_* quote flag and character encoding.

 

 

@Debbie

array_map will be faster than foreach as it's a native PHP function so the looping is executed in C and doesn't need to be interpreted. Although the difference in execution time will not be significant in this case.

 

 

If you are running PHP >= 5.3 and are pretty sure you always will, my array_walk_recursive code will allow you to specify parameters and use a native function, but to be honest, if you understand how your function works, just go with that.

 

 

As for knowing whether the function has worked:

 

   function str2htmlentities($input, $type=ENT_QUOTES, $char='UTF-8'){
      if (is_array($input)){
         foreach ($input as $key => $value){
            $input[$key] = str2htmlentities($value, $type, $char);
         }

         return $input;
      }else{
         return htmlentities($input, $type, $char);
      }
   }
$array = array('<>', array('"<>"'), '">');
$array = str2htmlentities($array);
echo '<pre>'. print_r($array, 1);

 

View page source should show < as &lt;, > as &gt; and " as &quot;

Link to comment
Share on other sites

(BTW, how do I know this function is actually working?!  I mean had it not been for you and that last catch, I would have never know that things weren't coded/working properly, because neither stepping through NetBeans, nor looking at the output gave anything away...  :o

 

Run the string: <b>this is bold</b> through the function.

 

If you see the literal text <b>this is bold</b> on the page, then it works.

 

If you see this is bold rendered in bold type, then it doesn't work.

Link to comment
Share on other sites


@scootstah: But, you'd also want to throw it a multi-dimensional array with values that would need to be escaped as well.

 

@Debbie: You know what the function is supposed to do, so you should know how to test it to see if it works. Writing some code and posting it here asking if it will work or not can be perceived as arrogant. The function is supposed to encode string values or strings within arrays (including multidimensional arrays). So, pass it both types of values and verify the results.

 

$testString = "<b>This is a bold string value</b>";
$testArray = array(array("<b>This is a bold mutidimensional array value</b>"));

echo "<br>String before encoding: " . $testString;
echo "<br>String after encoding: " . entities($testString);
echo "<br>Array before encoding: <pre>" . print_r($testArray, true) . "</pre>";
echo "<br>Array after encoding: <pre>" . print_r(entities($testArray), true) . "</pre>";

 

Expected Output (using my function):

String before encoding: This is a bold string value

String after encoding: <b>This is a bold string value</b>

 

Array before encoding:

Array (

    [ 0 ] => Array (

        [ 0 ] => This is a bold mutidimensional array value

    )

)

Array after encoding:

Array (

    [ 0 ] => Array (

        [ 0 ] => <b>This is a bold mutidimensional array value</b>

    )

)
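
The same check can be made without eyeballing the output by comparing against the strings you expect. This is only a sketch: entities() is repeated from earlier in the thread so the snippet runs on its own, and the test data is invented.

```php
<?php
// entities() as posted earlier in this thread, repeated so the snippet is self-contained.
function entities($input)
{
    if (is_array($input)) {
        return array_map('entities', $input);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

// Hypothetical test data: a string, a nested array with a string key, and a single quote.
$input    = array('<b>x</b>', array('key' => '"<>"'), "it's");
$expected = array('&lt;b&gt;x&lt;/b&gt;', array('key' => '&quot;&lt;&gt;&quot;'), 'it&#039;s');

$actual = entities($input);

// Strict comparison checks values, keys and nesting in one go.
echo ($actual === $expected) ? "PASS\n" : "FAIL\n";
```

A check like this would also have caught the earlier $v typo, since the returned array would still have contained the raw values.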

Link to comment
Share on other sites

View page source should show < as &lt;, > as &gt; and " as &quot;

 

THAT was the key thing I was missing and that a lot of people fail to point out!!!

 

It doesn't matter if you see < or > or <> on your screen, it is how it is being displayed in the View Source that matters...
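
A rough way to picture the difference between the source and the screen, assuming a made-up $raw value: html_entity_decode() is used here only to imitate what the browser does when it displays the text.

```php
<?php
// Hypothetical value to be displayed.
$raw     = '<b>bold?</b>';
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// What "View Source" shows: the entities themselves.
echo $encoded, "\n"; // &lt;b&gt;bold?&lt;/b&gt;

// What the browser displays: it decodes the entities back to characters,
// roughly like html_entity_decode(), but treats the result as text, not markup.
echo html_entity_decode($encoded, ENT_QUOTES, 'UTF-8'), "\n"; // <b>bold?</b>
```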

 

Thanks!!!

 

 

Debbie

Link to comment
Share on other sites
