Opinion: strip_tags on field that will be encrypted in the database

I'm curious to get opinions on using strip_tags() for fields that will be encrypted in a database. I often see websites that say "choose a password that contains X certain characters but not Z other characters." And I got curious.

 

Let's say there's a registration form where a new user creates a username and password, and the server will store the password as ... 

sha1( $user_entered_value )

... or some other sort of hashed/encrypted string.

 

In this case, why would it ever matter that a user had entered <div> or some other such text in their password? The password will only ever be hashed into something before it is matched... so why would you bother stripping tags? Why bother preventing any "special" characters?

 

Thoughts?

There is no good reason to use strip_tags() or any other function that modifies the content of a password, other than a valid hashing algorithm. There are very few situations where strip_tags() is needed at all. Data should be properly escaped for the context it will be used in. For DB operations, prepared statements should be used. For output to a web page, htmlspecialchars() or htmlentities() should be used. For any use of data, determine how the data should be escaped rather than trying to remove problematic characters.
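
As a rough illustration of escaping per context, here is a minimal sketch. The helper names escapeForHtml() and storeComment(), as well as the "comments" table and "body" column, are made up for this example and are not from the thread.

```php
<?php

/**
 * Escape a value for output inside HTML text or a quoted attribute.
 * (escapeForHtml is a hypothetical helper name.)
 */
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/**
 * Store a comment with a prepared statement: the value is sent as a bound parameter and is never
 * concatenated into the SQL string. The table and column names are assumptions for illustration.
 */
function storeComment(PDO $pdo, string $comment): void
{
    $statement = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
    $statement->execute(['body' => $comment]);
}

$input = '<div>Tom & "Jerry"</div>';

// The input itself stays intact; only its representation changes for the HTML context.
echo escapeForHtml($input), "\n";
```

In both cases the original characters survive: the database receives the exact string, and the browser displays it as text instead of interpreting it as markup.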

Never remove any characters from a password. This is extremely dangerous, because it weakens the input. For example, running the strong password

<(MX/aw9O(DK

through strip_tags yields an empty string, which means you've literally removed the password from the account.
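
This is easy to check in a quick sketch (the variable names are only for illustration):

```php
<?php

$password = '<(MX/aw9O(DK';

// strip_tags() treats everything after the unclosed "<" as part of a tag and drops it.
$stripped = strip_tags($password);

var_dump(strlen($password));    // length of the original password
var_dump($stripped === '');     // true if nothing is left after stripping
```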

 

In fact, you shouldn't manipulate any user input. You're free to validate and reject data, but silently changing it can have bad consequences ranging from total confusion to serious security vulnerabilities (as you just saw).

 

If a website doesn't accept “special characters”, there's something wrong. Either they're trying to cover up bugs or vulnerabilities, or they simply have no idea what they're doing and make up arbitrary rules. In both cases, I'd be worried.

 

Passwords should generally have no restrictions at all. There are only two exceptions: technical limitations of the password hash algorithm and usability issues. For example, the bcrypt algorithm cannot handle null bytes and silently truncates very long strings (PHP's implementation ignores everything past 72 bytes), so it's OK to reject those (there are smarter solutions, though). And it makes sense to only allow printable characters, because anything else was probably sent accidentally.

 

Note that SHA-1 is completely unsuitable for password hashing, because it can easily be attacked with brute force. Current GPUs are able to calculate billions of SHA-1 hashes per second. You need an actual password hash algorithm like bcrypt. In PHP, this is available through the password hashing API (password_hash() and password_verify()).
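
A minimal sketch of that API, assuming the default cost is acceptable for your hardware (tune it in a real application):

```php
<?php

// At registration: hash the password exactly as the user typed it, without stripping anything.
$password = '<(MX/aw9O(DK';
$hash = password_hash($password, PASSWORD_BCRYPT);

// At login: compare the submitted password with the stored hash.
$correctLogin = password_verify($password, $hash);
$wrongLogin = password_verify('wrong guess', $hash);

var_dump($correctLogin);    // bool(true)
var_dump($wrongLogin);      // bool(false)
```

The salt is generated automatically and stored inside the hash string, so only $hash needs to go into the database.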
 

Edited by Jacques1

I agree with these replies but... I guess I wonder why SO MANY websites still say that some characters are not allowed. Mmmph. And I'm talking about major websites like large banks, financial institutions, or other similar corporations.

Edited by timneu22

Big corporations don't necessarily make smart decisions. The financial industry in particular is full of absurd policies, crazy rituals, legacy garbage and plain incompetence.

 

For example, some banks have blacklisted words like “select” and “drop”, because they believe this is how to deal with SQL injection vulnerabilities. A lot of big websites don't let the user copy and paste the password into the log-in form, which makes it very difficult to use strong passwords stored in a password manager. And then there's the password policy madness. The rules and restrictions make no sense whatsoever from a technical standpoint. They're arbitrary and counter-productive. For example, I'm used to generating long, purely random passwords, and many amateur sites can handle those just fine. But several big “professional” websites force me to reduce the length, reduce the character space or introduce patterns, which means I have to sabotage my own security just to make some fucking validation procedure happy.

 

Learning from others makes a lot of sense, but learn from smart programmers, not big companies.

Yeah, no kidding. I always enjoy a large website saying "letters and numbers only."  Seriously??

There is a banking site I have to use that is tied to my health insurance. When creating my security questions, one was "What's your favorite band?". I provided the answer "U2" and went along my way. Fast forward many months to when I needed to reset my password. They provided that question above and when I entered the answer and submitted it I was greeted with the error message that my answer was too short.  :o

 

I am in total agreement with Jacques1. You should never change a user's input. Plus, in the vast majority of instances, it isn't necessary to disallow certain characters for 'security' at all. If you are doing that, it's probably because the code isn't written securely to begin with. Sort of like creating a lock that could be jimmied with a pencil and proclaiming no one may have a pencil - instead of just fixing the lock.

I would reject passwords with non-printable characters, though:

<?php

/**
 * Regular expression for validating passwords
 *
 * Only printable characters (including spaces) are allowed, because all other characters were probably sent
 * erroneously and may cause problems with the underlying hash algorithm.
 */
const PASSWORD_PATTERN = '/\\A[[:print:]]+\\z/u';



$password = "foo\0bar";

if (preg_match(PASSWORD_PATTERN, $password))
{
    echo 'Valid password';
}
else
{
    echo 'Invalid password';
}

Unfortunately, the current de-facto standard for hashing (bcrypt) is rather fragile in terms of input processing, so this either requires extra validation or pre-hashing with a more robust algorithm like SHA-2:

<?php

$password = "\0".str_repeat('A', 56);    // contains a null byte, so not safe for bcrypt

/*
 * Pre-hash password with SHA-256
 *
 * bcrypt cannot handle passwords containing null bytes or long passwords: PHP's implementation truncates the input
 * at 72 bytes, and the original bcrypt specification only covers keys of up to 56 bytes. To get around this issue,
 * the input is hashed with SHA-256, and the Base64-encoded hash is then passed to bcrypt. This makes sure both the
 * length and the characters are bcrypt-safe.
 */
$binaryPrehash = hash('sha256', $password, true);
$encodedPrehash = rtrim(base64_encode($binaryPrehash), '=');

$hash = password_hash($encodedPrehash, PASSWORD_BCRYPT, ['cost' => 14]);

var_dump($hash);
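
The stored hash only matches if the same pre-hash is applied again at login. Here is a minimal sketch of that round trip. It assumes a hypothetical helper prehashPassword() wrapping the two pre-hash steps above, and it uses the default cost to keep the example quick:

```php
<?php

/**
 * Pre-hash a password with SHA-256 and Base64-encode the result without padding.
 * (prehashPassword is a hypothetical helper name, not part of PHP.)
 */
function prehashPassword(string $password): string
{
    return rtrim(base64_encode(hash('sha256', $password, true)), '=');
}

// Registration: a password longer than bcrypt's 72-byte limit.
$longPassword = str_repeat('A', 100);
$hash = password_hash(prehashPassword($longPassword), PASSWORD_BCRYPT);

// Login: the submitted password goes through the same pre-hash before verification.
$sameOk = password_verify(prehashPassword($longPassword), $hash);

// A password that differs only after byte 72 is still rejected, because the pre-hash covers the whole input.
$differentOk = password_verify(prehashPassword($longPassword . 'x'), $hash);

var_dump($sameOk);         // bool(true)
var_dump($differentOk);    // bool(false)
```

Keeping the pre-hash in one function makes it harder to forget it on one side and lock users out.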
