
length of a string


ajoo
Solved by Jacques1


Hi all,

When the user is Jack1234, the var_dump output of the following line of code

$user = filter_input(INPUT_POST, 'user', FILTER_SANITIZE_STRING);

is

string 'Jack1234' (length=8)

I would like to know if there is some way we can use/retrieve the length value of the string that is displayed in the output, or verify the string length against it.

Thanks.


No, no, no.

First of all, strlen() usually counts the number of bytes, which is probably not what you want. If you use a modern multi-byte character encoding like UTF-8 (which you should), then you'll get “weird” results. For example, the function will tell you that the length of “Jörg” is 5, because UTF-8 happens to encode the “ö” umlaut with 2 bytes.

The implementation of strlen() also depends on the PHP environment, which means you may get different results on different machines. Some environments count bytes, others count characters of a particular encoding (this is what the mbstring.func_overload setting did; it was deprecated in PHP 7.2 and removed in PHP 8.0). Good luck debugging this.

Long story short: Do not use this function.

If you want to count characters, you need to take the encoding into account. Is it UTF-8? ISO 8859-1? ASCII? Something else? PHP needs to know. The mbstring extension provides an encoding-aware length function which correctly counts the number of characters in a string:

<?php

$name = 'Jörg';
$name_length = mb_strlen($name, 'UTF-8');

// this says 4 as expected
var_dump($name_length);

If you actually do want to count bytes, then you still need the mbstring extension:

<?php

$name = 'Jörg';
$name_byte_length = mb_strlen($name, '8bit');	// 8bit means: count the raw bytes

// this says 5 as expected
var_dump($name_byte_length);
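One caveat may be worth adding here. mb_strlen() counts characters, but it does not check that the bytes really are valid UTF-8. Malformed input from a form still gets a length back. mb_check_encoding() performs that check. The following is a minimal sketch, and the helper name utf8_length() is hypothetical, used only for illustration:

```php
<?php

// Hypothetical helper: returns the character count of a UTF-8 string,
// or null if the bytes are not valid UTF-8.
function utf8_length(string $text): ?int
{
    // mb_strlen() alone would still "count" malformed byte sequences,
    // so reject them explicitly first.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return null;
    }

    return mb_strlen($text, 'UTF-8');
}

var_dump(utf8_length('Jörg'));    // int(4)
var_dump(utf8_length("J\xF6rg")); // NULL: "ö" as a single ISO 8859-1 byte is not valid UTF-8
```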

Besides that, FILTER_SANITIZE_STRING is a horribly misnamed and utterly useless fossil from the early days of PHP, when the developers had no idea what they were doing (it has been deprecated since PHP 8.1). It mangles the input in a desperate attempt to somehow make it secure for HTML contexts. For example, the name

Peter's mother, I <3 cookies

becomes

Peter&#39;s mother, I

The apostrophe is turned into an HTML entity, and everything from the “<” onward is stripped as if it were an HTML tag.

I'm fairly sure this is not what you want. You only picked the filter_input() function because the name sounded good, right?

It's great that you're worried about input filtering. But this requires careful analysis of the specific goal and context. There's no magical filter function which somehow makes everything nice and secure (even though the PHP devs like to promise that). For now, I suggest you forget about this filter_input() stuff. It's wrong and useless except for a few special cases.

Edited by Jacques1

Hi all! Oh wow, so many replies! Hi Jacques, thanks again for cautioning me not to use filter_input. I am dying to drop it, but kindly suggest an alternative. Yes, the charset is UTF-8. The goal is to have safe user input in a form which has the following fields:

1. User login, which I wish to limit to 40 characters (the same as the length of the VARCHAR column in the database). That was the reason for wanting to use the string length.
2. Password
3. Names
4. Address fields
5. Gender
6. Phones
7. Cell phones
8. City, state and country
9. Email

As you can see, all of these are required to be alphanumeric strings, and some of them require characters like '.', '-', '+', '_' and maybe some more. I would like to limit the length of most of these strings. For example, I would like the phone string not to exceed, say, 13 characters.

I am also using the built-in filters for integers and emails, and a regex for the gender field. So if FILTER_SANITIZE_STRING is wrong (and I am sure it is, since you say so and have explained why), I'd like to ask you how we should filter these fields to ensure that they are safe, or at least that their lengths are within the ranges we want.

And yes, I am changing all mysql statements to mysqli prepared statements.

Thanks again, all, for the replies. Looking forward to some more.


Input filtering is overrated. I understand that people feel the need to “clean up” the incoming data, but this is a rather vague goal. When you look at the hard facts, you'll see that input filtering rarely does what you expect from it and sometimes even causes problems.

You want security? Then filtering is not the answer. For example, a perfectly valid e-mail address can still be used for SQL injection or cross-site scripting attacks. And how do you “filter” a password? Passwords are by definition unrestricted, because any kind of limitation would weaken them (well, we do have to limit the length sometimes).
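Here is an illustration of the remark about limiting password length. PHP's bcrypt (PASSWORD_BCRYPT) only uses the first 72 bytes of a password, so a byte limit is one of the few restrictions that makes sense. This is a minimal sketch that assumes bcrypt is the chosen algorithm. The constant MAX_PASSWORD_BYTES and the function check_password_length() are hypothetical names. Note that the password content itself is never altered or restricted:

```php
<?php

// Hypothetical limit: bcrypt ignores everything after the first 72 bytes.
const MAX_PASSWORD_BYTES = 72;

// Hypothetical helper: the only check applied to a password is its length.
// Symbols, spaces and non-Latin characters are all allowed.
function check_password_length(string $password): bool
{
    // '8bit' makes mb_strlen() count raw bytes, which is what bcrypt cares about.
    $bytes = mb_strlen($password, '8bit');

    // Reject empty passwords and anything bcrypt would silently truncate.
    return $bytes > 0 && $bytes <= MAX_PASSWORD_BYTES;
}

$password = "Peter's <3 J\u{00F6}rg";

if (check_password_length($password)) {
    // The password is hashed exactly as entered, without any "cleaning".
    $hash = password_hash($password, PASSWORD_BCRYPT);
    var_dump(password_verify($password, $hash)); // bool(true)
}
```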

 

The thing is this:

  • Injection attacks are an interpretation problem, not an input problem. There's nothing wrong with a user entering a word like “SELECT” or a symbol like “<”. It's all just text. If your application does strange things with this text, then that's the problem.
  • Protection always depends on the specific context. For example, preventing SQL-based attacks is entirely different from preventing HTML-based attacks, because SQL and HTML are two entirely different languages. There is no magical one-size-fits-all filter.

Using prepared statements to prevent SQL injections is a much better idea, because it actually makes sure that the data is interpreted correctly, and it's meant for a specific context (SQL). To prevent attacks against your HTML documents, you'll mostly need HTML-escaping.
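For the HTML side, here is a minimal sketch of HTML-escaping with htmlspecialchars(). The wrapper name html_escape() is a hypothetical helper. ENT_QUOTES also escapes single quotes, which makes the value safe inside quoted attributes. ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string:

```php
<?php

// Hypothetical helper: escape a string for HTML text or a quoted attribute value.
function html_escape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = "Peter's mother, I <3 cookies";

// The input is kept unchanged; escaping happens only when it is written into HTML.
echo '<p>' . html_escape($comment) . '</p>', PHP_EOL;
// <p>Peter&#039;s mother, I &lt;3 cookies</p>
```

Unlike FILTER_SANITIZE_STRING, nothing is lost here: the browser displays the original text exactly as the user typed it.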

 

Of course you may use filters to validate the incoming data. But this is not a security feature, and you need to be aware of several problems:

  • Formal validation doesn't prevent users from lying. For example, “bill@microsoft.com” is a perfectly valid e-mail address and probably even exists, but it's obviously not mine.
  • Overzealous validation can frustrate legitimate users. For example, human names and addresses vary greatly across different countries and cultures. If you expect every name to only contain Latin characters, then a large part of the world's population won't be able to use your website.
  • If you silently “correct” the user input like in your code above, then users won't get what they meant. This again leads to frustration and can make them leave your site forever. When I enter a username, I want that exact name. If it's somehow not allowed on your site, I want you to tell me so that I can choose a different one. But I certainly don't want you to manipulate my input.

Hi Jacques and all,

Thanks again for the reply. I have been reading and trying to understand what you are saying here. Actually, I just want to validate data (not sanitize it), and I mostly use the filter_input validation filters, but there is none for strings, so I had to use the SANITIZE one in my example. Besides, as I said earlier, I was just looking to check/validate that the string length was within limits.

One thing I would like to ask: how is it possible to use a perfectly legal email ID to carry out any attack, SQL injection or any other kind?

What I gather is that if we do not use any kind of filter, we are still good as long as we use prepared statements for MySQL queries and escape all HTML output (strings) with htmlspecialchars, htmlentities or other such functions. If we are doing this, then there is no need to filter the input. Is that correct?

Sorry if I sound so confused, but security is an extremely confusing topic. And everybody (well, almost everybody) recommends using filters.

Thanks for the answers. Looking forward to some more.


Thanks again for the reply. I have been reading and trying to understand what you are saying here. Actually, I just want to validate data (not sanitize it), and I mostly use the filter_input validation filters, but there is none for strings, so I had to use the SANITIZE one in my example.

Well, that's not really a good reason for using a function. Sure, “sanitizing a string” sounds great, but what does that even mean? It's such an incredibly vague term that it could mean almost anything.

When you write code, it's very important that you know exactly what you want to achieve. For example: “I want to help my users recognize errors when they fill out the form.” or “I want to prevent SQL injections.” Then come up with a concrete approach and finally choose the function which does exactly that.

In case of the length check, you need mb_strlen(). The rest of your checks may be useless (depending on what you want to achieve).
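Applied to the form fields listed earlier in the thread, the length check could be driven by a table of limits. This is a sketch under stated assumptions. The field names and every limit except the 40-character user login and the 13-character phone number are hypothetical placeholders. validate_lengths() is an illustrative name. The function reports problems instead of silently changing the input:

```php
<?php

// Hypothetical field names => maximum number of characters.
// 40 and 13 come from the thread; the city limit is a placeholder.
const MAX_LENGTHS = [
    'userlogin' => 40,
    'phone'     => 13,
    'city'      => 50,
];

// Returns an array of error messages keyed by field; an empty array means all passed.
// The input is never modified: the user either gets exactly what they typed or an explanation.
function validate_lengths(array $input): array
{
    $errors = [];

    foreach (MAX_LENGTHS as $field => $max) {
        // Missing fields are treated as empty here; "required" is a separate check.
        $value = $input[$field] ?? '';

        if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
            $errors[$field] = "$field is not valid text.";
        } elseif (mb_strlen($value, 'UTF-8') > $max) {
            $errors[$field] = "$field must not be longer than $max characters.";
        }
    }

    return $errors;
}

// In a real form handler this would be validate_lengths($_POST).
// Only the phone number (17 characters) is reported here.
print_r(validate_lengths([
    'userlogin' => 'Jack1234',
    'phone'     => '+49 30 1234567890',
    'city'      => 'Zürich',
]));
```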

One thing I would like to ask: how is it possible to use a perfectly legal email ID to carry out any attack, SQL injection or any other kind?

 

The specification of e-mail addresses wasn't written with SQL injections in mind, so it would be pure luck if e-mail addresses were immune to injections. Unfortunately, they aren't:

"'OR(1)#"@foo.com

This is actually a valid address which is accepted by FILTER_VALIDATE_EMAIL. But at the same time it will lead to an SQL injection if inserted into an SQL string:

SELECT is_admin
FROM users
WHERE email_address = '$email'

This becomes

SELECT is_admin
FROM users
WHERE email_address = '"'OR(1)#"@foo.com'

In other words, it will neutralize the WHERE clause and return the admin status of the first user (who is usually indeed the admin) rather than the status of the current user.

 

This clearly shows that injection attacks are not an input problem. I gave you the exact input you asked for, yet still I can attack your database system. Just because a string is valid according to the e-mail syntax rules doesn't mean that it's harmless in an SQL context. Those are two entirely different things.
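The same payload is harmless once the value is passed as a parameter instead of being pasted into the query text. The sketch below uses PDO with an in-memory SQLite database, which requires the pdo_sqlite extension. SQLite is used only so that the example runs without a MySQL server. The table and its rows are made up for the example. With mysqli, the equivalent is prepare() followed by bind_param():

```php
<?php

// Assumption: PDO + in-memory SQLite, purely so the example is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Made-up table resembling the one in the example query above.
$pdo->exec('CREATE TABLE users (email_address TEXT, is_admin INTEGER)');
$pdo->exec("INSERT INTO users VALUES ('admin@example.com', 1), ('jack@example.com', 0)");

$email = '"\'OR(1)#"@foo.com';

// Unsafe: the input becomes part of the SQL code (only printed here, not executed).
$unsafe_query = "SELECT is_admin FROM users WHERE email_address = '$email'";
echo $unsafe_query, PHP_EOL;

// Safe: the query text contains only a placeholder, and the value is sent separately,
// so it is always treated as data, never as SQL.
$stmt = $pdo->prepare('SELECT is_admin FROM users WHERE email_address = ?');
$stmt->execute([$email]);
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);

// No user has that literal address, so the result is an empty array.
var_dump($rows);
```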


What I gather is that if we do not use any kinds of filters we are good if we use Prepared Statements  for mysql  queries. And we must escape all HTML output ( Strings ) with htmlspecialchars, htmlentities, etc other such functions. If we are doing this then there is no need to filter the input. Is that correct?

 

Yes. Or more specifically: Input filtering has nothing to do with security at all. If you want security, you need to make sure that the input is interpreted correctly in the specific target context (SQL, HTML or whatever). That's why we use prepared statements for SQL queries and HTML-escaping for HTML documents: They guarantee that the input is treated as data rather than code.


Sorry if I sound so confused, but then security is an extremely confusing topic. And there is everybody - well almost - cautioning to use filters.

 

Web security isn't rocket science. The problem is that there's too much bad advice out there, and PHP itself is full of bullshit.

 

So don't go with ideas or functions just because they sound good. Think about the problem: What do you want to achieve? How do you achieve it?

 

Anyway, it's great that you care about security and want to learn more about it. This is very rare in the PHP world.


Filtering is still OK to use in forms: it lets users know that a field is required or that something is wrong with it, and which part of the form was wrong.

But as Jacques explained, it is not something you should rely on. All data should be escaped for the context it is used in, and also checked to be the data you expect it to be.

Look into constraint validation for forms.

You can mess around with this form by setting minimum and maximum values.

<form action="" method="post">
Name: <input type="text" name="user" pattern=".{3,10}" required placeholder="3 to 10 characters">
<input type="submit" name="submit" value="Go">
</form>

minlength="3" maxlength="10" could also work (min and max only apply to numeric and date inputs, not to the length of text).


Note that you cannot rely on HTML-based validation at all. The user may choose to ignore it, or the browser may not support it, or the user may not use a browser in the first place.

 

So if you use this, be aware that the data getting sent to your server is still entirely unrestricted. If you ask for 3 digits, you may get 100 Chinese symbols instead. If that's a problem, you need server-side validation or a combination of client-side and server-side validation.
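A server-side counterpart to the pattern=".{3,10}" check could look like the following sketch. The function name has_length_between() is hypothetical. Characters are counted with mb_strlen(), so 100 Chinese characters count as 100 and are rejected, just as 100 Latin letters would be:

```php
<?php

// Hypothetical helper: server-side version of the browser's ".{3,10}" check.
// The browser check can be skipped entirely, so this must run on every request.
function has_length_between($value, int $min, int $max): bool
{
    // $_POST values can be missing or arrays, so check the type and encoding first.
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return false;
    }

    $length = mb_strlen($value, 'UTF-8');

    return $length >= $min && $length <= $max;
}

var_dump(has_length_between('Jörg', 3, 10));                 // bool(true)
var_dump(has_length_between(str_repeat('中', 100), 3, 10));  // bool(false)
var_dump(has_length_between(['not', 'a', 'string'], 3, 10)); // bool(false)
```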


Hi, thanks all,

Sorry, I have returned here after some time. Thank you, Jacques, for all that information and the example of injection via email. I am changing the code to use mysqli prepared statements. I would still like to implement the constraint validation, though, and on the server side.

So if you use this, be aware that the data getting sent to your server is still entirely unrestricted. If you ask for 3 digits, you may get 100 Chinese symbols instead. If that's a problem, you need server-side validation or a combination of client-side and server-side validation.

Since I always learn faster via an example, I would once again request a server-side validation example for a variable that may have an integer value from 1 to 100 and a string with a maximum length of 30 characters (to validate for 30 English characters and not 1000 Chinese characters). Say, for example, a form that sends the name (max. 30 characters in length) and age (max. 100 years).

 

Thanks very much ! 

Edited by ajoo

  • Solution

Well, the server-side validation works exactly as described above. To count the number of characters, you use mb_strlen() and pass the string as well as the encoding you've used in the database column. Let's say the encoding is UTF-8 (MySQL calls it “utf8mb4”). Then your check looks like this:

<?php

// just for readability
define('MAX_NAME_LENGTH', 30);

// $_POST values may be missing or even arrays, so don't assume a string
$name = $_POST['username'] ?? null;

// mb_strlen() does not reject malformed byte sequences, so check the encoding explicitly
if (is_string($name) && mb_check_encoding($name, 'UTF-8') && mb_strlen($name, 'UTF-8') <= MAX_NAME_LENGTH)
{
	// all good
}
else
{
	// name either missing, not valid UTF-8 or too long
}

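And the age check should be obvious:

<?php

// just for readability
define('MIN_AGE', 1);
define('MAX_AGE', 100);

// $_POST values may be missing or even arrays, so don't assume a string
$age = $_POST['age'] ?? null;

// ctype_digit() only accepts non-empty strings made of the digits 0-9 (no sign, no decimal point)
if (is_string($age) && ctype_digit($age) && (int) $age >= MIN_AGE && (int) $age <= MAX_AGE)
{
	// all good
}
else
{
	// age either missing, not numeric or out of range
}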
Edited by Jacques1

Hi, thanks Jacques,

All the information provided by you has been very valuable.

There are so many answers that could be marked as best, but I would like to mark the one just above this as the best, since it answers most of the original question about validation of constraints. However, I would just like to ask once again, in the context of the examples above: would it not be OK to use the built-in PHP validation function for checking the 'age'?

Thanks!


“Built-in validation function”? What do you mean?


I guess you mean filter_var() with the FILTER_VALIDATE_INT flag? Well, this will also accept signed integers like “+36”, which doesn't make a lot of sense for the age. But if you're OK with that, sure, you can use it.
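For reference, filter_var() with FILTER_VALIDATE_INT can also enforce the range through its min_range and max_range options, so the age check becomes a single call. In the sketch below, the helper name validate_age() is hypothetical. The call still accepts a leading sign such as “+36”, and it returns false for anything it rejects:

```php
<?php

// Hypothetical helper: returns the age as an int, or false if it is invalid.
function validate_age($value)
{
    return filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
}

var_dump(validate_age('42'));  // int(42)
var_dump(validate_age('+36')); // int(36): the sign is accepted
var_dump(validate_age('101')); // bool(false): out of range
var_dump(validate_age('4.5')); // bool(false): not an integer
```

Compare the result with === false rather than a plain truthiness check, so that the "invalid" case is never confused with a legitimate integer value.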

Edited by Jacques1
