
[SOLVED] Filtering input


T-Bird


I'm trying to filter my $_REQUEST data.  Here's a filter I made, and I want to run it by some of you who actually know regex (I just finished reading the tutorial here).  I want to, in this case, filter a user name.  I want to accept letters, digits, the underscore, the @ symbol, and the dash.  If the string contains any other characters I want my filter to return false.

 

function FILTER_VALIDATE_UNAME($string)
{
$s = preg_match('/^[^\w@\-]+$/',$string);
if($s === 0)
{
	return true;
}
else
{
	return false;
}
}

 

Will this work?  Is there a more optimal method?


I think you should escape the backslash, but maybe it works without.. try?

 

Edit: sorry I misread your code when I posted earlier. You are matching invalid characters and then returning true if it didn't find any? Should work...


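You wrote it backwards.  ^ inside the character class negates it.  So your pattern only matches when the whole string is made up of characters other than the ones in your class; only then does preg_match return 1 and your function return false.  A string that mixes good and bad characters (ab!cd, for example) doesn't match at all, so preg_match returns 0 and your function returns true.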
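
To see the effect concretely, here is a small sketch that runs the pattern from the first post against a few sample strings. The function name and the sample inputs are made up for the demo; only the pattern comes from the thread.

```php
<?php
// The function from the first post, renamed (hypothetical name) for this demo.
function original_uname_filter($string)
{
    // The pattern matches only when EVERY character is outside [\w@-].
    return preg_match('/^[^\w@\-]+$/', $string) === 0;
}

$samples = ['t-bird@host', 'ab!cd', '!!!'];
foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(original_uname_filter($sample), true), "\n";
}
// t-bird@host => true
// ab!cd => true   (the "!" slips through)
// !!! => false
```

Only a string made up entirely of disallowed characters gets rejected, which is the opposite of a whitelist.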

 

If you want to include a hyphen in a character class, the simplest thing is to list it as the first (or last) character.  Anywhere else it is a special character within a character class (used to create ranges), so an unescaped hyphen between two other characters can give you unexpected results.  You did escape it, and an escaped hyphen is taken literally, so your class is fine as written; putting the hyphen first just saves you the backslash and makes it obvious that no range is intended.
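
A quick way to check how the hyphen is treated, assuming the PCRE engine behind PHP's preg_* functions. The sample strings are invented for the test. The first three patterns spell the same class with the hyphen first, last, and escaped; the last two lines show what an unescaped hyphen between two literals does.

```php
<?php
// Three spellings of the same class: hyphen first, hyphen last, hyphen escaped.
$patterns = ['/^[-\w@]+$/', '/^[\w@-]+$/', '/^[\w@\-]+$/'];
foreach ($patterns as $pattern) {
    // "a-b" is expected to match (1), "a+b" is not (0).
    echo $pattern, ': ', preg_match($pattern, 'a-b'), ' ', preg_match($pattern, 'a+b'), "\n";
}

// An unescaped hyphen between two literals creates a range.
// "+-@" spans "+" (0x2B) through "@" (0x40), which includes the digits.
echo preg_match('/^[+-@]$/', '5'), "\n";   // 1: "5" falls inside the range
echo preg_match('/^[+\-@]$/', '5'), "\n";  // 0: only "+", "-" and "@" are allowed
```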

 

Also, since you are just checking for existence and not wanting to capture anything (no 3rd param), you can put the preg_match directly into your condition.

 

Also note that while this pattern will only allow a-z, A-Z, 0-9, - _ and @, first off, it will match any string that is 1 character or longer.  If that's what you want, then that's okay, but it is common to enforce a minimum character length for things...(and a maximum)

 

Also note that while this will only allow those things, it does not restrict number of occurrences of anything.  So for example, any of the following will return true:

 

----------------

--

---------------------------------

@@@@@@@

ab-------------cd---------xyz

_______blah@@@----____blah____________________________---------123

...and so on and so forth.
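
If runs like those are a problem, one possible restriction (purely a hypothetical rule, not something the original poster asked for) is to forbid two symbols in a row. A rough sketch using a negative lookahead in front of the [-\w@] class; the function name is made up:

```php
<?php
// Hypothetical extra rule: no two symbols (-, _ or @) next to each other.
function has_no_symbol_runs($string)
{
    // (?!.*[-_@]{2}) fails the match if two consecutive symbols appear anywhere.
    return preg_match('/^(?!.*[-_@]{2})[-\w@]+$/', $string) === 1;
}

var_dump(has_no_symbol_runs('ab-cd_xyz')); // bool(true)
var_dump(has_no_symbol_runs('ab--cd'));    // bool(false)
var_dump(has_no_symbol_runs('@@@@@@@'));   // bool(false)
```

A single symbol on its own (just "-", say) still passes, so this only addresses repetition, not length or content.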

 

function FILTER_VALIDATE_UNAME($string)
{
   if(preg_match('/^[-\w@]+$/',$string))
   {
      return true;
   }
   else
   {
      return false;
   }
}
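
Since preg_match() returns 1 on a match, 0 on no match and false on error, the if/else above can also be collapsed into a single comparison. A minimal sketch, with a made-up function name and sample inputs:

```php
<?php
// Hypothetical name; same pattern as the function above.
function is_valid_uname($string)
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[-\w@]+$/', $string) === 1;
}

var_dump(is_valid_uname('t-bird@host')); // bool(true)
var_dump(is_valid_uname('ab!cd'));       // bool(false)
var_dump(is_valid_uname(''));            // bool(false)
var_dump(is_valid_uname("t-bird\n"));    // bool(true): $ also matches before a final newline
```

The last case is worth knowing about: without the D modifier, $ matches just before a trailing newline as well as at the very end. If that matters for your input, adding D after the closing delimiter (or using \z instead of $) closes the gap.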


To expand on what CV is talking about with regards to commonly enforcing minimum (and possibly maximum) number of characters in a string, you can use what is known as an interval (which takes on the format A{b,c}).

 

So if you wanted character(s) totaling 10 in sequence, you could use {10} as in: [abc]{10}

You can also specify a minimum only, by leaving the second parameter in the interval blank, such as:  [abc]{3,} (this requires a minimum of 3 a,b or c's consecutively, with no upper limit).

Or you can specify a range, such as:  [abc]{2,4} (minimum 2 of a,b or c's consecutively, maximum 4).

 

As it stands, your + quantifier means 'one or more times'.. so indeed, a single character that matches something within the character class will satisfy that.
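
The three interval forms can be tried out like this. The patterns are anchored with ^ and $ so that the interval governs the length of the whole string; the test strings are invented for the example.

```php
<?php
// Anchored so the interval governs the length of the whole string.
var_dump(preg_match('/^[abc]{10}$/', 'abcabcabca'));    // int(1): exactly 10 characters
var_dump(preg_match('/^[abc]{3,}$/', 'ab'));            // int(0): fewer than 3
var_dump(preg_match('/^[abc]{3,}$/', 'abcabcabcabc'));  // int(1): no upper limit
var_dump(preg_match('/^[abc]{2,4}$/', 'abca'));         // int(1): 4 is within 2..4
var_dump(preg_match('/^[abc]{2,4}$/', 'abcab'));        // int(0): 5 is more than 4
```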


When I initially wrote it I didn't use the ^$ characters to limit it to the string's beginning and end, so I felt I had to search for any characters that did not match, and then disqualify the string if it had any of those characters (thus the backwardsness).  I actually tacked the ^$ on just before posting it here.  Although now I feel like slapping myself in the forehead.

 

That's good to know about the dash.  Also in this case the minimum/maximum length didn't really apply because they were FTP users pre-created on the server.  I was just sanitizing out of the user's attempted login data any inappropriate characters before storing/using it.

 

Just for practice, let's say I wanted it to include those same characters, begin with a letter, and be 5-15 characters long.  Would the following work?

 

/^\w+[-\w@]{4,14}$/

 

What if I wanted to expand the above to include at least one number?  Would this work?

 

/^\w+([-a-zA-z@]*\d+[-\w@]){4,14}*$/

 

Also for general storage into a MySQL database, is it advisable to use a prebuilt filter, or a custom "whitelist" regex?


Just for practice, let's say I wanted it to include those same characters, begin with a letter, and be 5-15 characters long.  Would the following work?

 

/^\w+[-\w@]{4,14}$/

 

You can always try it out...

But no, this will not work.

What happens in the situation above is the regex engine will match (from the start) a-zA-Z0-9_ one or more times, and since the + is greedy, it will first match as many of those characters as it can (all the way to the end of the string, if every character is a word character).

However, since there is more in the pattern: [-\w@]{4,14}$, the engine must then start backtracking, giving characters back from \w+ until what is left over satisfies [-\w@]{4,14} (at least 4 characters, since that is the minimum in the interval).

 

To illustrate, consider the following:

$string = '123ghuJOoiI_P2@2';
echo $string . "<br />\n";
if(preg_match('#^(\w+)([-\w@]{4,14})$#', $string, $match)){
	echo $match[1] . "<br />\n";
	echo $match[2];
}

 

Here, I started by capturing (from the start) the first part: (\w+).. so at this point, this capture initially grabs as much as it can (as the + is greedy). \w does not match @, so (\w+) [which is what $match[1] represents] first grabs '123ghuJOoiI_P2' and stops at the @.

But now, the engine must try to satisfy the rest of the pattern: ([-\w@]{4,14})$

Only '@2' is left at that point, which is 2 characters short of the minimum of 4. So the engine backtracks: (\w+) gives back characters one at a time until what remains, up to the end of the string, fits ([-\w@]{4,14}). Here it has to give back 'P2', which leaves 'P2@2' for the second match (that's an over simplified explanation).

As a result, since I echo out what the (\w+) is equal to, and what ([-\w@]{4,14}) is equal to...

 

$match[1] = '123ghuJOoiI_'

$match[2] = 'P2@2'

 

This all passes,  but as you can see, the total length of the string is 16 characters long.. not what you want.. so bottom line, I wouldn't recommend using + or * quantifiers in conjunction with intervals if you expect the intervals to represent the total string length.
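
For the stated goal (begin with a letter, 5-15 characters in total), one way that avoids mixing + with the interval is to match exactly one leading letter and let the interval cover the rest. This is only a sketch: the function name is made up, and it assumes "letter" means a-z or A-Z (\w would also let a digit or underscore come first).

```php
<?php
// Hypothetical helper: one letter, then 4 to 14 more allowed characters.
function is_valid_uname_len($string)
{
    // 1 leading letter + {4,14} further characters = 5 to 15 in total.
    return preg_match('/^[a-zA-Z][-\w@]{4,14}$/', $string) === 1;
}

var_dump(is_valid_uname_len('123ghuJOoiI_P2@2'));          // bool(false): starts with a digit, 16 long
var_dump(is_valid_uname_len('abcde'));                     // bool(true): 5 characters
var_dump(is_valid_uname_len('abcd'));                      // bool(false): 4 characters
var_dump(is_valid_uname_len('a' . str_repeat('b', 14)));   // bool(true): 15 characters
var_dump(is_valid_uname_len('a' . str_repeat('b', 15)));   // bool(false): 16 characters
```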

 

 

 

 

What if I wanted to expand the above to include at least one number?  Would this work?

 

/^\w+([-a-zA-z@]*\d+[-\w@]){4,14}*$/

 

Again, try it out.

As usual, there are many ways to skin a cat. I would personally tackle this using perhaps regex with non-regex as such:

 

$str = 'ab3c@d-e_fujkl';
$validation = 'false';
// 5 to 15 characters, all of them from the allowed set
if(preg_match('#^[-\w@]{5,15}$#', $str)){
	$count = strlen($str);
	// look for at least one digit anywhere in the string
	for ($a = 0 ; $a < $count ; $a++) {
		if(ctype_digit($str[$a])){
			$validation = 'true';
			break;
		}
	}
}
echo $validation;

 

Granted, this is just one way to do it. Some might first check the length, then see if the string satisfies [-\w@] from start to finish, then go from there. Again, many ways to skin a cat.
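
Another of those ways, for anyone who prefers a single pattern, is a lookahead that requires a digit somewhere in the string. The sketch below is an alternative to the loop, not a replacement for it. The function name is made up, and it also enforces the leading letter from the question, which the loop version does not.

```php
<?php
// Hypothetical single-pattern variant: leading letter, 5-15 characters, at least one digit.
function is_valid_uname_with_digit($string)
{
    // (?=\D*\d) looks ahead for a digit without consuming any characters.
    return preg_match('/^(?=\D*\d)[a-zA-Z][-\w@]{4,14}$/', $string) === 1;
}

var_dump(is_valid_uname_with_digit('ab3c@d-e_fujkl')); // bool(true)
var_dump(is_valid_uname_with_digit('abcde'));          // bool(false): no digit
var_dump(is_valid_uname_with_digit('abcd1'));          // bool(true)
var_dump(is_valid_uname_with_digit('1abcd'));          // bool(false): starts with a digit
```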

 

The best advice I can give you is to experiment. Keep trying things, and echoing them out to see the results.. You'll learn a hell of a lot through experimentation.

