Checking for "Strong" Password

doubledee · December 26, 2011

I need some help checking for a "Strong" Password defined as...

- One Uppercase Letter

- One Lowercase Letter

- One Number

- One Special Character

- Between 8 and 20 Characters in Length

I know how to do basic REGEX, but am stuck on this one since I think it also requires something called a "Look-Back" call?!

Here is some code used for First Name that I am hoping can be tweaked into the equation needed...

if (preg_match('#^[A-Z \'.-]{2,20}$#i', $trimmed['firstName'])){

Thanks,

Debbie

ragax · December 28, 2011

Hi Debbie,

Here is an expression that matches your password requirement. I have unrolled it in "comment mode" so you can see what each line does. That way you can understand how the lookahead works.

(?x)  # comment mode
^   # beg of string anchor
(?=[\d\D]{8,20}$)  # lookahead: between 8 and 20 digits or non-digits, then the end of the string
(?=.*[A-Z]) # lookahead: an upper-case character
(?=.*[a-z]) # lookahead: a lower-case character
(?=.*[0-9]) # lookahead: a number
(?=.*[*?!&_-]) # lookahead: a special character
.*  # match anything
$   # end of string anchor

You can actually paste that directly into your PHP pattern string (between the delimiters, as a multi-line string), or you can bring it back to one line and remove the comments. I would recommend leaving the comments.
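For comparison, here is a sketch of the same expression brought back to one line with the comments stripped (`$strongPattern` is just an illustrative name). Outside comment mode every space is literal, so the one-line version must not contain any.

```php
<?php
// The same five lookaheads as the commented version, on a single line.
// Without (?x), spaces would be matched literally, so there are none here.
$strongPattern = '/^(?=[\d\D]{8,20}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[*?!&_-]).*$/';

$results = array();
foreach (array('2_Short', 'Need_A_Number', 'Valid_Pass99') as $pass) {
    // preg_match() returns 1 on a match and 0 when there is none
    $results[$pass] = preg_match($strongPattern, $pass);
}
print_r($results);
```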

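I wasn't sure what you wanted to include in the special characters, so have a look at that line and add whatever you like between the brackets (a backslash or a closing bracket would have to be escaped with a backslash, and a hyphen is safest as the last character in the class, where it is taken literally).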
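If the set of special characters comes from a setting rather than being typed into the pattern by hand, PHP's `preg_quote()` can do the escaping. A sketch, assuming PHP 7 or later; `buildStrongPattern()` is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: builds the password pattern from a string of allowed
// special characters. preg_quote() backslash-escapes regex metacharacters
// (including \, ] and -), so each one is literal inside the character class.
function buildStrongPattern(string $specials): string
{
    $class = preg_quote($specials, '/');
    return '/^(?=[\d\D]{8,20}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[' . $class . ']).*$/';
}

$pattern = buildStrongPattern('#]\\');             // specials: hash, closing bracket, backslash
echo preg_match($pattern, 'Valid#Pass99'), "\n";    // 1: '#' is in the set
echo preg_match($pattern, 'Valid_Pass99'), "\n";    // 0: '_' is not in this set
```

Escaping every character, even ones that would already be literal inside a class, keeps the helper safe whatever set is passed in.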

Here is an example of use in a PHP program (tested):


<?php
$passes = array('2_Short', 'Need_A_Number', 'Valid_Pass99', 'This_1_is_going_to_be_too_long');
foreach ($passes as $pass) {
    // No /m flag: ^ and $ must anchor the whole string, not a single line.
    // The D flag stops $ from also matching just before a trailing newline.
    if (preg_match('/(?x)  # comment mode
        ^   # beg of string anchor
        (?=[\d\D]{8,20}$)  # lookahead: between 8 and 20 digits or non-digits, then the end of the string
        (?=.*[A-Z]) # lookahead: an upper-case character
        (?=.*[a-z]) # lookahead: a lower-case character
        (?=.*[0-9]) # lookahead: a number
        (?=.*[*?!&_-]) # lookahead: a special character
        .*  # match anything
        $   # end of string anchor
        /D', $pass)) {
        echo 'Valid: ' . $pass . '<br />';
    } else {
        echo 'Invalid: ' . $pass . '<br />';
    }
}
?>


Please let me know if this works for you!

Wishing you a beautiful day

doubledee · December 29, 2011

Wow! I am impressed with your regex skills!

(?x)  # comment mode

Apparently that is called "Free Spacing Mode"??

What is its purpose??

^   # beg of string anchor
(?=[\d\D]{8,20}$)  # lookahead: between 8 and 20 digits or non-digits, then the end of the string
(?=.*[A-Z]) # lookahead: an upper-case character
(?=.*[a-z]) # lookahead: a lower-case character
(?=.*[0-9]) # lookahead: a number
(?=.*[*?!&_-]) # lookahead: a special character
.*  # match anything
$   # end of string anchor

That code doesn't look right.

.*[A-Z]

says "match any character zero or more times" and then the [A-Z] on the end doesn't make sense?!

Debbie

ragax · December 29, 2011

Hi Debbie,

That code doesn't look right.

To clarify: do you mean that it doesn't work, or that it looks strange to your eyes?

(I have tested the php in the codebox. If you paste it in a php file, it should run.)

Apparently that is called "Free Spacing Mode"?

Yes, "free-spacing mode", or "white-space mode", or simply (my favorite): "comment mode".

What is its purpose?

As you can see in the code box, it has allowed me to write the regex on multiple lines instead of a single line. Also, it allows you to use comments (everything after a # on a line is a comment). It makes it a lot easier to read, lets it breathe. If you return to the code in a year, you will be able to understand the expression at one glance. You can see many more examples (with better formatting) on the tutorial in my signature. Not all regex flavors support this, but PCRE does, and PHP's preg functions use PCRE. If you use a space in the expression, you need to include it in a character class: [ ]
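A quick way to see what comment mode does (a sketch; the three patterns are only illustrations):

```php
<?php
// In free-spacing mode (?x), unescaped whitespace in the pattern is ignored,
// and everything from an unescaped # to the end of the line is a comment.
$spaced  = preg_match('/(?x) a b c   # the spaces are ignored, so this means "abc"/', 'abc');
$literal = preg_match('/(?x) a b c/', 'a b c');   // still means "abc", so no match
$classed = preg_match('/(?x) a [ ] b/', 'a b');   // inside a class, a space is literal
var_dump($spaced, $literal, $classed);
```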

says "match any character zero or more times" and then the [A-Z]

You want at least one upper-case character, right? So this lookahead sees if anywhere on the string it can find ONE character between A and Z. The dot star is what enables the lookahead to look anywhere on the string: anything can precede the uppercase letter, including, potentially, nothing (allowed by the star).
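A small sketch of that lookahead on its own, with the capital letter first, in the middle, last, and missing:

```php
<?php
// The dot-star lets the lookahead find an upper-case letter at any position,
// including the very first one (the star also allows zero characters).
$results = array();
foreach (array('Abcdef', 'abcDef', 'abcdeF', 'abcdef') as $s) {
    $results[$s] = preg_match('/^(?=.*[A-Z])/', $s);
}
print_r($results);
```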

Does this clear it up for you?

You seem interested, and I love regex, so I'm happy to keep answering any question you have about this expression.

Warmest wishes

doubledee · December 29, 2011

That code doesn't look right.

To clarify: do you mean that it doesn't work, or that it looks strange to your eyes?

(I have tested the php in the codebox. If you paste it in a php file, it should run.)

I just didn't think it was correct from how I thought you did regex grammar. (More on this below)

Apparently that is called "Free Spacing Mode"?

Yes, "free-spacing mode", or "white-space mode", or simply (my favorite): "comment mode".

What is its purpose?

As you can see in the code box, it has allowed me to write the regex on multiple lines instead of a single line. Also, it allows you to use comments (everything after a # on a line is a comment). It makes it a lot easier to read, lets it breathe. If you return to the code in a year, you will be able to understand the expression at one glance.

I'm all about "pretty code", so that works for me!

You can see many more examples (with better formatting) on the tutorial in my signature.

You have a really strange website!!

says "match any character zero or more times" and then the [A-Z]

You want at least one upper-case character, right? So this lookahead sees if anywhere on the string it can find ONE character between A and Z. The dot star is what enables the lookahead to look anywhere on the string: anything can precede the uppercase letter, including, potentially, nothing (allowed by the star).

This is the tiny bit of regex that I know...

	if (preg_match('#^[A-Z \'.-]{2,20}$#i', $trimmed['firstName'])){

I am used to seeing a literal...

debbie

...or a class...

[A-Z \'.-]

...followed by a quantifier...

{2,20}

In your code it is like you have that flipped...

(?=.*[A-Z]) # lookahead: an upper-case character

From the PHP Manual...

Lookahead assertions start with (?= for positive assertions and (?! for negative assertions. For example, \w+(?=;) matches a word followed by a semicolon, but does not include the semicolon in the match, and foo(?!bar) matches any occurrence of "foo" that is not followed by "bar". Note that the apparently similar pattern (?!foo)bar does not find an occurrence of "bar" that is preceded by something other than "foo"; it finds any occurrence of "bar" whatsoever, because the assertion (?!foo) is always TRUE when the next three characters are "bar". A lookbehind assertion is needed to achieve this effect.
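The examples in that manual excerpt can be checked directly. A sketch, including the lookbehind that the last sentence alludes to:

```php
<?php
// \w+(?=;) : the semicolon must follow, but it is not part of the match.
preg_match('/\w+(?=;)/', 'hello; world', $m);
echo $m[0], "\n";   // hello

// foo(?!bar) : "foo" only when the next three characters are not "bar".
preg_match_all('/foo(?!bar)/', 'foobar foobaz', $all, PREG_OFFSET_CAPTURE);
print_r($all[0]);   // a single match: the "foo" of "foobaz", at offset 7

// (?!foo)bar still finds the "bar" in "foobar": at that position the next
// three characters are "bar", so "not foo" is trivially true.
$anyBar = preg_match('/(?!foo)bar/', 'foobar');
// A negative lookbehind checks what comes BEFORE "bar" instead.
$notAfterFoo = preg_match('/(?<!foo)bar/', 'foobar');
var_dump($anyBar, $notAfterFoo);
```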

Based on that explanation, I would think the code should be...

[A-Z](?=.*)

"Find any upper case letter, followed by zero or more characters..."

Does this clear it up for you?

I can kinda see your way, but my way makes sense too.

And, no, I'm pretty confused right now. There really aren't as many good resources on this topic online as I'd hope. Maybe I need to go buy a book?! (There is also a lot of *incorrect* information out there, which always makes me nervous, not knowing who to believe...)

You seem interested, and I love regex, so I'm happy to keep answering any question you have about this expression.

I am interested in learning in general!! But, yes, I would like to advance my regex knowledge, as they seem powerful (although a lot of developers think they are evil?!)

Here is hoping I can get this figured out (correctly).

Thanks,

Debbie

ragax · December 29, 2011

Greetings Debbie,

Morning here in New Zealand, great to hear from you.

There is also a lot of *incorrect* information out there

Yes, that is true. Often I read advice that is a case of the blind leading the blind, though I also often see advice by extremely skilled regex practitioners.

Maybe I need to go buy a book?

The two books on the tutorial are excellent.

I recommend starting with the tutorial in my sig, but if it starts too fast, step back and start at the beginning with Jan's tutorial on regular-expressions.info. Then come back. The books and Jan's site are excellent, they really are, and yet lacking in some ways that the tutorial in my sig tries to fill. You can answer "how do I match a phone number", or you can answer "what's a good way to use this syntax", and "why should I use this instead of that", and all kinds of questions that mean you are learning a craft rather than applying a recipe. You have to do both: learn recipes, and learn the craft. In the tut there is a section on all the bits of syntax that use (?, and since you are starting out with lookaheads you might find this worthwhile very soon. Ultimately if you get hooked by regex you will want to eat up all of these resources.

To complete this survey of the landscape: the PHP manual is a good reference for PCRE regex, but short on explanations and potentially confusing. The original PCRE manual is much better.

Based on that explanation, I would think the code should be...

I'm going to have to insist.

The code is correct (I'm sure you've tried it by now), the alternative is not.

Let me try to explain what might be the source of confusion.

You wrote this:

[A-Z](?=.*)

This means: match one upper-case character, then look ahead if you can find anything (the dot-star). The lookahead is meaningless, because you can always match dot-star. As for the [A-Z], it is not part of a lookahead, so it requires you to match an upper-case character right after whatever you have matched before, and right before whatever you're going to match next.
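A sketch showing the difference: `[A-Z](?=.*)` actually consumes a capital letter, so it either finds one at the current position or the attempt fails there.

```php
<?php
// [A-Z](?=.*) consumes one upper-case letter; the lookahead adds nothing,
// because .* can always match (even zero characters).
preg_match('/[A-Z](?=.*)/', 'valid_Pass99', $m);
echo $m[0], "\n";   // P

// Anchored with ^, it demands that the FIRST character be upper-case.
$anchored = preg_match('/^[A-Z](?=.*)/', 'valid_Pass99');
var_dump($anchored);   // int(0)
```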

Now let's look again at the expression I sent you.

First, let's talk about its general "personality".

What it does is let you stand at the beginning of the string (thanks to the caret anchor). From the beginning of the string, you look ahead five times. Always from the beginning of the string, because no matching whatsoever happens until we have finished the lookaheads.

So the lookaheads are a series of tests: let's see if somewhere in the distance I can see an upper-case letter... Etc.

Once the lookaheads are done, we know we have matched all the requirements of the password, so we are allowed to match anything. And we do! With the dot-star. We use a dot-star because we are absolutely confident that all our requirements have been met. At that stage, we say "okay, let's eat up this entire string".
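The "stand still, then eat" behaviour can be made visible with `PREG_OFFSET_CAPTURE`, which reports each match together with its starting offset. A sketch using three of the checks:

```php
<?php
// PREG_OFFSET_CAPTURE reports each match as array(text, offset).
$checks = '^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])';

// Lookaheads alone: the match is empty and sits at offset 0,
// so nothing was consumed and the engine never left the start.
preg_match('/' . $checks . '/', 'valid_Pass99', $only, PREG_OFFSET_CAPTURE);

// Adding the dot-star: now the whole string is consumed, still from offset 0.
preg_match('/' . $checks . '.*$/', 'valid_Pass99', $full, PREG_OFFSET_CAPTURE);

var_dump($only[0], $full[0]);
```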

Now let's dive into one of the lookaheads (the one you were mentioning):

(?=.*[A-Z]) # lookahead: an upper-case character

You say that it looks flipped, and that suggests to me that the source of confusion is that you might think that part of this line contributes to the final match. But actually, everything inside the parentheses is part of the lookahead. There is no matching done whatsoever here: we don't eat up any characters, we just look. All the matching happens at the very end of the regex, with the dot-star.

What does the lookahead do? The first thing to remember is that you are standing at the caret, at the beginning of the string. The lookahead starts looking from the very next character in the string. So we cannot just say:

(?=[A-Z])

because this would mean: look to see if the very first character in the string is an upper-case letter.
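A two-line comparison, as a sketch on a string whose capital letter is in the middle:

```php
<?php
$s = 'valid_Pass99';
// Without the dot-star, the lookahead only inspects the first character.
$first = preg_match('/^(?=[A-Z])/', $s);
// With it, the lookahead may skip any characters before the capital letter.
$anywhere = preg_match('/^(?=.*[A-Z])/', $s);
var_dump($first, $anywhere);   // int(0) int(1)
```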

Instead, we say "look if we can eat up any number of characters, THEN find an upper-case letter". This allows the upper-case letter to be first, last, or anywhere in the string. In lookahead parlance, this looks like this:

(?=.*[A-Z]) # lookahead: an upper-case character

So the key is to understand that the lookahead starts looking from the very point where you are standing. If you want it to look way ahead in the distance, you have to tell it so, which we did.

I know this is a rather detailed message, but I wanted to try to clear this up for you.

Did it work?

If there is still any confusion, please let me know. I would love to know that it has clicked for you, because once it does, there will be no stopping you with regular expressions.

Wishing you a beautiful day

doubledee · December 31, 2011

Playful,

Sorry for the late response. I haven't been feeling well, and my mind has been in a "fog" the last day or so, which makes this topic even harder to get down?!

I will be honest and say that I am frustrated because I spent all yesterday reading up on Regular Expressions and particularly "Look Aheads", and in my assessment there is very little material out there on the Internet. Most of it is just regurgitated (i.e. "stolen") from other websites, and more concerning is the fact that I believe a lot of what I read yesterday was wrong. (I may be a newbie to REGEX, but I'm no dummy!)

The few decent sources I found contradicted what you did, so I don't know who to believe.

It's nothing personal, but *usually* if you dig around enough, you can find a consensus and figure things out yourself.

On this topic I still feel uncertain, which bugs me, because I hate not knowing...

Regardless, you seem eager to help, and your code is probably right.

Maybe if you help explain it better, then I will understand it and thus trust it more?!

So let me ask LOTS of questions...

(?x)  # comment mode
^   # beg of string anchor
(?=[\d\D]{8,20}$)  # lookahead: between 8 and 20 digits or non-digits, then the end of the string

Do "digits" and "non-digits" make up the entire universe of characters?

A period represents "any character", right?

And an asterisk represents 0 to many characters, right?

So wouldn't this...

(?=.*{8,20}$)

...be better than this...

(?=[\d\D]{8,20}$)

And do I understand how .* works?

So what is this first Lookahead doing? (Please don't use the "eating up characters" thing, because I have NO CLUE what that means. It sounds like you are describing Pac-man?!)

It looks at the entire string and sees if it is 8 to 20 characters long?

(?=.*[A-Z]) # lookahead: an upper-case character

?= says "look forward from whatever position you are at...", right?

.* says "zero or more of ANY characters...", right?

[A-Z] says "the class of capital letters...", right?

?=.*[A-Z] presumably says "from wherever the pointer is at, look forward zero or more characters, for any characters that are in the class of upper-case letters..."

I think the .* and the [A-Z] thing seem to be working against each other. This one really confuses me?!

I am used to working with simple regular expressions like this...

			if (preg_match('#^[A-Z \'.-]{2,20}$#i', $trimmed['firstName'])){

In that code, you FIRST define the character set you are working with. In the example above, this would be...

[A-Z \'.-]

After that, THEN YOU DEFINE THE SIZE like this...

{2,20}

So this throws me off...

.*[A-Z]

Also, if I didn't say so before, if a period is "any character except a new line", then doesn't that conflict with [A-Z]?

(?=.*[a-z]) # lookahead: a lower-case character
(?=.*[0-9]) # lookahead: a number
(?=.*[*?!&_-]) # lookahead: a special character

Help me understand the first part above with upper-case letters, and then these will make sense naturally.

I wasn't sure what you wanted to include in the special characters, so have a look on that line and add whatever you like between the brackets (you would have to escape a backslash).

Isn't there a predefined class of "Special Characters"??

In English, what constitutes a "Special Character" anyways?!

Okay, I'll wait to hear back from you and hope I can figure this out?!

Thanks,

Debbie

P.S. After you respond, I will likely paste a few other examples I saw online and see if you can explain why your way is correct and presumably the others ways are wrong!!

ragax · December 31, 2011

Hi Debbie,

Great to hear from you. Sorry to hear that you haven't been feeling well.

I hope you soon have a chance to give your stomach a break.

my assessment there is very little material out there on the Internet. Most of it is just regurgitated (i.e. "stolen") from other websites

I see it the same way. <rant>In my view, this is part of the short-attention-span world where people are too busy with Facebook and their cell phones to learn something for themselves. Yes, most of these resources feel like copy-paste. But not the ones I have pointed you to, I think. You can tell the difference when someone has been down in the trenches working with code.</rant>

The few decent sources I found contradicted what you did

Could this be a surface impression, I wonder? Do please share one or two spots where such contradictions seem to exist, and I will do my very best to clear it up for you. It must be frustrating to navigate a sea of contradicting information, and I feel annoyed on your behalf.

So let me ask LOTS of questions...

Okay, hanging on to my seat.

Please don't use the "eating up characters"

Actually, it is rather important. So rather than avoiding it, I will explain it.

The regex engine starts at the beginning of the string. Then it moves from left to right as it tries to build a match. Sometimes, it has to move backward (backtrack). "Eating up characters" just means that the regex engine consumes characters of the string as it moves from left to right. For instance, if your string is Monday25 and your regex is \w+\d+, first \w+ eats up the M, then the o, n, d, a, y (one by one!), and since digits are word characters too, it goes on to eat the 2 and the 5. Now \d+ has nothing left to eat, so the engine backtracks: \w+ gives back the 5, and \d+ eats it.
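A capturing version of the same regex, `(\w+)(\d+)`, shows how the characters end up shared out after the backtracking step (a sketch):

```php
<?php
// Capture what each part of \w+\d+ ends up consuming.
preg_match('/(\w+)(\d+)/', 'Monday25', $m);
echo $m[1], ' | ', $m[2], "\n";   // Monday2 | 5
```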

(?=.*{8,20}$)

This is not correct regex. It would mean "look for between eight and twenty times" (the {8,20}) "any amount of anything" (the dot-star), "then look for the end of the string". I would pull a face too if I had to look for eight times any amount of anything.

If you remove the star, it works: it then means "look for between eight and twenty of any character, then the end of the string". It is a lookahead, so it does not "eat up" (consume) any characters. The regex engine stays planted at the beginning of the string. It just looks ahead to see if it can find what you told it to look for.
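A sketch of both versions in PHP. The first check assumes that PCRE refuses to compile a quantifier applied directly to another quantifier, in which case `preg_match()` returns `false` and emits a warning:

```php
<?php
// {8,20} cannot quantify the * quantifier, so the pattern does not compile;
// preg_match() then returns false (the @ hides the warning).
$broken = @preg_match('/^(?=.*{8,20}$)/', 'Valid_Pass99');
var_dump($broken);   // bool(false)

// Without the star: "8 to 20 characters, then the end of the string".
$lengthCheck = '/^(?=.{8,20}$)/';
$short = preg_match($lengthCheck, '2_Short');      // 7 characters
$ok    = preg_match($lengthCheck, 'Valid_Pass99'); // 12 characters
var_dump($short, $ok);   // int(0) int(1)
```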

And do I understand how .* works?

Probably, but not in the above expression. Dot means "any character". Star means "any amount of (and possibly none)". Dot star means "any amount of any character, and possibly none".

Footnote (please don't spend time on this paragraph until the rest of the page is clear): depending on whether we are in "dot-match mode" (activated for instance by (?s)), the dot may or may not match a line-break character. Normally, it does not, whereas [\d\D] matches every character, line breaks included. In this expression, a dot would work just as well as [\d\D]. If your input could contain line breaks (which a password field would not), the two would differ: [\d\D] would count a line break like any other character, while the dot would refuse it. In this case it really does not matter: there is always more than one way to write an expression. It's fun to switch it up sometimes.
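A sketch of the difference on a string that does contain a line break (the `D` flag keeps `$` from matching before a trailing newline, and the `s` flag is the pattern-wide equivalent of `(?s)`):

```php
<?php
$s = "abc\ndef";   // 7 characters, one of them a line break

$classVersion  = preg_match('/^[\d\D]{7}$/D', $s);  // [\d\D] matches the line break
$dotVersion    = preg_match('/^.{7}$/D', $s);       // the dot does not
$dotAllVersion = preg_match('/^.{7}$/sD', $s);      // dot-match mode changes that
var_dump($classVersion, $dotVersion, $dotAllVersion);   // int(1) int(0) int(1)
```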

If you wanted to use the dot to write this lookahead, you would write:

(?=.{8,20}$)

?= says "look forward from whatever position you are at...", right?

Right.

.* says "zero or more of ANY characters...", right?

Right.

[A-Z] says "the class of capital letters...", right?

Right. (More precisely, any character from A to Z).

?=.*[A-Z] presumably says "from wherever the pointer is at, look forward zero or more characters, for any characters that are in the class of upper-case letters..."

Nope. But that's great because we have found the source of confusion. It means look for any number of characters, THEN for an upper-case letter. See, the expression in the parentheses is a small regex in itself. It is sequential, meaning, you read it from left to right. If it said (?=abc), it would not be the same as (?=cba). The regex engine reads look for this, THEN that, THEN the other. So the secret of this lookahead is that it looks if there is an upper-case character ANYWHERE in the string. Because the lookahead goes through any number of characters, THEN an upper-case character.

Note that we could have written it with a lazy quantifier instead:

(?=.*?[A-Z]).*

Whether one works better than the other depends on the expected position of the upper-case character.

Bottom-line: both work.
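Both versions succeed on the same strings; what differs is where the lookahead finds its capital letter. Putting capturing groups inside the lookahead makes that visible (a sketch on an arbitrary test string):

```php
<?php
// Capturing groups inside the lookahead show where each version lands.
$g = preg_match('/^(?=(.*)([A-Z]))/', 'aBcDe', $greedy);
$l = preg_match('/^(?=(.*?)([A-Z]))/', 'aBcDe', $lazy);

echo $greedy[2], "\n";  // D: .* grabs everything, then backs off to the LAST capital
echo $lazy[2], "\n";    // B: .*? grows one character at a time and stops at the FIRST
```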

Help me understand the first part above with upper-case letters, and then these will make sense naturally.

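Correct: the other three lookaheads work exactly the same way, just with a different character class after the dot-star.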

Isn't there a predefined class of "Special Characters"

No, that's really up to you to decide what a special character means. Will you allow a letter in the Thai alphabet in your password? I don't think so.

Will you allow all punctuation marks? Great, you can use [[:punct:]] (the POSIX class [:punct:] only works inside a character class, hence the double brackets).
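A sketch of that class as a lookahead. Note that in PCRE `[:punct:]` covers all printable ASCII punctuation, which includes characters such as `#`, `@` and `~` that the hand-written class in the original expression does not:

```php
<?php
// [:punct:] must sit inside a bracket expression, hence [[:punct:]].
$hasPunct = '/^(?=.*[[:punct:]])/';
$withUnderscore = preg_match($hasPunct, 'Valid_Pass99');   // '_' counts as punctuation
$noPunct        = preg_match($hasPunct, 'ValidPass99');    // letters and digits only
var_dump($withUnderscore, $noPunct);   // int(1) int(0)
```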

I left it up to you to decide what to put in that class.

After you respond, I will likely paste a few other examples I saw online and see if you can explain why your way is correct and presumably the others ways are wrong!!

Looking forward to it. Both ways may very well be right.

I hope you feel better soon! It must be hard to work through this regex with a foggy head. If you can crack this one in that state, you will be able to crack anything.

Wishing you a good rest.

Looking forward to reading you again. I have a feeling that it's just about to click for you.

If my explanation is still not clear, please keep insisting on why it "looks wrong" until I have given you an acceptable explanation.

ragax · December 31, 2011

Hi again Debbie,

A second message because something just crossed my mind.

I wonder if this could be the source of outstanding confusion, as I did not emphasize it:

At the end of the first lookahead (looking for eight to twenty characters), the regex engine hasn't moved. (It is still ready to eat up characters at the very beginning of the string.) It is still planted in the same place. It has only LOOKED ahead.

So the second lookahead also LOOKS from the very beginning (looking for an upper-case character). At the end of it, the engine is still planted at the beginning, ready to eat up whatever you tell it to MATCH.

All the lookaheads in this expression do that. They start looking from the very beginning. The engine's position in the string doesn't change.

After all the lookaheads, when the dot-star starts its job of eating up any character, it does so from the very beginning of the string, since we haven't moved at all.
