pattern matching in regex


johnmerlino1
Solved by Jacques1

In the example below, the pattern starts and ends with .* which matches zero or more characters. In between are four groups. Each group starts with "?=", which means "the text that follows must look like this". The first group matches any character 8 or more times. The second group matches zero or more characters followed by a digit. The third group matches zero or more characters followed by a lowercase letter. The fourth group matches zero or more characters followed by an uppercase letter.

<?php
$password = "Fyfjk34sdfjfsjq7";

if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
    echo "Your password is strong.";
} else {
    echo "Your password is weak.";
}
?>

My question is: do these four groups affect each other? That is, does the fact that the first group requires 8 or more characters mean that every group must match 8 or more characters, and does the fact that the second group requires a digit mean that every group must contain a digit? Or do they work independently, so that a 4-character word with a single digit would match this pattern even though the first group says it must have 8 characters?

I think there is a misunderstanding about what ?= does. My understanding is that it is a lookahead, which is one kind of 'lookaround'. Basically, it checks whether the text ahead matches, but then 'gives up' the match, so it does not move the position forward. That way you can use it to look for a number or a letter, and their position won't matter.
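
To make the "gives up the match" part concrete, here is a small sketch. The sample strings and variable names are made up for illustration; only preg_match() itself is standard PHP.

```php
<?php
// A lookahead checks what follows without consuming it.
preg_match('/foo(?=bar)/', 'foobar', $m);
echo $m[0], "\n"; // prints "foo": "bar" was checked but is not part of the match

// Several lookaheads at the same position are each tested from that position,
// so the order of the digit and the uppercase letter in the subject does not matter.
$digitFirst = preg_match('/^(?=.*\d)(?=.*[A-Z])/', 'x9Q');
$upperFirst = preg_match('/^(?=.*\d)(?=.*[A-Z])/', 'Qx9');
echo ($digitFirst === 1 && $upperFirst === 1) ? "both match\n" : "mismatch\n";
```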

The way I read the above expression, the value must be at least 8 characters long and must contain a number, a lower-case letter and an upper-case letter. I think the ".*" at the beginning and end are unnecessary, but I'd have to test it to be sure.

EDIT: This should do what you want:

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/

 - Must contain number

 - Must contain lower-case letter

 - Must contain upper-case letter

 - Must be at least 8 characters long
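
A quick way to check the reading above is to run the pattern against a few inputs. This is only a sketch: the sample passwords other than the one from the original post are invented for the test.

```php
<?php
$pattern = '/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/';

// Sample inputs (illustrative) with the reason each one should pass or fail.
$samples = [
    'Fyfjk34sdfjfsjq7', // long enough, has digit, lowercase and uppercase
    'aB1x',             // has all three character types but is only 4 characters
    'abcdefgh1',        // no uppercase letter
    'ABCDEFGHi',        // no digit
];

$results = [];
foreach ($samples as $password) {
    // Every lookahead and the length check must succeed for a match.
    $results[$password] = preg_match($pattern, $password) === 1;
    echo $password, ': ', $results[$password] ? 'match' : 'no match', "\n";
}
```

So the conditions do not leak into each other, but all of them have to hold at once: the 4-character word with a digit is rejected because it fails the length requirement.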

Or, if you also want to require a special character:

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,}$/
Edited by Psycho

  • Solution

To match the regex, a string must have the following properties:

  • at least 8 consecutive characters which are not newlines (the dot excludes newlines unless the s modifier is set)
  • a digit anywhere in the string
  • a lowercase Latin letter anywhere in the string
  • an uppercase Latin letter anywhere in the string
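
The remark about the dot and newlines can be seen with a short sketch. The subject string is made up; it has 9 characters in total but no run of 8 without a newline.

```php
<?php
$subject = "abcd\nefgh"; // 4 letters, a newline, 4 letters

// Without the s modifier the dot stops at the newline, so no run of 8 exists.
$withoutS = preg_match('/(?=.{8,})/', $subject);

// With the s modifier the dot also matches the newline, so the run is long enough.
$withS = preg_match('/(?=.{8,})/s', $subject);

echo "without s: $withoutS, with s: $withS\n"; // prints "without s: 0, with s: 1"
```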

However, the regex is extremely inefficient and leads to excessive backtracking. Because of the leading .* pattern, it first consumes everything up to the first newline. From that position, it tries to find a sequence of 8 characters which are not newlines. If the .* pattern has consumed too much (like the entire string), the regex engine has to go back character by character until it finds the sequence. From that position, it again reads all characters up to the next newline and then tries to find a digit. It will likely have to go back until it finds one, and it may even have to go back into the very first .* pattern. Then the same procedure happens for the lowercase and the uppercase Latin letter.

So the poor regex engine has to go back and forth dozens of times just to check a few trivial properties.

Is there any reason why you cannot use standard string functions like strlen()? It's also much simpler to just apply three separate regexes for the digit, the lowercase letter and the uppercase letter. If you stuff it all into one big regex, you have to be very careful about how it is parsed and how the parts interact with each other.
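
As a rough sketch of that approach, something like the following would check roughly the same rules one at a time. The function name checkPassword() and the wording of the messages are made up for illustration; note that strlen() counts bytes, which is only the same as characters for single-byte text.

```php
<?php
// Hypothetical helper: returns a list of problems, empty if all checks pass.
function checkPassword(string $password): array
{
    $problems = [];

    // strlen() counts bytes, not characters.
    if (strlen($password) < 8) {
        $problems[] = 'fewer than 8 characters';
    }
    // Each property gets its own trivial regex, so nothing interacts.
    if (!preg_match('/\d/', $password)) {
        $problems[] = 'no digit';
    }
    if (!preg_match('/[a-z]/', $password)) {
        $problems[] = 'no lowercase letter';
    }
    if (!preg_match('/[A-Z]/', $password)) {
        $problems[] = 'no uppercase letter';
    }

    return $problems;
}

print_r(checkPassword('abcdefgh')); // lists the missing digit and uppercase letter
```

A side effect of splitting the checks is that you can tell the user exactly which rule failed instead of just "weak".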

Besides that, there's a conceptual issue: Your regex is based on your personal ideas of what a password looks like, which means you'll reject many strong passwords just because they use a different scheme. For example, I usually generate 16 random bytes and then encode them as 32 hexadecimal characters. This is extremely strong, yet you reject it based on the fact that I don't have uppercase letters. This is obviously silly. And what's wrong with using only symbols or non-Latin letters? There are many different languages with many different alphabets, and people should actually be encouraged to use a large space of letters and not just A-Z.
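
To illustrate the hex example, here is a sketch that assumes PHP 7 or later for random_bytes(); the variable names are arbitrary. It generates such a password and runs it through the regex from the first post.

```php
<?php
// 16 random bytes (128 bits) encoded as 32 lowercase hexadecimal characters.
$password = bin2hex(random_bytes(16));

// The pattern from the original question.
$pattern = '/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/';

// bin2hex() only produces 0-9 and a-f, so the uppercase lookahead can never succeed.
$accepted = preg_match($pattern, $password) === 1;

echo $password, ' => ', $accepted ? 'strong' : 'weak', "\n";
```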

You generally need to be very careful with password policies. Let me put it this way: Your check is good enough to make the average boss happy. If that's your goal, you can call it a day and go home. But if you're seriously interested in improving password security, you need to realize that there's a large variety of password schemes. Forcing everybody to comply with a particular set of rules can be very annoying and often borders on cultural ignorance. It's a bit like trying to validate human names: You really can't.

It does make sense to enforce a minimum length, because this is a fairly strong indicator of security. But besides that, you probably shouldn't do anything. Do you really think that forcing people to follow a bunch of stupid rules will result in better passwords? I think it's much more effective to support and encourage them. Tell them about password managers, give practical tips for coming up with good passwords (like the famous "Correct Horse Battery Staple"). Work with your users, not against them.
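
If you go with a length-only policy, a minimal sketch could look like this. The helper name isLongEnough() is made up, the threshold of 8 is just the number used earlier in this thread, and the input is assumed to be UTF-8. Counting code points instead of bytes keeps non-Latin passwords from being measured differently from ASCII ones.

```php
<?php
// Hypothetical helper: checks the minimum length in Unicode code points.
function isLongEnough(string $password, int $minLength = 8): bool
{
    // With the u modifier the dot matches one UTF-8 code point;
    // the s modifier lets it match newlines as well.
    $length = preg_match_all('/./su', $password);

    // preg_match_all() returns false on invalid UTF-8; treat that as a failure.
    return $length !== false && $length >= $minLength;
}

echo isLongEnough('correct horse battery staple') ? "ok\n" : "too short\n";
echo isLongEnough('пароль') ? "ok\n" : "too short\n"; // 6 letters, although 12 bytes
```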

thanks for thoughtful response.
