johnmerlino1 Posted July 29, 2014

In the example below, the pattern matches 0 or more characters at the beginning and 0 or more characters at the end. In between there are 4 groups. Each group starts with "?=", which means "the next text must look like this". The first group matches any character 8 or more times. The second group matches 0 or more characters followed by a digit. The third group matches 0 or more characters followed by a lowercase letter. The fourth group matches 0 or more characters followed by an uppercase letter.

<?php
$password = "Fyfjk34sdfjfsjq7";
if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
    echo "Your password is strong.";
} else {
    echo "Your password is weak.";
}
?>

My question is: do these four groups affect each other? That is, does the fact that the first group requires 8 or more characters mean that every group must have 8 or more characters, and does the fact that the second group requires a digit mean that every group must have a digit? Or do they work independently, meaning that a 4-character word with a single digit would match this pattern even though the first group says it must have 8 characters?
Psycho Posted July 29, 2014 (edited)

I think there is a misunderstanding about what ?= does. My understanding is that it is a lookahead, which is one kind of 'lookaround'. Basically it looks for a match, but then 'gives up' the match. That way it does not move the position forward. So you can use it to look for a number or a letter, and their position won't matter. Because each lookahead starts from the same position, every one of them has to succeed for the pattern to match.

The way I read the above expression is that the value must be at least 8 characters long. It must have a number. It must have a lower-case letter. It must have an upper-case letter.

I think that the ".*" at the beginning and end are unnecessary. But I'd have to test it to be sure.

EDIT: This should do what you want.

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/

- Must contain a number
- Must contain a lower-case letter
- Must contain an upper-case letter
- Must be at least 8 characters long

Or, if you also want to require a special character:

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,}$/
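A quick way to check the reading above is to run both patterns against a few strings. This is only a sketch: the sample passwords and the variable names ($original, $simplified, $samples, $results) are made up for illustration, and agreement on a handful of samples does not prove the two patterns are equivalent for every input.

```php
<?php
// Pattern from the question and the shortened pattern from the reply above.
// Single quotes keep the backslashes exactly as written.
$original   = '/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/';
$simplified = '/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/';

// Hypothetical sample passwords, chosen to fail one rule at a time.
$samples = array(
    'Fyfjk34sdfjfsjq7', // long, has a digit, lowercase and uppercase
    'Abcdefg1',         // exactly 8 characters, all classes present
    'Abc1',             // all classes present, but only 4 characters
    'abcdefgh1',        // long enough, but no uppercase letter
    'ABCDEFGHi',        // long enough, but no digit
);

$results = array();
foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$sample] = array(
        'original'   => preg_match($original, $sample) === 1,
        'simplified' => preg_match($simplified, $sample) === 1,
    );
    printf(
        "%-18s original=%-8s simplified=%s\n",
        $sample,
        $results[$sample]['original'] ? 'match' : 'no match',
        $results[$sample]['simplified'] ? 'match' : 'no match'
    );
}
```

On these samples only the first two strings match, with either pattern. 'Abc1' contains a digit, a lowercase letter and an uppercase letter but is still rejected, so the 4-character case from the question does not slip through: the lookaheads are all required at once.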
Jacques1 Posted July 29, 2014 (Solution)

To match the regex, a string must have the following properties:

- a sequence of at least 8 characters which are not newlines (the dot excludes newlines unless the s modifier is set)
- a digit anywhere in the string
- a lowercase latin letter anywhere in the string
- an uppercase latin letter anywhere in the string

However, the regex is extremely inefficient and leads to excessive backtracking:

- It first consumes everything up to the first newline due to the .* pattern.
- From that position, it tries to find a sequence of 8 characters which are not newlines. If the .* pattern has consumed too much (like the entire string), then the parser has to go back character by character until it finds the sequence.
- From that position, it again reads all characters up to the next newline. Now it tries to find a digit. It likely has to go back until it finds one. It may even have to go back in the very first .* pattern.
- Then the same procedure happens for the lowercase and the uppercase latin letter.

So the poor regex parser has to go back and forth dozens of times only to check a few trivial properties.

Is there any reason why you cannot use standard string functions like strlen()? It's also much simpler to just apply three separate regexes for the digit, the lowercase and the uppercase letter. If you stuff it all into one big regex, you have to be very careful with how it is parsed and how the parts interact with each other.

Besides that, there's a conceptual issue: Your regex is based on your personal ideas of what a password looks like, which means you'll reject many strong passwords just because they use a different scheme. For example, I usually generate 16 random bytes and then encode them as 32 hexadecimal characters. This is extremely strong, yet you reject it based on the fact that I don't have uppercase letters. This is obviously silly. And what's wrong with using only symbols or non-latin letters?
There are many different languages with many different alphabets, and people should actually be encouraged to use a large space of letters and not just A-Z.

You generally need to be very careful with password policies. Let me put it this way: Your check is good enough to make the average boss happy. If that's your goal, you can call it a day and go home. But if you're seriously interested in improving password security, you need to realize that there's a large variety of password schemes. Forcing everybody to comply with a particular set of rules can be very annoying and often borders on cultural ignorance. It's a bit like trying to validate human names: You really can't.

It does make sense to enforce a minimum length, because this is a fairly strong indicator of security. But besides that, you probably shouldn't do anything. Do you really think that forcing people to follow a bunch of stupid rules will result in better passwords?

I think it's much more effective to support and encourage them. Tell them about password managers, give practical tips for coming up with good passwords (like the famous “Correct Horse Battery Staple”). Work with your users, not against them.
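The suggestion above to use strlen() plus three separate regexes might look roughly like this. It is a minimal sketch, not a recommended policy: the function name password_problems(), the $minLength parameter, the rule labels and the fixed hex string are all assumptions made for illustration. The checks mirror the rules of the regex from the question so that the hexadecimal example above can be reproduced.

```php
<?php
// Hypothetical helper: returns the list of rules a password fails.
// An empty array means every rule passed.
function password_problems($password, $minLength = 8)
{
    $problems = array();

    // strlen() counts bytes, so multibyte characters count more than once.
    if (strlen($password) < $minLength) {
        $problems[] = 'too short';
    }

    // One small regex per character class instead of one big pattern.
    if (!preg_match('/\d/', $password)) {
        $problems[] = 'no digit';
    }
    if (!preg_match('/[a-z]/', $password)) {
        $problems[] = 'no lowercase letter';
    }
    if (!preg_match('/[A-Z]/', $password)) {
        $problems[] = 'no uppercase letter';
    }

    return $problems;
}

// Fixed 32-character hexadecimal string standing in for 16 encoded bytes
// (an illustrative value, not a real password).
$hexPassword = '9f86d081884c7d659a2feaa0c55ad015';

echo implode(', ', password_problems($hexPassword)), "\n"; // no uppercase letter
```

Each check is independent, so there is no interaction between the parts to reason about. The hex string is 32 characters long and still fails the uppercase rule, which is the kind of rejection described above. Keeping only the length check would accept it.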
johnmerlino1 (Author) Posted July 30, 2014

Thanks for the thoughtful response.