phppup Posted August 14, 2023 (edited)

I'm trying to use this expression

    $pattern = "/^[A-Z][a-zA-Z '&-]*[A-Za-z]$/";

to make a name start with a capital letter, allow some symbols or spaces, and then continue with more letters. How can I require the total length of all the characters involved to be between 3 and 50 keystrokes? I've tried {3,50} in several positions, but none seems to make the connection that I want. Thanks.

Edited August 14, 2023 by phppup (clean up post)
requinix Posted August 14, 2023

Think about it a little more:

1. The name must start with an uppercase letter. You already have that. It will account for 1 character.
2. The name must end with a letter. You already have that. It will account for 1 character.
3. If you require 3-50 characters total then that means the part in the middle must be between ___ and ___ characters long.
phppup Posted August 14, 2023 (Author)

Ahhhh, so that's the trick!! I used parentheses (which seem to resolve the issue), but I'd like to avoid any unforeseen surprises.

    $pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Please check and confirm whether this working solution is the most correct implementation.
requinix Posted August 14, 2023

You've tested that with things you want to allow, I assume. Did you test it with things that you didn't want to allow?
phppup Posted August 14, 2023 (Author)

Well, of course (!of_course)

Now, it's confirmed to my needs, but I still pose the question, since sometimes even testing does not reveal what a more educated eye can plainly see.
phppup Posted August 14, 2023 (Author, edited)

@requinix Some deeper testing has revealed an unexpected issue. For test purposes I changed the range

    $pattern = "/^([A-Z][a-zA-Z '&-]{0,4})*[A-Za-z]$/";

and discovered that, while the regex seems to have a few holes in what it allows, it is NOT returning 0 for input below the two-character minimum I expected. In fact, a single uppercase A passed the test of acceptance.

Edited August 14, 2023 by phppup (typos)
requinix Posted August 14, 2023

5 hours ago, phppup said:
    Now, it's confirmed to my needs,

Then you didn't test it very well.

Never mind your current problem. Go back to what you had before and make one simple change to it. That's the puzzle for you to solve: there is a single change, replacing one thing with another, that will make your original regex do what you need it to do.
phppup Posted August 15, 2023 (Author)

@requinix OK, I think I got it.

    $pattern = "/^([A-Z][a-zA-Z '&-]{2,4})*[A-Za-z]$/";

Please tell me if this is the [correct] solution that you alluded to. I moved the opening parenthesis to the middle section.

Ironically, I may have inadvertently discovered something, as I altered the range for easier testing (but stumbled upon new conditions). Effectively, this modified range of 2,4 had to affect ONLY the mid-section of the string. So the first character is [A-Z] and the last is [A-Za-z]. With the mid-section at {2,4}, a total of 3 characters will FAIL the test, 4 through 6 pass, and 7 or more fail.

Of course, a minimum of 1 will raise the minimum to a total of 3 (if a special character is used). I did notice that 2 characters only (the first and last) will pass, but I imagine that adding a minimum to the last set of characters would correct that.

My final solution is

    $pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Is this the same as your solution?
kicken Posted August 15, 2023

When working on a regex, it helps to use something like Regex101 so you can easily test and modify your expression.

If what you are trying to validate is names of people/places, it's generally best to not bother as names are complicated. I just check a maximum length for such things to ensure it fits in the database column.
requinix Posted August 15, 2023 (Solution)

2 hours ago, phppup said:
    My final solution is
    $pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";
    Is this the same as your solution?

No.

When testing software, your goal should be to break it. To make it do something you don't want, or to not do something you do want. Simply testing some examples of what you want and what you don't isn't enough.

Since I have other things to do today,

    $pattern = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";

Try both your solution and my solution against the string

    AbcdefghijklmnopqrstuvwxyzAbcdefghijklmnopqrstuvwxyzAbcdefghijklmnopqrstuvwxyz
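For anyone who wants to run the comparison requinix suggests, here is a minimal sketch. Both patterns are copied from the posts above; the variable names (`$grouped`, `$plain`, `$inputs`, `$results`) and the extra single-character input are assumptions added for illustration.

```php
<?php
// Sketch only: patterns are from the thread, names are made up here.
$grouped = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/"; // phppup's "final solution"
$plain   = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";    // requinix's pattern

$inputs = [
    'long'   => str_repeat('Abcdefghijklmnopqrstuvwxyz', 3), // 78 characters
    'single' => 'A',                                          // 1 character
];

$results = [];
foreach ($inputs as $label => $input) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$label] = [
        'grouped' => preg_match($grouped, $input) === 1,
        'plain'   => preg_match($plain, $input) === 1,
    ];
    printf(
        "%-6s (%2d chars): grouped=%s plain=%s\n",
        $label,
        strlen($input),
        var_export($results[$label]['grouped'], true),
        var_export($results[$label]['plain'], true)
    );
}
```

The grouped pattern accepts both inputs, while the plain one rejects both, which is the difference the 78-character test string is meant to expose.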
phppup Posted August 15, 2023 (Author)

@kicken Yes. I am building a form and want PHP to "scold" users that try to submit junk. (It will be used internally by a staff group that needs to be relied on and trusted. LOL) I realize the need to sanitize names and was going to simply require a-zA-Z to eliminate numbers and special characters. Then I realized that there are names with hyphens, apostrophes, and spaces, so down the rabbit hole I went.

@requinix

Quote:
    I moved the opening parenthesis to the middle section.

which left me with

    $pattern = "/^[A-Z]([a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Is that the same as yours? Do my parentheses alter the outcome?

So I guess, thanks to your assistance, I made progress after all. I just have to be more careful about using cut & paste BEFORE my coffee.
kicken Posted August 15, 2023

1 hour ago, phppup said:
    Yes. I am building a form and want PHP to "scold" users that try to submit junk.

The point is that when it comes to things like names, the difference between junk and not junk is hard to define, and you're often better off just not even trying. Better to accept a few junk records than to tell someone their real legal name is not valid. If you want to provide some filtering, you need to be a lot more permissive than you currently are. Your regex, for example, would be telling Ms Bérénice Bejo that her name is invalid.

We had an issue with a public request-information form receiving a bunch of junk submissions, particularly name fields including emoji characters. What I ended up doing was applying a filter that checked the Unicode code point of each character in the name to ensure it was within a particular set of allowed Unicode characters. The set of allowed characters I went with is pretty broad. There's still plenty of opportunity for junk, but it does stop quite a bit of it.

    function validate_unicode_codepoints($allowedRanges, ...$strings) : bool{
        foreach ($strings as $str){
            // Split into individual characters, then map each to its code point.
            $chars = mb_str_split($str, 1, 'utf-8');
            foreach (array_map('mb_ord', $chars) as $codePoint){
                // A code point is valid if it falls inside any [min, max] range.
                $isInRange = false;
                foreach ($allowedRanges as $range){
                    $isInRange = $isInRange || $codePoint >= $range[0] && $codePoint <= $range[1];
                }

                if (!$isInRange){
                    return false;
                }
            }
        }

        return true;
    }

1 hour ago, phppup said:
    Do my parentheses alter the outcome?

The parentheses on their own do not change things, but the * after them does. * means "match the previous expression 0 or more times". The parentheses are the "previous expression", which allows for between 1 and 48 occurrences of the indicated characters. So your overall expression would allow 0 or more instances of between 1 and 48 characters. Effectively, a string of unlimited length so long as it matches the character list.
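kicken's post does not say which ranges were allowed, so the sketch below uses a small, purely hypothetical set (space, apostrophe, hyphen, ASCII letters, and the Latin-1 Supplement block) just to show how the function might be called. The function body is repeated from the post so the example runs on its own; it assumes the mbstring extension and PHP 7.4 or later for `mb_str_split`.

```php
<?php
// kicken's function from the post above, repeated so this sketch is self-contained.
function validate_unicode_codepoints($allowedRanges, ...$strings) : bool {
    foreach ($strings as $str) {
        $chars = mb_str_split($str, 1, 'utf-8');
        foreach (array_map('mb_ord', $chars) as $codePoint) {
            $isInRange = false;
            foreach ($allowedRanges as $range) {
                $isInRange = $isInRange || $codePoint >= $range[0] && $codePoint <= $range[1];
            }
            if (!$isInRange) {
                return false;
            }
        }
    }
    return true;
}

// HYPOTHETICAL ranges for illustration only; the real list is not in the thread.
$allowedRanges = [
    [0x20, 0x20], // space
    [0x27, 0x27], // apostrophe
    [0x2D, 0x2D], // hyphen-minus
    [0x41, 0x5A], // A-Z
    [0x61, 0x7A], // a-z
    [0xC0, 0xFF], // Latin-1 Supplement, mostly accented letters
];

$accented = "B\u{E9}r\u{E9}nice Bejo"; // "Bérénice Bejo"
$emoji    = "Bob \u{1F600}";           // "Bob" followed by an emoji

$accentedOk = validate_unicode_codepoints($allowedRanges, $accented);
$emojiOk    = validate_unicode_codepoints($allowedRanges, $emoji);
var_dump($accentedOk, $emojiOk);
```

With these assumed ranges the accented name is accepted and the emoji string is rejected; a real deployment would need a much broader list to avoid turning away legitimate names.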
phppup Posted August 15, 2023 (Author, edited)

@kicken Ahhhh, now I see it.

Quote:
    a string of unlimited length so long as it matches the character list.

So, with the parentheses I can receive an input as you described. But without the parentheses I am successfully limiting the total number of characters submitted to 1+48+1 (adding the first and last).

At this point, I'm kinda okay with using only the American alphabet, but I still want to accommodate complex names with apostrophes and hyphens. (I'm still skeptical of names with underscores, and the few with numerals will need to come up with their own nickname, i.e., Musk's kid.)

Thanks for the info.

Edited August 15, 2023 by phppup (typos)
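The 1+48+1 arithmetic can be checked at the boundaries with a short sketch. The pattern is requinix's from earlier in the thread; the helper name `matches_name()`, the chosen lengths, and the sample name are assumptions for illustration.

```php
<?php
// Sketch: requinix's pattern; everything else here is hypothetical.
$pattern = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";

function matches_name(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

$byLength = [];
foreach ([2, 3, 50, 51] as $length) {
    // One capital followed by lowercase letters, e.g. "Abb" for length 3.
    $input = 'A' . str_repeat('b', $length - 1);
    $byLength[$length] = matches_name($pattern, $input);
}

// A name using the apostrophe and hyphen from the character class.
$punctuated = matches_name($pattern, "O'Brien-Smith");

var_dump($byLength, $punctuated);
```

Lengths 3 and 50 match, lengths 2 and 51 do not, and the punctuated name matches because the apostrophe and hyphen fall in the middle section.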
requinix Posted August 15, 2023

Parentheses are for grouping. You use them when you want to deal with things as a group instead of each one individually.

"Alice is driving to work, and Bob is driving to work, and Cindy is driving to work": the three of them are each taking their own cars to work and contributing to local traffic problems.

"(Alice and Bob and Cindy) are driving to work": the three of them are carpooling like responsible human beings.

Having parentheses for the sake of having parentheses is wasteful but not inherently wrong. But when you throw other things into a regex, like + or *, and apply them to the parenthesized group, then you change what the regex does.

"(Alice and Bob and Cindy)+ are driving to work": there are some number of people, every one of them named Alice or Bob or Cindy, and they are all driving to work together in one comically-oversized minivan.

"(Alice and Bob and Cindy)* are driving to work": maybe there are three people driving to work, or maybe there are more than three people, or maybe there aren't any people at all because it's the weekend and they don't work on the weekend.

Try this.
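The carpool analogy can be reproduced with toy patterns. The sketch below is not from the thread: the patterns and the `$outcomes` array are made up to show how a quantifier behaves differently on a single character and on a parenthesized group.

```php
<?php
// Toy patterns, assumed for illustration only.
$ungroupedPlus = '/^ab+$/';   // + applies only to the "b"
$groupedPlus   = '/^(ab)+$/'; // + applies to the whole group "ab"
$groupedStar   = '/^(ab)*$/'; // * also allows zero copies of the group

$outcomes = [
    'ab+ vs abbb'    => preg_match($ungroupedPlus, 'abbb') === 1,
    'ab+ vs abab'    => preg_match($ungroupedPlus, 'abab') === 1,
    '(ab)+ vs abab'  => preg_match($groupedPlus, 'abab') === 1,
    '(ab)+ vs abbb'  => preg_match($groupedPlus, 'abbb') === 1,
    '(ab)+ vs empty' => preg_match($groupedPlus, '') === 1,
    '(ab)* vs empty' => preg_match($groupedStar, '') === 1,
];

foreach ($outcomes as $label => $matched) {
    printf("%-15s %s\n", $label, $matched ? 'match' : 'no match');
}
```

The last two rows are the weekend case: with + the group must appear at least once, while with * an empty string is enough, which is why a starred group in the name pattern stops enforcing any minimum.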