
Adding a range to RegEx


phppup
Go to solution Solved by requinix,


Trying to utilize this expression

$pattern = "/^[A-Z][a-zA-Z '&-]*[A-Za-z]$/";

to require a name to start with a capital letter, allow some symbols or spaces, and then end with a letter.

How can I require the total length to be between 3 and 50 characters?

I've tried {3,50} in several positions, but none seem to do what I want.

Thanks

 

 

Link to comment
Share on other sites

Think about it a little more:

1. The name must start with an uppercase letter. You already have that. It will account for 1 character.
2. The name must end with a letter. You already have that. It will account for 1 character.
3. If you require 3-50 characters total then that means the part in the middle must be between ___ and ___ characters long.

Link to comment
Share on other sites

Ahhhh, so that's the trick!!

I used parentheses (that seems to resolve the issue), but I'd like to avoid any unforeseen surprises.

$pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Please check to confirm that this working solution is the most correct implementation.

Link to comment
Share on other sites

@requinix Some deeper testing has revealed an unexpected issue.

For test purposes I changed the range 

$pattern = "/^([A-Z][a-zA-Z '&-]{0,4})*[A-Za-z]$/";

and discovered that while the RegEx seems to have a few holes for what would be allowed, it is NOT returning 0 to keep the minimum requirement at two characters.

In fact, a single uppercase A passed the test of acceptance.
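A quick way to see why a lone letter gets through, as a minimal sketch using the pattern exactly as posted above: the `*` lets the whole parenthesized group match zero times, so only the trailing `[A-Za-z]` has to match. The variable names are just for illustration.

```php
<?php
// The pattern from the post above, unchanged.
$pattern = "/^([A-Z][a-zA-Z '&-]{0,4})*[A-Za-z]$/";

// With the group repeated zero times, the pattern is effectively /^[A-Za-z]$/.
$singleUpper = preg_match($pattern, 'A'); // 1
$singleLower = preg_match($pattern, 'a'); // 1, even though it is not a capital

var_dump($singleUpper, $singleLower);
```

Note that the lowercase case passes too, which means the "must start with a capital" rule is also lost.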

Link to comment
Share on other sites

5 hours ago, phppup said:

Now, it's confirmed to my needs,

Then you didn't test it very well.

Never mind your current problem. Go back to what you had before and make one simple change to it.
That's the puzzle for you to solve: there is a single change - replacing one thing with another - that will make your original regex do what you need it to do.

 

Link to comment
Share on other sites

@requinix Ok, I think I got it.

 

$pattern = "/^([A-Z][a-zA-Z '&-]{2,4})*[A-Za-z]$/";

Please tell me if this is the [correct] solution that you alluded to.

I moved the opening parenthesis to the middle section.

Ironically, I may have inadvertently discovered something, as I altered the range for easier testing (but stumbled upon new conditions).

Effectively, this modified range of 2,4 had to affect ONLY this mid-section of the string.

So, the first character is [A-Z]

And the last is [A-Za-z]

So with the mid-section {2,4} a total of 3 characters will FAIL the test but 4 thru 6 pass, and then 7 or greater fails.
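The length arithmetic described here can be checked with a short loop. This is only a sketch, and it assumes the {2,4} quantifier sits directly on the middle character class, with no parentheses and no `*` (the pasted pattern above does have them, so results for that one will differ).

```php
<?php
// Assumed variant: {2,4} applied directly to the middle class, no group, no *.
$pattern = "/^[A-Z][a-zA-Z '&-]{2,4}[A-Za-z]$/";

$results = [];
for ($length = 2; $length <= 7; $length++) {
    // Build a test name of exactly $length characters: "A" followed by "b"s.
    $name = 'A' . str_repeat('b', $length - 1);
    $results[$length] = preg_match($pattern, $name);
}

// Lengths 4, 5 and 6 match (1 + 2..4 + 1); lengths 2, 3 and 7 do not.
print_r($results);
```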

Of course, a minimum of 1 will elevate the minimum to a total of 3 (if a special character is used)

I did notice that 2 characters only (being first and last) will pass, but I imagine that adding a minimum to the last set of characters would correct that.

My final solution is

$pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Is this the same as your solution?

Link to comment
Share on other sites

When working on a regex, it helps to use something like Regex101 so you can easily test and modify your expression.

If what you are trying to validate is names of people/places, it's generally best to not bother as names are complicated.  I just check a maximum length for such things to ensure it fits in the database column.

Link to comment
Share on other sites

  • Solution
2 hours ago, phppup said:

My final solution is

$pattern = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Is this the same as your solution?

No.

When testing software, your goal should be to break it. To make it do something you don't want, or to not do something you do want. Simply testing some examples of what you want and what you don't isn't enough.

Since I have other things to do today,

$pattern = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";

Try both your solution and my solution against the string

AbcdefghijklmnopqrstuvwxyzAbcdefghijklmnopqrstuvwxyzAbcdefghijklmnopqrstuvwxyz
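For anyone following along, the comparison suggested above could be run like this. A minimal sketch: the two patterns and the 78-character test string are copied from the posts, the variable names are made up.

```php
<?php
// The test string from the post: the alphabet (capital A first) three times.
$input = str_repeat('Abcdefghijklmnopqrstuvwxyz', 3); // 78 characters

$grouped   = "/^([A-Z][a-zA-Z '&-]{1,48})*[A-Za-z]$/"; // phppup's version
$ungrouped = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";    // requinix's version

$groupedResult   = preg_match($grouped, $input);   // 1: the group simply repeats
$ungroupedResult = preg_match($ungrouped, $input); // 0: longer than 50 characters

printf("length %d: grouped=%d ungrouped=%d\n", strlen($input), $groupedResult, $ungroupedResult);
```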
Link to comment
Share on other sites

@kicken Yes. I am building a form and want PHP to "scold" users that try to submit junk. (It will be used internally by a staff group that needs to be relied on and trusted. LOL)

I realize the need to sanitize names and was going to simply require a-zA-Z to eliminate numbers and characters.

Then I realized that there are names with hyphens, apostrophes, and spaces, so down the rabbit hole I went.

@requinix  

Quote

I moved the opening parenthesis to the middle section.

which left me with

$pattern = "/^[A-Z]([a-zA-Z '&-]{1,48})*[A-Za-z]$/";

Is that the same as yours?

Do my parentheses alter the outcome?

So I guess, thanks to your assistance, I made progress after all.

I just have to be more careful about using cut & paste BEFORE my coffee.

Link to comment
Share on other sites

1 hour ago, phppup said:

Yes. I am building a form and want PHP to "scold" users that try to submit junk.

The point is that when it comes to things like names, the difference between junk and not junk is hard to define, and you're often better off just not even trying.  Better to accept a few junk records than to tell someone their real legal name is not valid. If you want to provide some filtering, you need to be a lot more permissive than you currently are.  Your regex for example would be telling Ms Bérénice Bejo that her name is invalid. 
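To make the Bérénice example concrete, here is a small sketch. The first pattern is the one from this thread. The second is only one possible more permissive variant, my own assumption rather than anything proposed above: it swaps the ASCII ranges for the Unicode letter properties `\p{Lu}` and `\p{L}` and adds the `u` modifier so the input is treated as UTF-8.

```php
<?php
// "Bérénice Bejo", written with \u{...} escapes so the file encoding does not matter.
$name = "B\u{00E9}r\u{00E9}nice Bejo";

// The ASCII-only pattern from this thread.
$asciiPattern = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";

// Hypothetical Unicode-aware variant (an assumption, not from the thread).
$unicodePattern = "/^\\p{Lu}[\\p{L} '&-]{1,48}\\p{L}$/u";

$asciiResult   = preg_match($asciiPattern, $name);   // 0: "é" is not in a-zA-Z
$unicodeResult = preg_match($unicodePattern, $name); // 1: "é" is a letter

var_dump($asciiResult, $unicodeResult);
```

Even the second pattern will reject some real names, so treat it as an illustration of the problem rather than a fix.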

We had an issue with a public request information form with a bunch of junk submissions, particularly name fields including Emoji characters.  What I ended up doing was applying a filter that checked the Unicode code point for each character in the name to ensure it was within a particular set of allowed Unicode characters.  The set of allowed characters I went with is pretty broad.  There's still plenty of opportunity for junk, but it does stop quite a bit of junk.

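// Returns true only if every character of every string falls inside
// at least one [min, max] code point range in $allowedRanges.
// Needs the mbstring extension (mb_str_split requires PHP 7.4+).
function validate_unicode_codepoints($allowedRanges, ...$strings) : bool{
    foreach ($strings as $str){
        $chars = mb_str_split($str, 1, 'UTF-8');
        foreach ($chars as $char){
            $codePoint = mb_ord($char, 'UTF-8');
            $isInRange = false;
            foreach ($allowedRanges as $range){
                if ($codePoint >= $range[0] && $codePoint <= $range[1]){
                    $isInRange = true;
                    break; // One matching range is enough.
                }
            }

            if (!$isInRange){
                return false;
            }
        }
    }

    return true;
}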
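A usage sketch for the function above. The function body is repeated in condensed form so the snippet runs on its own. The two ranges are purely illustrative picks of mine (printable Basic Latin, plus Latin-1 Supplement through Latin Extended-B), not the set actually used on that form.

```php
<?php
// Condensed copy of the validator; requires mbstring and PHP 7.4+.
function validate_unicode_codepoints($allowedRanges, ...$strings): bool
{
    foreach ($strings as $str) {
        foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
            $codePoint = mb_ord($char, 'UTF-8');
            $isInRange = false;
            foreach ($allowedRanges as $range) {
                if ($codePoint >= $range[0] && $codePoint <= $range[1]) {
                    $isInRange = true;
                    break;
                }
            }
            if (!$isInRange) {
                return false;
            }
        }
    }
    return true;
}

// Hypothetical allow-list, for illustration only.
$allowedRanges = [
    [0x0020, 0x007E], // printable Basic Latin
    [0x00A0, 0x024F], // Latin-1 Supplement through Latin Extended-B
];

$accented = validate_unicode_codepoints($allowedRanges, "B\u{00E9}r\u{00E9}nice", 'Bejo'); // true
$emoji    = validate_unicode_codepoints($allowedRanges, "Zo\u{00EB} \u{1F600}");           // false

var_dump($accented, $emoji);
```

Because the function is variadic, first and last name can be checked in one call, as in the first example.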

 

1 hour ago, phppup said:

Do my parentheses alter the outcome?

The parentheses on their own do not change things, but the * after them does.  * means "match the previous expression 0 or more times".  The parenthesized group is the "previous expression", and it allows for between 1 and 48 occurrences of the indicated characters.  So your overall expression then would allow 0 or more instances of between 1 and 48 characters.  Effectively, a string of unlimited length so long as it matches the character list.
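A small sketch of that effect, using the two patterns from this thread (variable names are made up). The grouped version accepts both a 100-character string and a 2-character one, because the group may repeat many times or not at all.

```php
<?php
$grouped   = "/^[A-Z]([a-zA-Z '&-]{1,48})*[A-Za-z]$/"; // with ( )*
$ungrouped = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";    // without

$long  = 'A' . str_repeat('b', 98) . 'c'; // 100 characters
$short = 'Ab';                            // 2 characters

$results = [
    'grouped long'    => preg_match($grouped, $long),    // 1: group repeats
    'grouped short'   => preg_match($grouped, $short),   // 1: group matches zero times
    'ungrouped long'  => preg_match($ungrouped, $long),  // 0: over 50 characters
    'ungrouped short' => preg_match($ungrouped, $short), // 0: under 3 characters
];

print_r($results);
```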

 

Link to comment
Share on other sites

@kicken Ahhhh, now I see it.

Quote

a string of unlimited length so long as it matches the character list.

So, with the parentheses I can receive an input as you described.

But without the parentheses I am successfully limiting the total number of characters submitted to at most 1+48+1 (adding the first and last).
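In the spirit of "try to break it", here is a sketch of boundary tests for the ungrouped pattern. The test names are invented for illustration. The last case is a gotcha that was not raised in the thread: by default `$` also matches just before a final newline, and the `D` modifier turns that off.

```php
<?php
$pattern = "/^[A-Z][a-zA-Z '&-]{1,48}[A-Za-z]$/";

$cases = [
    'two chars'        => 'Ab',
    'three chars'      => 'Abc',
    'fifty chars'      => 'A' . str_repeat('b', 49),
    'fifty-one chars'  => 'A' . str_repeat('b', 50),
    'apostrophe'       => "O'Brien",
    'hyphen'           => 'Smith-Jones',
    'trailing newline' => "Abc\n",
];

$results = [];
foreach ($cases as $label => $input) {
    $results[$label] = preg_match($pattern, $input);
}
print_r($results); // 0, 1, 1, 0, 1, 1, 1

// Same pattern with the D modifier, so $ only matches at the very end.
$strictNewline = preg_match($pattern . 'D', "Abc\n"); // 0
var_dump($strictNewline);
```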

At this point, I'm kinda okay with using only the American alphabet, but still want to accommodate complex names with apostrophes and hyphens (I'm still skeptical of names with underscores; and the few with numerals will need to come up with their own nickname, e.g. Musk's kid).

Thanks for the info

Link to comment
Share on other sites

Parentheses are for grouping. You use them when you want to deal with things as a group instead of each one individually.

"Alice is driving to work, and Bob is driving to work, and Cindy is driving to work": the three of them are each taking their own cars to work and contributing to local traffic problems.
"(Alice and Bob and Cindy) are driving to work": the three of them are carpooling like responsible human beings.

Having parentheses for the sake of having parentheses is wasteful but not inherently wrong. But when you throw other things into a regex, like + or *, and apply them to the parenthesized group, then you change what the regex does.

"(Alice and Bob and Cindy)+ are driving to work": there are some number of people, every one of them named Alice or Bob or Cindy, and they are all driving to work together in one comically-oversized minivan.
"(Alice and Bob and Cindy)* are driving to work": maybe there are three people driving to work, or maybe there are more than three people, or maybe there aren't any people at all because it's the weekend and they don't work on the weekend.

Try this.

Link to comment
Share on other sites
