
I agree with salathe that this is something that regex is not really built for.

 

You could validate it fairly easily with basic PHP though.

<?php
// Sample of a date that might be posted (dd.mm.yyyy).
$date = '30.02.2009'; # Invalid date.

// Strip everything except digits, then fetch the parts of the date we can work with.
$digits = preg_replace('/[^0-9]/', '', $date);
$dateDay = (int) substr($digits, 0, 2);
$dateMonth = (int) substr($digits, 2, 2);
$dateYear = (int) substr($digits, 4, 4);

// Get the last day of the month: day 0 of the next month is the last day of this one.
$lastDayOfMonth = (int) date('d', mktime(0, 0, 0, $dateMonth + 1, 0, $dateYear));

// Ensure the chosen month exists and the chosen day is valid for it.
if ($dateMonth < 1 || $dateMonth > 12 || $dateDay < 1 || $dateDay > $lastDayOfMonth) {
    // Error: invalid month, or selected day is outside the chosen month.
}

There are different ways you can go about parsing $date; I just chose the one that seemed most appropriate. How you parse it in the final version may vary depending on how flexible you want the field to be for users on the front end.
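For comparison, PHP's built-in checkdate() already knows month lengths and leap years, so a minimal sketch could lean on it instead of computing the last day with mktime(). The helper name isValidDate() and the strict dd.mm.yyyy format are assumptions for illustration; loosen the pattern if the field should accept other separators.

```php
<?php
// Hypothetical helper: validate a strict dd.mm.yyyy string using checkdate().
function isValidDate(string $date): bool
{
    // Require two digits, a dot, two digits, a dot, four digits.
    if (!preg_match('/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})$/', $date, $m)) {
        return false;
    }
    // checkdate(month, day, year) handles month lengths and leap years.
    return checkdate((int) $m[2], (int) $m[1], (int) $m[3]);
}

var_dump(isValidDate('30.02.2009')); // bool(false)
var_dump(isValidDate('29.02.2012')); // bool(true)
```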

Thanks for that. I was interested specifically in a regex pattern, because that's what my¹ JS widget needs to be given in order to display only the dates I want.

I would probably need to hack into its internals to configure it in a more sensible way, so I just decided it's not worth my time ;)

 


1: 'my' as in 'the one I am using'

I was bored, how does this grab you...

 

$pattern =
"~^(?:(?:(?:0[1-9])|(?:[1-2][0-9])|(?:30))\.(?:(?:01)|(?:03)|(?:05)|(?:07)|(?:08)|(?:10)|(?:12))|(?:(?:0[1-9])|(?:[1-2][0-9]))\.(?:(?:04)|(?:06)|(?:09)|(?:11))|(?:(?:0[1-9])|(?:1[0-9])|(?:2[0-7]))\.(?:(?:02)))\.\d{4}$~";

 

On a side note, that pattern looks exactly the same width as my screen....

I most certainly do have too much time on my hands, lol, that's what being unemployed does for you!

 

You're right, of course; that would be a neater/more efficient solution. I'm not sure what I was thinking there. Well, actually, I think for some reason I was thinking of alternation as only applying to a single character, i.e. gra|ey matching gray or grey, but obviously that's only true if you add grouping to make it gr(a|e)y. So I went a little OTT on the grouping :wtf:
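The precedence point is easy to see by looking at what actually gets matched: alternation binds more loosely than concatenation, so gra|ey means "gra" or "ey", and only a group confines it to one position. The last line shows that a character class does the same job for single characters, which is the trick the day section already uses. The helper name matchedText() is just for illustration.

```php
<?php
// Return the text matched by $pattern in $subject, or null if there is no match.
function matchedText(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) ? $m[0] : null;
}

// Alternation has the lowest precedence: /gra|ey/ means "gra" OR "ey".
var_dump(matchedText('/gra|ey/', 'gray'));     // string(3) "gra"
var_dump(matchedText('/gra|ey/', 'grey'));     // string(2) "ey"

// Grouping limits the alternation to one position: "gray" OR "grey".
var_dump(matchedText('/gr(?:a|e)y/', 'grey')); // string(4) "grey"

// A character class does the same for single characters.
var_dump(matchedText('/gr[ae]y/', 'gray'));    // string(4) "gray"
```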

 

Think I've tried to read about too many new Regular Expressions features today and pushed some of the old stuff out of my head, Homer style.  :shrug:

 

~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:01|03|05|07|08|10|12)|(?:0[1-9]|[1-2][0-9])\.(?:04|06|09|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~

Good idea, not sure why I didn't think of that since that's essentially how the day section works.

 

~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2][0-9])\.(?:0[469]|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~

 

Edit: It's gradually getting shorter too! At this rate we'll have it down to a few chars, lol

The OP needs to be aware that this won't work properly for Februaries in leap years (the next being 2012). If that's not a concern then the regex above is a good starting point.  It could also be shortened further (you're doing a great job thus far; 217 down to 125 characters) if you feel like it.  ;)
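A quick way to see the leap-year limitation: the pattern never looks at the year. Judging by its day ranges (01-30, 01-29 and 01-27), it accepts every date except the last day of its month in a common year, so 28.02 is rejected even in 2012, where it is not the last day. The sketch below contrasts it with a date-aware check; that reading of the requirement and the function name notLastDayOfMonth() are assumptions, not something stated in the thread.

```php
<?php
// Pattern from the post above (dd.mm.yyyy, common-year day ranges).
const DATE_PATTERN = '~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2][0-9])\.(?:0[469]|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~';

// Hypothetical date-aware check: a real date that is not the last day of its month.
function notLastDayOfMonth(string $date): bool
{
    if (!preg_match('/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})$/', $date, $m)) {
        return false;
    }
    [, $day, $month, $year] = array_map('intval', $m);
    // If the following day also exists in the same month, this is not the last day.
    return checkdate($month, $day, $year) && checkdate($month, $day + 1, $year);
}

var_dump(preg_match(DATE_PATTERN, '28.02.2012')); // int(0)
var_dump(notLastDayOfMonth('28.02.2012'));        // bool(true)
var_dump(notLastDayOfMonth('28.02.2009'));        // bool(false)
```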

Of course, if we were merely concentrating on character count, we could simply scrap the ?: to save another 10 characters or so. One change that shouldn't affect efficiency just sprang to mind, though: swapping [0-9] for \d saves around 9 characters...

 

~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)\.\d{4}$~

 

...and without the year as suggested by the OP...

 

~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~

 

Faux Edit: Oh, and since [1-2] is the same as saying 1 or 2, we can save 2 characters by getting rid of the range dash and writing [12].

 

~^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~
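Since the pattern keeps being rewritten, it is cheap to check a shortened version exhaustively instead of by eye. This sketch runs the final year-less pattern against every string from 00.00 to 99.99 and compares it with the rule its day ranges appear to encode (day 1 up to one less than the month's length in a common year); that rule is an inference from the pattern, and the function name patternMismatches() is hypothetical.

```php
<?php
// Final year-less pattern from the post above.
const DAY_MONTH_PATTERN = '~^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~';

// Days per month in a common (non-leap) year, keyed 1..12.
const DAYS_IN_MONTH = [1 => 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// Return every "dd.mm" string where the pattern disagrees with the rule.
function patternMismatches(): array
{
    $mismatches = [];
    for ($month = 0; $month <= 99; $month++) {
        for ($day = 0; $day <= 99; $day++) {
            $subject = sprintf('%02d.%02d', $day, $month);
            $expected = $month >= 1 && $month <= 12
                && $day >= 1 && $day < DAYS_IN_MONTH[$month];
            if ((preg_match(DAY_MONTH_PATTERN, $subject) === 1) !== $expected) {
                $mismatches[] = $subject;
            }
        }
    }
    return $mismatches;
}

echo count(patternMismatches()), "\n"; // 0
```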

 

 

I haven't read this thread closely, and as a result haven't had a good look at the pattern or situation at large, but personally, if you are not making use of captures, then there is no harm in using (?: ... ). Sure, it bloats the pattern a bit, but at the end of the day, using it instead of needless capturing groups makes the regex faster and use less memory (thus it is more efficient).
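The bookkeeping difference shows up directly in PHP: a capturing group adds an entry to the $matches array, while a (?: ... ) group only groups. This sketch shows that difference only; it does not measure the speed or memory saving, which will depend on the pattern and the PCRE version.

```php
<?php
// A capturing group stores what it matched in the $matches array...
preg_match('/gr(a|e)y/', 'grey', $capturing);
// ...while a non-capturing group stores nothing extra.
preg_match('/gr(?:a|e)y/', 'grey', $nonCapturing);

echo count($capturing), "\n";    // 2: the full match plus the group
echo count($nonCapturing), "\n"; // 1: the full match only
```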

 

As for using \d: while in all likelihood you won't run into exponents (superscript digits such as ² or ³), be aware that, depending on your locale, \d might match these (I'm not sure off the top of my head which locales do). So if there isn't much chance of running into exponents (which doesn't seem likely here), using \d should be safe. I'm much more concerned with character class shorthands like \w, which are a potential loose cannon as far as I'm concerned, but I digress. The benefit of [0-9] is that only 0-9 will ever be considered, so there are no surprises. As far as performance is concerned, I don't think there would be much difference between the two; if, depending on locale, \d checks for more than 0-9, it might be *slower* than [0-9], but I'm sure the difference would be infinitesimal.

Good, good. You really wouldn't want to delve into checking for leap year dates (though it's possible)! :)

 

How would you write a regular expression that checks if a year is a leap year? It's a leap year if (year mod 4 = 0 and year mod 100 != 0) or year mod 400 = 0. Then you also have to consider that these (Gregorian) leap year rules weren't introduced until the late 16th century (can't remember the exact year).

 

Even if you could somehow craft a regex that would do that, it would still be ridiculous, and this would be way faster:

$isLeap = (($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0) && $year >= 1582;

 

Edit: So we started with this leap year business in 1582. Updated code.
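Wrapped in a function, the one-liner above can be checked against a few well-known years, and, from 1582 on, against PHP's own calendar via checkdate(2, 29, $year). The name isLeapYear() is just for illustration, and the single 1582 cut-off is the thread's simplification (countries switched to the Gregorian calendar at different times).

```php
<?php
// Gregorian rule: divisible by 4 and not by 100, or divisible by 400.
// Years before 1582 are treated as non-leap, mirroring the one-liner above.
function isLeapYear(int $year): bool
{
    $gregorianLeap = ($year % 4 === 0 && $year % 100 !== 0) || $year % 400 === 0;
    return $gregorianLeap && $year >= 1582;
}

foreach ([1580, 1600, 1900, 2000, 2009, 2012] as $year) {
    printf("%d: %s\n", $year, isLeapYear($year) ? 'leap' : 'common');
}
// 1580: common, 1600: leap, 1900: common, 2000: leap, 2009: common, 2012: leap
```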

Yes indeed, I was being facetious about scrapping the ?:, hence the fact that I didn't change the pattern. :)

 

I've never seen anywhere that \d would match more than 0-9, so it's interesting to know that it can, and it would be more interesting to know in what cases this can occur.

 

Of course, this whole thing could have been done more simply and more accurately without regex, but it seems that wouldn't exactly help Mchl in this case, due to the manner in which it's being used.

Ask, and thou shalt receive :)  (granted, depending on your locale, this might not be an issue at all).

 

http://www.phpfreaks.com/forums/index.php/topic,252010.msg1183648.html#msg1183648

Not in PCRE. There \w is always equivalent to [a-zA-Z0-9_] regardless of locale.

 

I understand... I wasn't specific (but now it appears that I must be ;) ). Let me elaborate: depending on locale, \w might match more than just [a-zA-Z0-9_]. So if that is what someone is checking for, and their locale matches more than this (think accented characters along with exponents, for example), then they must either set their ctype (LC_CTYPE) accordingly or explicitly use [a-zA-Z0-9_]; otherwise they might be in for a surprise by leaving things to \w (my previous post to cags, with the link, discusses this issue).

Well, I never actually tested it and I've never needed it, but I looked it up in Mastering Regular Expressions:

 

If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_].

 

After researching a bit more:

When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support.

 

So we're both right, I guess.
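To avoid depending on locale at all, the two options mentioned in this exchange can be made explicit: a spelled-out ASCII class, or, in UTF-8 mode (PHP's /u modifier), a Unicode property such as \p{L}. A small sketch; the function names are hypothetical.

```php
<?php
// Explicit ASCII class: letters, digits and underscore only, regardless of locale.
function isAsciiWord(string $s): bool
{
    return preg_match('/^[a-zA-Z0-9_]+$/', $s) === 1;
}

// Unicode property: letters from any script; /u makes PCRE treat $s as UTF-8.
function isUnicodeLetters(string $s): bool
{
    return preg_match('/^\p{L}+$/u', $s) === 1;
}

$word = "caf\u{e9}"; // "café" encoded as UTF-8
var_dump(isAsciiWord($word));      // bool(false)
var_dump(isUnicodeLetters($word)); // bool(true)
```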

Right (I never knew you had the book, good stuff :) ). In my tests, if I only wanted [a-zA-Z0-9_], \w 'as is' doesn't cut it, but mind you, this is due to my locale's LC_CTYPE setting. Again, this might not be an issue for others. This is why I say I'm more concerned about throwing \w around than \d: the odds of running into exponents are probably slim (unless one is scraping a math site, perhaps, or other sites I'm not thinking of), whereas with \w, this might involve more characters than desired.

Disregarding the fact that talking about leap years isn't applicable before 1582, this is what I came up with for matching leap years only:

 

^[0-9]*?(?:[02468]?[048]|[13579][26])(?:00)?$

 

I think it should work.

 

Edit: Never mind. It doesn't work. Crap.

 

Edit 2: Okay, so this regex will match all natural numbers where mod 400 = 0:

^0$|^[0-9]*?(?:[13579][26]|(?(?<=^)[02468]?|[02468])[048])00$

To be continued...

 

Edit 3: And I think this should match all natural numbers n where n mod 100 != 0 and n mod 4 == 0:

^[0-9]*?(?:[13579][26]|(?:[2468][048]|(?<=^)[048]))$

 

Edit 4: The above one doesn't work. This one almost works, but it also matches the multiples of 100, which it isn't allowed to do:

^[0-9]*?(?:[13579][26]|(?:[02468][048]|(?<=^)[048]))$
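Since each attempt above turned out to fail only after a closer look, a brute-force check is a handy companion when crafting patterns like these: run the candidate regex over every integer in a range and list where it disagrees with the arithmetic rule. The function name regexMismatches() is an assumption for illustration, and the example uses a simple divisible-by-4 pattern whose correctness is easy to argue (a number is divisible by 4 exactly when its last two digits form a multiple of 4).

```php
<?php
// List every n in [0, $max] where the regex and the arithmetic rule disagree.
function regexMismatches(string $pattern, callable $rule, int $max): array
{
    $mismatches = [];
    for ($n = 0; $n <= $max; $n++) {
        $matched = preg_match($pattern, (string) $n) === 1;
        if ($matched !== $rule($n)) {
            $mismatches[] = $n;
        }
    }
    return $mismatches;
}

$isDivisibleBy4 = function (int $n): bool {
    return $n % 4 === 0;
};

// Last two digits a multiple of 4, or a single digit 0, 4 or 8.
$divisibleBy4 = '/^(?:[0-9]*(?:[02468][048]|[13579][26])|[048])$/';
echo count(regexMismatches($divisibleBy4, $isDivisibleBy4, 10000)), "\n"; // 0

// Looking only at the last digit is not enough.
echo implode(', ', regexMismatches('/^[0-9]*[048]$/', $isDivisibleBy4, 20)), "\n";
// 10, 12, 14, 16, 18
```

The same harness can be pointed at the Edit 2 pattern with a rule such as `$n % 400 === 0`, or at the one above with `$n % 4 === 0 && $n % 100 !== 0`, to list exactly which numbers it gets wrong.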
