not(Last day of the month)

Mchl · November 4, 2009

Here's something tricky, that would help me quite a lot.

Date format is DD.MM.YYYY

How would you create a regex that would match all days except last day of each month (assume February has 28 days)

salathe · November 4, 2009

I haven't thought about it too much, but this doesn't feel like a comfortable fit for regex. Why not just use other date parsing/generating methods (the date or DateTime extensions)?

Mchl · November 4, 2009

It'd be for a JS datepicker widget. But I guess I'll stick with something easier (a drop-down list).

simshaun · November 4, 2009

I agree with salathe that this is something that regex is not really built for.

You could validate it fairly easily with basic PHP though.

<?php
// Sample of date that might be posted.
$date = '30.02.2009'; # Invalid date.

// Fetch parts of the date that we can work with.
$date = preg_replace('/[^0-9]/', '', $date);
$dateDay = substr($date, 0, 2);
$dateMonth = substr($date, 2, 2);
$dateYear = substr($date, 4, 4);

// Get the last day of the month.
$lastDayOfMonth = date('d', mktime(0, 0, 0, $dateMonth + 1, 0, $dateYear));

// Ensure the chosen day is valid for the chosen month.
if ($dateDay > $lastDayOfMonth) {
    // Error: Selected day is > the last day of the month.
}

There are different ways you can go about parsing $date. I just chose one that seemed most appropriate. How you would parse it in the final version may vary depending on how flexible you want the field to be for users on the frontend.
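Since the date ends up in a JS datepicker, the same last-day lookup can be sketched on the client side. This is only a rough illustration: the function name `isNotLastDayOfMonth` is made up for this example, it assumes a strict DD.MM.YYYY string, and unlike the "February has 28 days" assumption it follows the real calendar (years below 0100 are not handled, because `Date` remaps them).

```javascript
// Hypothetical helper (not from the thread): true when a DD.MM.YYYY string
// names a real date that is NOT the last day of its month.
function isNotLastDayOfMonth(dateString) {
  const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(dateString);
  if (!match) return false;

  const day = Number(match[1]);
  const month = Number(match[2]); // 1-based, as typed by the user
  const year = Number(match[3]);
  if (month < 1 || month > 12 || day < 1) return false;

  // Date months are 0-based, so the 1-based `month` already points at the
  // following month; day 0 of that month is the last day of this one.
  const lastDay = new Date(year, month, 0).getDate();
  return day < lastDay;
}

console.log(isNotLastDayOfMonth('27.02.2009')); // true
console.log(isNotLastDayOfMonth('28.02.2009')); // false (last day)
console.log(isNotLastDayOfMonth('30.02.2009')); // false (not a real date)
```

The `new Date(year, month, 0)` trick mirrors the `mktime(0, 0, 0, $dateMonth + 1, 0, $dateYear)` call in the PHP above.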

Mchl · November 4, 2009

Thanks for that. I was interested specifically in regex pattern, because that's what my¹ JS widget needs to be given to display only dates I want.

I would probably need to hack into its internals to configure it in more sensible way, so I just decided it's not worth my time

¹: 'my' as in 'the one I am using'

cags · November 6, 2009

I was bored, how does this grab you...

$pattern =
"~^(?:(?:(?:0[1-9])|(?:[1-2][0-9])|(?:30))\.(?:(?:01)|(?:03)|(?:05)|(?:07)|(?:08)|(?:10)|(?:12))|(?:(?:0[1-9])|(?:[1-2][0-9]))\.(?:(?:04)|(?:06)|(?:09)|(?:11))|(?:(?:0[1-9])|(?:1[0-9])|(?:2[0-7]))\.(?:(?:02)))\.\d{4}$~";

On a side note, that pattern looks exactly the same width as my screen....

Mchl · November 6, 2009

Wow

It seems to do its job within preg. I have some problems implementing it in the JS widget, though. I must check whether there are any syntax differences in JS regexp (yes, I removed the delimiters).

Kudos

MadTechie · November 6, 2009

Notices that cags has too much free time, nice job :thumb-up:

as a side note,

not sure about the use of non-captures

ie

(?:(?:01)|(?:03)|(?:05)|(?:07)|(?:08)|(?:10)|(?:12))

wouldn't this work just as well ?

(?:01|03|05|07|08|10|12)

cags · November 6, 2009

I most certainly do have too much time on my hands, lol, that's what being unemployed does for you!

You're right, of course; that would be a neater and more efficient solution. I'm not sure what I was thinking there. Well, actually, I think for some reason I was thinking of alternation only matching a single character, i.e. gra|ey matching gray or grey, but obviously that's only true if you include grouping to make it gr(a|e)y. So I went a little OTT on the grouping :wtf:

Think I've tried to read about too many new Regular Expressions features today and pushed some of the old stuff out of my head, Homer style. :shrug:

~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:01|03|05|07|08|10|12)|(?:0[1-9]|[1-2][0-9])\.(?:04|06|09|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~
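The alternation point is easy to check by hand. A small sketch in JavaScript (the variable names are only for illustration) shows that `|` splits everything inside the enclosing group, so whole multi-character branches such as 01|03 need no extra grouping of their own:

```javascript
// Without inner grouping the branches are "gra" and "ey", not "gray"/"grey".
const ungrouped = /^(?:gra|ey)$/;
// Grouping just the differing letter gives the intended gray/grey match.
const grouped = /^gr(?:a|e)y$/;
// Multi-character branches work fine inside a single group.
const months = /^(?:01|03|05|07|08|10|12)$/;

console.log(ungrouped.test('gray')); // false
console.log(ungrouped.test('gra'));  // true
console.log(grouped.test('gray'));   // true
console.log(grouped.test('grey'));   // true
console.log(months.test('08'));      // true
console.log(months.test('04'));      // false
```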

nrg_alpha · November 6, 2009

(?:01|03|05|07|08|10|12)

Since character classes are faster than alternations, I suppose that last part could also combine both and become:

(?:0[13578]|1[02])
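A quick way to convince yourself the two forms accept the same months is to run both over every two-digit string. This sketch says nothing about speed; it only checks equivalence, and the variable names are invented for the example:

```javascript
const alternation = /^(?:01|03|05|07|08|10|12)$/;
const withClasses = /^(?:0[13578]|1[02])$/;

// Collect every two-digit string on which the two patterns disagree.
const disagreements = [];
for (let n = 0; n <= 99; n++) {
  const text = String(n).padStart(2, '0');
  if (alternation.test(text) !== withClasses.test(text)) {
    disagreements.push(text);
  }
}

console.log(disagreements.length); // 0
```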

cags · November 6, 2009

Good idea, not sure why I didn't think of that since that's essentially how the day section works.

~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2][0-9])\.(?:0[469]|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~

Edit: It's gradually getting shorter too! At this rate we'll have it down to a few chars, lol

salathe · November 6, 2009

The OP needs to be aware that this won't work properly for Februaries in leap years (the next being 2012). If that's not a concern then the regex above is a good starting point. It could also be shortened further (you're doing a great job thus far; 217 down to 125 characters) if you feel like it.

Mchl · November 6, 2009

OP is aware of that

(assume February has 28 days)

As for this specific application, I wouldn't actually need the YEAR part, so that's another 7 chars less

salathe · November 6, 2009

Good, good. You really wouldn't want to delve into checking for leap year dates (though it's possible)!

cags · November 6, 2009

Of course, if we were merely concentrating on character count we could simply scrap the ?: to save another 10 characters or so. One change that shouldn't change efficiency just sprang to mind though: swapping [0-9] for \d saves around 9 characters...

~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)\.\d{4}$~

...and without the year as suggested by the OP...

~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~

Faux Edit: Oh and since 1-2 is the same as saying 1 or 2. We can save 2 characters by getting rid of the ranging dash.

~^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~
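For the JS widget, the year-less pattern should carry over once the ~ delimiters are dropped, since it only uses syntax that PCRE and JavaScript share. A minimal sketch, with the outer group written out as `(?:(?:` and with an illustrative constant name and sample dates that are not from the thread:

```javascript
// PHP's ~...~ delimiters become /.../ in a JavaScript regex literal.
const notLastDay = /^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$/;

console.log(notLastDay.test('30.01')); // true  (31-day month, day 30)
console.log(notLastDay.test('31.01')); // false (last day of January)
console.log(notLastDay.test('29.04')); // true  (30-day month, day 29)
console.log(notLastDay.test('30.04')); // false (last day of April)
console.log(notLastDay.test('27.02')); // true
console.log(notLastDay.test('28.02')); // false (February assumed to have 28 days)
console.log(notLastDay.test('00.05')); // false (no day zero)
```

If the widget wants the pattern as a string rather than a literal, the backslashes would need doubling, e.g. `new RegExp('^\\d$')`.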

nrg_alpha · November 6, 2009

Of course, if we were merely concentrating on character count we could simply scrap the ?: to save another 10 characters or so. One change that shouldn't change efficiency just sprang to mind though: swapping [0-9] for \d saves around 9 characters...

I haven't had a good read in this thread, and as a result, haven't had a good look at the pattern / situation at large, but personally, if you are not making use of captures, then there is no harm in using (?: ... ). I mean, sure it bloats the actual pattern a bit.. but at the end of the day, using that instead of needless captures makes regex faster and use less memory (thus is more efficient).

As for using \d, while in all likelihood you won't run into exponents, do be aware that depending on your locale, \d might include these (not sure which ones do, though, off the top of my head), so if there isn't much of a chance of running into exponents (which doesn't seem likely), then using \d should be safe (I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned - but I digress). Of course, the benefit of using [0-9] is that only 0-9 will be considered, thus no surprises. As far as performance / efficiency is concerned, I don't think there would be much of a difference between the two (but again, depending on locale, \d might check for more than simply 0-9, which might make it *slower* than simply [0-9]; if this is the case, the speed difference will be infinitesimal, I'm sure).
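For what it's worth, the locale concern is about PCRE; on the JavaScript side, where the pattern is headed, \d is defined as the ASCII digits 0-9 regardless of locale. A small sketch to illustrate (the sample characters are arbitrary picks, written as escapes so the encoding of the file does not matter):

```javascript
const asciiDigit = /^\d$/;

console.log(asciiDigit.test('7'));           // true
console.log(asciiDigit.test('\u00B2'));      // false: superscript two
console.log(asciiDigit.test('\u0663'));      // false: Arabic-Indic digit three
console.log(/^\d$/u.test('\u0663'));         // false: the u flag does not widen \d
console.log(/^\p{Nd}$/u.test('\u0663'));     // true: Unicode decimal digits need \p{Nd}
```

So in the widget, \d and [0-9] should behave identically; the choice only matters when the same pattern is also run through preg.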

Daniel0 · November 6, 2009

Good, good. You really wouldn't want to delve into checking for leap year dates (though it's possible)!

How would you write a regular expression that checks if a year is a leap year? It's a leap year if year mod 100 != 0 and year mod 4 = 0 or year mod 400 = 0. Then you also have to consider that leap years weren't invented up until the late 16th century (can't remember the exact year).

Even if you could somehow craft a regex that would do that it would still be ridiculous and this would be way faster:

$isLeap = ($year % 4 == 0 && $year % 100 != 0 || $year % 400 == 0) && $year >= 1582;

Edit: So we started with this leap year business in 1582. Updated code.
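The same arithmetic ports directly to JavaScript if the check is needed next to the widget. A rough equivalent of the PHP one-liner, keeping its 1582 cutoff (the function name is only for this example):

```javascript
// Gregorian rule: divisible by 4 but not by 100, or divisible by 400.
// Years before 1582 are reported as non-leap, as in the PHP version.
function isLeapYear(year) {
  const gregorianRule = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  return gregorianRule && year >= 1582;
}

console.log(isLeapYear(2012)); // true
console.log(isLeapYear(2009)); // false
console.log(isLeapYear(1900)); // false (century not divisible by 400)
console.log(isLeapYear(2000)); // true
console.log(isLeapYear(1200)); // false (before the cutoff)
```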

cags · November 6, 2009

I haven't had a good read in this thread, and as a result, haven't had a good look at the pattern / situation at large, but personally, if you are not making use of captures, then there is no harm in using (?: ... ). I mean, sure, it bloats the actual pattern a bit, but at the end of the day, using that instead of needless captures makes the regex faster and use less memory (thus it is more efficient).

Yes indeed, I was being facetious hence the fact I didn't change the pattern.

As for using \d, while in all likelihood you won't run into exponents, do be aware that depending on your locale, \d might include these (not sure which ones do, though, off the top of my head), so if there isn't much of a chance of running into exponents (which doesn't seem likely), then using \d should be safe (I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned - but I digress). Of course, the benefit of using [0-9] is that only 0-9 will be considered, thus no surprises. As far as performance / efficiency is concerned, I don't think there would be much of a difference between the two (but again, depending on locale, \d might check for more than simply 0-9, which might make it *slower* than simply [0-9]; if this is the case, the speed difference will be infinitesimal, I'm sure).

I've never seen anywhere that \d would match more than 0-9 so it's interesting to know that it can and would be more interesting to know in what cases this can occur.

Even if you could somehow craft a regex that would do that it would still be ridiculous and this would be way faster:

Of course this whole topic could have been achieved more simply/more accurately without Regex, but it seems that wouldn't exactly help Mchl in this case due to the manner in which it's being used.

Daniel0 · November 6, 2009

I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned - but I digress

Not in PCRE. There \w is always equivalent to [a-zA-Z0-9_] regardless of locale.

nrg_alpha · November 6, 2009

I've never seen anywhere that \d would match more than 0-9 so it's interesting to know that it can and would be more interesting to know in what cases this can occur.

Ask, and thou shalt receive (granted, depending on your locale, this might not be an issue at all).

http://www.phpfreaks.com/forums/index.php/topic,252010.msg1183648.html#msg1183648

nrg_alpha · November 6, 2009

I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned - but I digress

Not in PCRE. There \w is always equivalent to [a-zA-Z0-9_] regardless of locale.

I understand... I wasn't specific (but it now appears that I must be). Let me elaborate: depending on locale, \w might match more than simply [a-zA-Z0-9_], so if this is what someone is checking for, and their locale matches more than this (think accented characters along with exponents, for example), then one must either manually set their ctype accordingly or explicitly use [a-zA-Z0-9_]; otherwise they might be in for a surprise by leaving things to \w (my previous post to cags with the link discusses this issue).

Daniel0 · November 6, 2009

I've never seen anywhere that \d would match more than 0-9 so it's interesting to know that it can and would be more interesting to know in what cases this can occur.

Ask, and thou shalt receive (granted, depending on your locale, this might not be an issue at all).

http://www.phpfreaks.com/forums/index.php/topic,252010.msg1183648.html#msg1183648

Well, I never actually tested it and I've never needed it, but I looked it up in Mastering Regular Expressions:

If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_].

After researching a bit more:

When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support.

So we're both right I guess.

nrg_alpha · November 6, 2009

Well, I never actually tested it and I've never needed it, but I looked it up in Mastering Regular Expressions:

If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_].

After researching a bit more:

When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support.

Right (I never knew you had the book; good stuff). In my tests, if I only wanted [a-zA-Z0-9_], \w 'as is' doesn't cut it, but mind you, this is due to my locale's LC_CTYPE setting. Again, this might not be an issue for others. This is why I say I'm more concerned with throwing \w around than \d (as the odds of running into exponents are probably slim, unless one is scraping a math site perhaps, or other sites I'm not thinking about). But with \w, this might involve more than desired.

Daniel0 · November 7, 2009

Disregarding that talking about leap years isn't applicable before 1582, this is what I came up with for matching leap years only:

^[0-9]*?(?:[02468]?[048]|[13579][26])(?:00)?$

I think it should work.

Edit: Nevermind. It doesn't work. Crap.

Edit 2: Okay, so this regex will match all natural numbers where mod 400 = 0:

^0$|^[0-9]*?(?:[13579][26]|(?(?<=^)[02468]?|[02468])[048])00$

To be continued...

Edit 3: And I think this should match all natural numbers n where n mod 100 != 0 and n mod 4 == 0:

^[0-9]*?(?:[13579][26]|(?:[2468][048]|(?<=^)[048]))$

Edit 4: The above one doesn't work. This one almost works, but it matches all the hundreds which it isn't allowed to do:

^[0-9]*?(?:[13579][26]|(?:[02468][048]|(?<=^)[048]))$
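One way around the "hundreds" problem, if the year is restricted to exactly four digits, is to treat the last two digits and the century separately. The following is only a sketch under that assumption (it is not one of the patterns above, and it ignores the 1582 cutoff); the loop cross-checks it against the arithmetic rule:

```javascript
// First branch: last two digits are a non-zero multiple of 4 (04, 08, 12 ... 96).
// Second branch: century years whose first two digits are a multiple of 4 (e.g. 1600, 2000).
const leapYear = /^(?:\d{2}(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)$/;

// Compare against the modulo rule for every four-digit string.
let mismatches = 0;
for (let year = 0; year <= 9999; year++) {
  const byArithmetic = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const byRegex = leapYear.test(String(year).padStart(4, '0'));
  if (byArithmetic !== byRegex) mismatches++;
}

console.log(mismatches);            // 0
console.log(leapYear.test('2012')); // true
console.log(leapYear.test('1900')); // false
console.log(leapYear.test('2000')); // true
```

It works because divisibility by 4 depends only on the last two digits, and a century year is divisible by 400 exactly when its first two digits are divisible by 4.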

Mchl · November 7, 2009

Once we have this regex working, we should have it patented
