Mchl Posted November 4, 2009
Here's something tricky that would help me quite a lot. The date format is DD.MM.YYYY. How would you create a regex that matches every day except the last day of each month (assume February has 28 days)?
Link to thread: https://forums.phpfreaks.com/topic/180264-notlast-day-of-the-month/
salathe Posted November 4, 2009
I haven't thought about it too much, but this doesn't feel like a comfortable fit for regex. Why not just use other date parsing/generating methods (the date or DateTime extensions)?
Mchl (Author) Posted November 4, 2009
It'd be for a JS datepicker widget. But I guess I'll stick with something easier (a drop-down list).
simshaun Posted November 4, 2009
I agree with salathe that this is something regex is not really built for. You could validate it fairly easily with basic PHP, though.
    <?php
    // Sample of a date that might be posted.
    $date = '30.02.2009'; // Invalid date.

    // Fetch the parts of the date that we can work with.
    $date = preg_replace('/[^0-9]/', '', $date);
    $dateDay = (int) substr($date, 0, 2);
    $dateMonth = (int) substr($date, 2, 2);
    $dateYear = (int) substr($date, 4, 4);

    // Get the last day of the month (day 0 of the following month).
    $lastDayOfMonth = (int) date('d', mktime(0, 0, 0, $dateMonth + 1, 0, $dateYear));

    // Ensure the chosen day is valid for the chosen month.
    // (Use >= instead of > to also reject the last day itself.)
    if ($dateDay > $lastDayOfMonth) {
        // Error: Selected day is > the last day of the month.
    }
There are different ways you can go about parsing $date; I just chose the one that seemed most appropriate. How you parse it in the final version may vary depending on how flexible you want the field to be for users on the frontend.
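Since the target is a JS widget, the same date arithmetic can be sketched in JavaScript. This is only an illustration under assumptions: dates arrive as DD.MM.YYYY strings, and `isNotLastDayOfMonth` is a hypothetical helper name, not part of any widget's API. Unlike a fixed regex, it follows the real calendar, so 28.02 is accepted in leap years.

```javascript
// Hypothetical helper: true when `text` is a valid DD.MM.YYYY date
// that is NOT the last day of its month.
function isNotLastDayOfMonth(text) {
  const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(text);
  if (!match) return false;
  const day = Number(match[1]);
  const month = Number(match[2]); // 1-12
  const year = Number(match[3]);
  if (month < 1 || month > 12 || day < 1) return false;
  // Date months are 0-based, so `month` already refers to the following
  // month; day 0 of that month is the last day of the month we want.
  // Caveat: the Date constructor maps years 0-99 to 1900-1999.
  const lastDay = new Date(year, month, 0).getDate();
  return day < lastDay;
}

console.log(isNotLastDayOfMonth("27.02.2009"));
console.log(isNotLastDayOfMonth("28.02.2009"));
```

Whether a given widget can accept a callback like this instead of a pattern depends on the widget.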
Mchl (Author) Posted November 4, 2009
Thanks for that. I was interested specifically in a regex pattern, because that's what my[1] JS widget needs to be given in order to display only the dates I want. I would probably need to hack into its internals to configure it in a more sensible way, so I just decided it's not worth my time.
[1] 'my' as in 'the one I am using'
cags Posted November 6, 2009
I was bored, how does this grab you...
    $pattern = "~^(?:(?:(?:0[1-9])|(?:[1-2][0-9])|(?:30))\.(?:(?:01)|(?:03)|(?:05)|(?:07)|(?:08)|(?:10)|(?:12))|(?:(?:0[1-9])|(?:[1-2][0-9]))\.(?:(?:04)|(?:06)|(?:09)|(?:11))|(?:(?:0[1-9])|(?:1[0-9])|(?:2[0-7]))\.(?:(?:02)))\.\d{4}$~";
On a side note, that pattern looks exactly the same width as my screen....
Mchl (Author) Posted November 6, 2009
Wow. It seems to do its job within preg. I have some problems implementing it in the JS widget, though. I must check whether there are any syntax differences in JS regexps (yes, I removed the delimiters). Kudos.
MadTechie Posted November 6, 2009
*Notices that cags has too much free time.* Nice job. As a side note, I'm not sure about the use of non-capturing groups, i.e.
    (?:(?:01)|(?:03)|(?:05)|(?:07)|(?:08)|(?:10)|(?:12))
Wouldn't this work just as well?
    (?:01|03|05|07|08|10|12)
cags Posted November 6, 2009
I most certainly do have too much time on my hands, lol, that's what being unemployed does for you! You're right, of course: that would be a neater/more efficient solution. I'm not sure what I was thinking there. Well, actually, I think for some reason I was thinking of alternation only matching a single character, i.e. gra|ey matching gray or grey, but obviously that's only true if you include grouping to make it gr(a|e)y. So I went a little OTT on the grouping. I think I've tried to read about too many new regular expression features today and pushed some of the old stuff out of my head, Homer style.
    ~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:01|03|05|07|08|10|12)|(?:0[1-9]|[1-2][0-9])\.(?:04|06|09|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~
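The precedence point about gra|ey versus gr(a|e)y can be checked directly. A minimal sketch in JavaScript, where alternation binds the same way as in PCRE; the sample words are chosen only for illustration:

```javascript
// Alternation has the lowest precedence: ^gra|ey$ means "^gra" OR "ey$".
const ungrouped = /^gra|ey$/;
// Grouping limits the alternation to the single differing character.
const grouped = /^gr(?:a|e)y$/;

console.log(ungrouped.test("grab"));  // true: starts with "gra"
console.log(ungrouped.test("honey")); // true: ends with "ey"
console.log(grouped.test("gray"));    // true
console.log(grouped.test("grey"));    // true
console.log(grouped.test("grab"));    // false
```

This is why multi-character alternatives such as 01|03|05 need no inner groups of their own, only one group around the whole list.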
nrg_alpha Posted November 6, 2009
> (?:01|03|05|07|08|10|12)
Since character classes are faster than alternations, I suppose that last part could also combine both and become:
    (?:0[13578]|1[02])
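A quick way to confirm that the two month patterns are interchangeable is to compare them on every two-digit string. This sketch checks equivalence only; it does not measure the speed claim, and the variable names are illustrative:

```javascript
const alternation = /^(?:01|03|05|07|08|10|12)$/;
const charClasses = /^(?:0[13578]|1[02])$/;

// Compare both patterns on every two-digit string from "00" to "99".
const mismatches = [];
for (let n = 0; n < 100; n++) {
  const text = String(n).padStart(2, "0");
  if (alternation.test(text) !== charClasses.test(text)) {
    mismatches.push(text);
  }
}

console.log(mismatches.length); // 0
```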
cags Posted November 6, 2009
Good idea, not sure why I didn't think of that, since that's essentially how the day section works.
    ~^(?:(?:0[1-9]|[1-2][0-9]|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2][0-9])\.(?:0[469]|11)|(?:0[1-9]|1[0-9]|2[0-7])\.02)\.\d{4}$~
Edit: It's gradually getting shorter too! At this rate we'll have it down to a few chars, lol.
salathe Posted November 6, 2009
The OP needs to be aware that this won't work properly for Februaries in leap years (the next being 2012). If that's not a concern then the regex above is a good starting point. It could also be shortened further (you're doing a great job thus far; 217 down to 125 characters) if you feel like it.
Mchl (Author) Posted November 6, 2009
OP is aware of that:
> (assume February has 28 days)
As for this specific application, I wouldn't actually need the YEAR part, so that's another 7 chars less.
salathe Posted November 6, 2009
Good, good. You really wouldn't want to delve into checking for leap year dates (though it's possible)!
cags Posted November 6, 2009
Of course, if we were merely concentrating on character count we could simply scrap the ?: to save another 10 characters or so. One change that shouldn't change efficiency just sprang to mind though: swapping [0-9] for \d saves around 9 characters...
    ~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)\.\d{4}$~
...and without the year, as suggested by the OP...
    ~^(?:(?:0[1-9]|[1-2]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[1-2]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~
Faux Edit: Oh, and since 1-2 is the same as saying 1 or 2, we can save 2 characters by getting rid of the ranging dash.
    ~^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$~
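For the JS side, here is a sketch of the final pattern as a JavaScript regex literal with the PCRE `~` delimiters dropped. Nothing in it relies on PCRE-only syntax. How the widget expects to receive the pattern (literal, string, or RegExp object) is not stated in the thread, so the variable names and usage are assumptions:

```javascript
// Final pattern without the year: DD.MM, every day except the last of the
// month, with February fixed at 28 days.
const notLastDay = /^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)$/;

// Same pattern with a four-digit year appended: DD.MM.YYYY.
const notLastDayWithYear = /^(?:(?:0[1-9]|[12]\d|30)\.(?:0[13578]|1[02])|(?:0[1-9]|[12]\d)\.(?:0[469]|11)|(?:0[1-9]|1\d|2[0-7])\.02)\.\d{4}$/;

console.log(notLastDay.test("30.01")); // true
console.log(notLastDay.test("31.01")); // false
console.log(notLastDay.test("27.02")); // true
console.log(notLastDay.test("28.02")); // false
console.log(notLastDayWithYear.test("29.04.2009")); // true
console.log(notLastDayWithYear.test("30.04.2009")); // false
```

If the widget takes the pattern as a string, each backslash has to be doubled inside the string literal.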
nrg_alpha Posted November 6, 2009
> Of course, if we were merely concentrating on character count we could simply scrap the ?: to save another 10 characters or so. One change that shouldn't change efficiency just sprang to mind though: swapping [0-9] for \d saves around 9 characters...
I haven't had a good read of this thread, and as a result haven't had a good look at the pattern / situation at large, but personally, if you are not making use of captures, there is no harm in using (?: ... ). Sure, it bloats the actual pattern a bit, but at the end of the day, using it instead of needless captures makes the regex faster and lighter on memory (and thus more efficient).
As for using \d: while in all likelihood you won't run into exponents, do be aware that depending on your locale, \d might include these (not sure which locales do, off the top of my head). So if there isn't much of a chance of running into exponents (which doesn't seem likely), then using \d should be safe. I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned, but I digress. Of course, the benefit of using [0-9] is that only 0-9 will be considered, thus no surprises.
As far as performance / efficiency is concerned, I don't think there would be much of a difference between the two. Again, depending on locale, \d might check for more than simply 0-9, which might make it *slower* than [0-9], but if so, the speed difference will be infinitesimal, I'm sure.
Daniel0 Posted November 6, 2009
> Good, good. You really wouldn't want to delve into checking for leap year dates (though it's possible)!
How would you write a regular expression that checks if a year is a leap year? It's a leap year if (year mod 100 != 0 and year mod 4 = 0) or year mod 400 = 0. Then you also have to consider that the current leap-year rule wasn't introduced until the late 16th century (can't remember the exact year). Even if you could somehow craft a regex that would do that, it would still be ridiculous, and this would be way faster:
    $isLeap = (($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0) && $year >= 1582;
Edit: So we started with this leap year business in 1582. Updated code.
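For the JS widget, the same arithmetic rule translates directly. A minimal sketch; `isLeapYear` is an illustrative name, and the 1582 cutoff is carried over from the post above rather than being a property of JavaScript dates:

```javascript
// Divisible by 4 and not by 100, or divisible by 400; years before 1582
// are treated as non-leap, following the post's cutoff.
function isLeapYear(year) {
  const gregorianRule =
    (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  return gregorianRule && year >= 1582;
}

console.log(isLeapYear(2012)); // true
console.log(isLeapYear(1900)); // false
console.log(isLeapYear(2000)); // true
```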
cags Posted November 6, 2009
> I haven't had a good read of this thread, and as a result haven't had a good look at the pattern / situation at large, but personally, if you are not making use of captures, there is no harm in using (?: ... ). Sure, it bloats the actual pattern a bit, but at the end of the day, using it instead of needless captures makes the regex faster and lighter on memory (and thus more efficient).
Yes indeed, I was being facetious, hence the fact I didn't change the pattern.
> As for using \d: while in all likelihood you won't run into exponents, do be aware that depending on your locale, \d might include these (not sure which locales do, off the top of my head). So if there isn't much of a chance of running into exponents (which doesn't seem likely), then using \d should be safe. I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned, but I digress. Of course, the benefit of using [0-9] is that only 0-9 will be considered, thus no surprises. As far as performance / efficiency is concerned, I don't think there would be much of a difference between the two. Again, depending on locale, \d might check for more than simply 0-9, which might make it *slower* than [0-9], but if so, the speed difference will be infinitesimal, I'm sure.
I've never seen anywhere that \d would match more than 0-9, so it's interesting to know that it can, and it would be more interesting to know in what cases this can occur.
> Even if you could somehow craft a regex that would do that, it would still be ridiculous, and this would be way faster:
Of course, this whole topic could have been handled more simply and more accurately without regex, but it seems that wouldn't exactly help Mchl in this case, due to the manner in which it's being used.
Daniel0 Posted November 6, 2009
> I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned, but I digress
Not in PCRE. There, \w is always equivalent to [a-zA-Z0-9_] regardless of locale.
nrg_alpha Posted November 6, 2009
> I've never seen anywhere that \d would match more than 0-9, so it's interesting to know that it can, and it would be more interesting to know in what cases this can occur.
Ask, and thou shalt receive (granted, depending on your locale, this might not be an issue at all).
http://www.phpfreaks.com/forums/index.php/topic,252010.msg1183648.html#msg1183648
nrg_alpha Posted November 6, 2009
> > I'm much more concerned with character class shorthands like \w, as this is a potential wild cannon as far as I'm concerned, but I digress
> Not in PCRE. There, \w is always equivalent to [a-zA-Z0-9_] regardless of locale.
I understand... I wasn't specific (but it now appears that I must be). Let me elaborate: depending on locale, \w might match more than simply [a-zA-Z0-9_]. So if that set is what someone is checking for, and their locale matches more than this (think accented characters along with exponents, for example), then they must either manually set their ctype accordingly or explicitly use [a-zA-Z0-9_]. Otherwise they might be in for a surprise by leaving things to \w (my previous post to cags, with the link, discusses this issue).
Daniel0 Posted November 6, 2009
> > I've never seen anywhere that \d would match more than 0-9, so it's interesting to know that it can, and it would be more interesting to know in what cases this can occur.
> Ask, and thou shalt receive (granted, depending on your locale, this might not be an issue at all). http://www.phpfreaks.com/forums/index.php/topic,252010.msg1183648.html#msg1183648
Well, I never actually tested it and I've never needed it, but I looked it up in Mastering Regular Expressions:
> If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_].
After researching a bit more:
> When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support.
So we're both right, I guess.
nrg_alpha Posted November 6, 2009
> Well, I never actually tested it and I've never needed it, but I looked it up in Mastering Regular Expressions:
> > If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_].
> After researching a bit more:
> > When running in UTF-8 mode, this applies only to characters with codes less than 128. Higher-valued codes never match escapes such as \w or \d, but can be tested with \p if PCRE is built with Unicode character property support.
Right (I never knew you had the book... good stuff). In my tests, if I only wanted [a-zA-Z0-9_], \w 'as is' doesn't cut it, but mind you, this is due to my locale's LC_CTYPE setting. Again, this might not be an issue for others. This is why I say I'm more concerned with throwing \w around than \d, as the odds of running into exponents are probably slim (unless one is scraping a math site perhaps, or other sites I'm not thinking about). But with \w, a match might involve more than desired.
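The locale discussion above concerns PCRE. For the JS widget the situation is simpler: in JavaScript regexes without flags, `\d` matches only the ASCII digits 0-9 and `\w` matches only [A-Za-z0-9_], independent of locale. A small sketch, with sample characters chosen for illustration:

```javascript
console.log(/^\d$/.test("7")); // true
console.log(/^\d$/.test("²")); // false: superscript two
console.log(/^\d$/.test("٣")); // false: Arabic-Indic digit three
console.log(/^\w$/.test("é")); // false: accented letter
console.log(/^\w$/.test("_")); // true
```

So swapping [0-9] for \d in the JS version of the pattern should not change what it matches.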
Daniel0 Posted November 7, 2009
Disregarding that talking about leap years isn't applicable before 1582, this is what I came up with for matching leap years only:
    ^[0-9]*?(?:[02468]?[048]|[13579][26])(?:00)?$
I think it should work.
Edit: Never mind. It doesn't work. Crap.
Edit 2: Okay, so this regex will match all natural numbers where mod 400 = 0:
    ^0$|^[0-9]*?(?:[13579][26]|(?(?<=^)[02468]?|[02468])[048])00$
To be continued...
Edit 3: And I think this should match all natural numbers n where n mod 100 != 0 and n mod 4 == 0:
    ^[0-9]*?(?:[13579][26]|(?:[2468][048]|(?<=^)[048]))$
Edit 4: The above one doesn't work. This one almost works, but it matches all the hundreds, which it isn't allowed to do:
    ^[0-9]*?(?:[13579][26]|(?:[02468][048]|(?<=^)[048]))$
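Candidate patterns like these are easy to get subtly wrong, so a brute-force comparison against the arithmetic rule is a useful check. The sketch below is an assumption-laden illustration, not one of the patterns posted above: it restricts itself to four-digit years (as in DD.MM.YYYY), ignores the 1582 cutoff, and uses a pattern written for this example.

```javascript
// Branch 1: any century, last two digits a non-zero multiple of 4.
// Branch 2: century divisible by 4, followed by "00" (multiples of 400).
const leapYear = /^(?:\d\d(?:0[48]|[2468][048]|[13579][26])|(?:[02468][048]|[13579][26])00)$/;

// Reference rule: divisible by 4 and not by 100, or divisible by 400.
function isLeapByArithmetic(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Compare the regex with the arithmetic for every year 0000-9999.
const disagreements = [];
for (let year = 0; year <= 9999; year++) {
  const text = String(year).padStart(4, "0");
  if (leapYear.test(text) !== isLeapByArithmetic(year)) {
    disagreements.push(text);
  }
}

console.log(disagreements.length); // 0
```

The same loop can be pointed at any other candidate pattern to list the years it gets wrong, provided the pattern uses only syntax JavaScript supports (PCRE conditionals such as the one in Edit 2 are not available there).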
Mchl (Author) Posted November 7, 2009
Once we have this regex working, we should have it patented.