waynew Posted May 19, 2009
I'm not going to try to hide it: I'm crap at regex and have never found the time to brush up on it. But for a site I'm doing, I could really do with some help validating a couple of user-submitted fields.
1: Probably the easiest. I ask them to enter their Bebo username. Bebo usernames only allow alphanumeric characters and underscores, so I have to make sure that's the case when they enter theirs on my site.
2: A little more difficult. They have to enter a link similar to http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407 The only thing that will differ is the PreviewSkinId number.
Hopefully I can learn a little from any examples, or at least re-use the code in the future.
Axeia Posted May 19, 2009
For the first one, a character class like [0-9A-Z_] should work. On its own it matches a single character, so anchor and repeat it, e.g. ^[0-9A-Z_]+$, and use the i flag at the end for case insensitivity, as mentioned at http://php.net/preg_match
For the second one, substr with a negative offset would do, if the number at the end is always the same length.
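The substr idea above could look roughly like the sketch below. The function name `skin_id_by_substr` and the default length of 10 are assumptions for illustration (10 is just the length of the example ID in the first post; nothing in the thread confirms Bebo IDs are fixed-length). It assumes PHP 7.1 or later for the type declarations.

```php
<?php
// Hypothetical helper: returns the last $length characters of the URL
// if they are all digits, or null otherwise.
// ASSUMPTION: the ID is always exactly $length digits long.
function skin_id_by_substr(string $url, int $length = 10): ?string
{
    // A negative offset counts from the end of the string.
    $tail = (string) substr($url, -$length);

    // ctype_digit() is true only when every character is a decimal digit.
    if (strlen($tail) === $length && ctype_digit($tail)) {
        return $tail;
    }
    return null;
}

var_dump(skin_id_by_substr('http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407'));
```

Note that this only looks at the tail of the string; it says nothing about whether the rest is a Bebo link, so it would need to be combined with some other check.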
Maq Posted May 19, 2009
Did some minor testing. I'm no regexpert, but they seem to work fine:
<?php
$s = "2user_n3ame_";
if(preg_match("~^([\w\d_])+$~i", $s)) {
    echo "valid";
} else {
    echo "invalid";
}
$u = "http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407";
if(preg_match("~http://www\.bebo\.com/Profile\.jsp\?previewSkinId=([\d])+$~i", $u)) {
    echo "\nvalid";
} else {
    echo "\ninvalid";
}
?>
waynew Posted May 19, 2009
Thanks guys. For your help you will be awarded seventeen virgins in heaven.
nrg_alpha Posted May 20, 2009
Maq wrote: "Did some minor testing, I'm not regexpert but they seem to work fine:"
You're getting there, Maq. Just note that for ([\w\d_])+, the shorthand character class \w by default matches a-zA-Z0-9_ (locale issues of potentially matching even more characters than that aside), so you don't need the \d or the _ after it (and as a result, the character class brackets [] aren't needed either). The parentheses are for capturing, but since the goal is simply to check a format (at least from the look of things), they aren't needed either:
if(preg_match('~^\w+$~i', $s))
Same deal with the second solution (again, assuming it's just a format check, not doing anything with the number): previewSkinId=([\d])+$ could simply be previewSkinId=\d+$
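Put together, the two simplified checks might be wrapped up as below. This is only a sketch: the function names are made up for illustration, and the leading ^ on the URL pattern is an addition not present in the posts above (without it, any string that merely ends with a valid-looking link would pass). The i flag is dropped from the username check because \w already covers both cases. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: true if $s consists only of "word" characters.
function is_bebo_username(string $s): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('~^\w+$~', $s) === 1;
}

// Hypothetical helper: true if $u has the profile-link format from the thread.
function is_skin_url(string $u): bool
{
    // The i flag lets "previewSkinId" and "PreviewSkinId" both match.
    // ASSUMPTION: the leading ^ anchor is added here; the thread's pattern has none.
    return preg_match('~^http://www\.bebo\.com/Profile\.jsp\?PreviewSkinId=\d+$~i', $u) === 1;
}

echo is_bebo_username('2user_n3ame_') ? "valid\n" : "invalid\n";
echo is_skin_url('http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407') ? "valid\n" : "invalid\n";
```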
GingerRobot Posted May 20, 2009
If they're entering the same link with just the ID different, why not just ask for the ID? It'd be easier to validate, and it would mean that if Bebo alters anything it shouldn't affect you.
waynew Posted May 20, 2009
GingerRobot wrote: "If they're entering the same link with just the ID different, why not just ask for that?"
Oh, I wish. But you see, I'm expecting the majority of my users to be between 14 and 18. Most of them probably don't understand (nor want to understand) which number is needed for the link to be valid.
OK... I've edited Maq's code and added nrg_alpha's recommendation. Is the code below fit for use?
<?php
$u = "http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407";
// The pattern needs its closing ~ delimiter, and the i flag so that
// "previewSkinId" also matches "PreviewSkinId".
if(preg_match("~http://www\.bebo\.com/Profile\.jsp\?previewSkinId=\d+$~i", $u)) {
    echo "\nvalid";
} else {
    echo "\ninvalid";
}
?>
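One way to reconcile the two positions above (users paste the whole link, the site keeps only the number) is to capture the ID while validating. The sketch below is an illustration under assumptions, not something proposed in the thread: the name `extract_skin_id`, the ^ anchor, the trim() call, and the use of \z instead of $ are all additions. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the PreviewSkinId digits from a pasted link,
// or null if the link does not have the expected format.
function extract_skin_id(string $url): ?string
{
    // (\d+) captures the ID. \z matches only at the very end of the string,
    // whereas $ would also match just before a trailing newline.
    $pattern = '~^http://www\.bebo\.com/Profile\.jsp\?PreviewSkinId=(\d+)\z~i';

    // trim() drops whitespace that users often paste along with a link.
    if (preg_match($pattern, trim($url), $m) === 1) {
        // Kept as a string: the example ID 4867349407 does not fit in a
        // 32-bit integer.
        return $m[1];
    }
    return null;
}

var_dump(extract_skin_id(" http://www.bebo.com/Profile.jsp?PreviewSkinId=4867349407\n"));
```

Storing only the captured number would also give some of the insulation GingerRobot mentions, since the link could be rebuilt from the ID if the URL format ever changed.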
Maq Posted May 20, 2009
nrg_alpha wrote: "You're getting there, Maq"
Haha, thanks, I try.
nrg_alpha wrote: "Just note that for ([\w\d_])+, the shorthand character class \w by default will match a-zA-Z0-9_ [...]"
Is this true for just PCRE, or for all regex engines?
nrg_alpha Posted May 20, 2009
Maq wrote: "Is this true for just PCRE, or all regex engines?"
You mean about \w? I only use PCRE, so I'm not versed in other engines, but according to the Mastering Regular Expressions book:
"Perl and most other programs consider alphanumerics and underscore to be part of a word"
"\w Part-of-word character. Often the same as [a-zA-Z0-9_]. Some tools omit the underscore, while others include all alphanumerics in the current locale. If Unicode is supported, \w usually refers to all alphanumerics; notable exceptions include java.util.regex and PCRE (and by extension, PHP), whose \w are exactly [a-zA-Z0-9_]."
But yeah, depending on your locale, it may not be exactly [a-zA-Z0-9_]. For me, if I want that to be the case, I have to set my LC_CTYPE variable to 'C' (I just link to threads to save some retyping). But I digress... All in all your solution works, and that's the important thing!
Maq Posted May 20, 2009
I agree, thanks for the info.
.josh Posted May 20, 2009
I used to like the shorthand character classes, but then I found out about potential locale discrepancies in how they're interpreted, so I usually use a character class and write the stuff out explicitly.
nrg_alpha Posted May 20, 2009
I suppose for all intents and purposes, \w, \d and the like won't get you into trouble by matching stuff you didn't expect (but I guess you can never be too sure; one day it's bound to bite someone in the rear). Declaring things explicitly in your own character class is definitely a surefire way. Either that, or simply use setlocale(LC_CTYPE, 'C'); to make sure those shorthand character classes behave as expected.
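The two options just mentioned might be sketched side by side as follows. The function names are hypothetical, and \z is used instead of $ so that a trailing newline is not accepted; neither detail comes from the thread. It assumes PHP 7 or later.

```php
<?php
// Option 1 (hypothetical helper): spell the class out, so the result does
// not depend on the locale at all.
function is_ascii_word(string $s): bool
{
    return preg_match('~^[A-Za-z0-9_]+\z~', $s) === 1;
}

// Option 2: pin LC_CTYPE to 'C' before matching, as suggested above, and
// keep the \w shorthand. setlocale() changes process-wide state.
setlocale(LC_CTYPE, 'C');

function is_word(string $s): bool
{
    return preg_match('~^\w+\z~', $s) === 1;
}

var_dump(is_ascii_word('user_1'));   // ASCII letters, digit, underscore
var_dump(is_ascii_word("caf\xE9"));  // Latin-1 e-acute byte is not in the class
var_dump(is_word('user_1'));
```

The explicit class is the more portable of the two, since the same pattern can be moved to another engine or environment without relying on PHP's setlocale().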
.josh Posted May 20, 2009
well using setlocale is fine and dandy within php environment but from a portability perspective....
nrg_alpha Posted May 20, 2009
True enough. I'm always assuming it's from a php environment (unless the OP specifies otherwise) as this is a php forum after all.