appobs Posted July 22, 2014

For a very 'loose' email address validator, does the following regex need the backslash before the dot?

^\S+@\S+\.\S+$

(It checks for anything@anything.anything, but I'm going to change that to anything.anything to make it even 'looser'.)

I tested with and without the backslash using rubular.com and it seems superfluous, but I'd like a second opinion or two, please.

ALSO: does rubular.com use the right engine to be correct for PHP? I hope so, because it's the first time I've been able to fully understand some regexes I've been blindly implementing for some time!

Many thanks in advance
requinix Posted July 22, 2014 (edited)

> For a very 'loose' email address validator, does the following regex need the backslash before the dot? ^\S+@\S+\.\S+$

If it has the backslash, it means a literal period. If it does not, it means any character. So what do you think?

> I tested with and without the backslash using rubular.com and it seems superfluous but I'd like a second opinion or two please.

Are you making sure to test with both valid and invalid strings?

> ALSO: does rubular.com use the right engine to be correct for php? I hope so coz it's the first time I've been able to fully understand some regexes I've been blindly implementing for some time!

You did see that it's called "a Ruby regular expression editor", right? There are many other options that specifically say they're good for PHP and/or PCRE (which is what PHP uses). Ruby's regex syntax looks the same, but I wouldn't rely on that.

Edited July 22, 2014 by requinix
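requinix's point is easy to check in PHP itself: run both versions of the pattern against an address that has no dot after the @. A minimal sketch (the two sample addresses are arbitrary); "user@localhost" is accepted only by the unescaped version, because there the `.` can match the "h" (or any other character).

```php
<?php
// Escaped dot: a literal "." must appear somewhere after the "@".
$escaped   = '/^\S+@\S+\.\S+$/';
// Unescaped dot: "." matches any single character, so no literal dot is needed.
$unescaped = '/^\S+@\S+.\S+$/';

foreach (['user@example.com', 'user@localhost'] as $input) {
    // preg_match() returns 1 for a match, 0 for no match (false on error).
    printf(
        "%-18s escaped=%d unescaped=%d\n",
        $input,
        preg_match($escaped, $input),
        preg_match($unescaped, $input)
    );
}
```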
cyberRobot Posted July 22, 2014

Side note: did you know that PHP has a built-in function for validating email addresses? http://php.net/manual/en/filter.examples.validation.php
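For reference, the built-in check cyberRobot links to is `filter_var()` with the `FILTER_VALIDATE_EMAIL` filter. A minimal sketch; the sample strings are just illustrations:

```php
<?php
// filter_var() returns the address on success and false on failure.
$candidates = ['someone@example.com', 'not an email', 'missing-at.example.com'];

foreach ($candidates as $email) {
    $valid = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo $email, ' => ', $valid ? 'valid' : 'invalid', PHP_EOL;
}
```

Comparing against `false` with `!==` matters, because the success value is the string itself rather than `true`.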
appobs Posted July 22, 2014 (author)

> [snip]

Tested with some invalid strings, but not enough, obviously! So characters that do not have a special meaning in regex don't need the backslash to be literal, but even . is literal without the backslash if it's contained within [ ]?

I'd love to get some handle on regex. This is the regex that came with the original version of the contact form I've been adapting:

/^([a-zA-Z0-9])+([a-zA-Z0-9._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9._-]+)+$/

Are the ._- characters literal because they're inside [ ]?
appobs Posted July 22, 2014 (author)

> Side note: did you know that PHP has a built-in function for validating email addresses? http://php.net/manual/en/filter.examples.validation.php

Yeah, I know. Since regex was used in the contact form I'm adapting, it seems the right time to learn a bit about it. Doesn't the PHP email address validator have flaws? Maybe there's a newer one, or maybe those flaws just don't matter. I've read so much waffle about email address validation that I'm not sure where I am with it anymore. I certainly don't want to use something as far outside my understanding as a certain monstrous RFC regex I've seen.
kicken Posted July 22, 2014

> ... as a certain monstrous RFC regex I've seen.

That is what PHP uses behind the scenes to power the email validation filter. Maybe not the same regex you saw, but it's a pretty big one. There are no flaws with it that you need to be concerned about.
requinix Posted July 22, 2014

> a certain monstrous RFC regex I've seen.

Probably not the one, but I love showing this off any time I get the chance.

/^(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P<COMMENT>\(((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21-\x27\x2A-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))|(?P>COMMENT)))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\)))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+(\.(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+)*((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((
([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?)(\.(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?))*)@(((((((([\x20
\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+(\.(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+)*((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\[((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?([\x21-\x5A\x5E-\x7E]|([\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r)))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\]((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(\.((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*
\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?)*)$/

It's programmatically generated, so it can be cleaned up a bit, but still...
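The `(?P>COMMENT)` references scattered through that regex are PCRE recursion: they re-enter a named group, which is how a single pattern can match RFC-style comments nested to any depth. A much smaller sketch of the same trick, matching only a parenthesised comment (the test strings are made up, and this is not the full RFC grammar):

```php
<?php
// (?P<COMMENT>...) names a group; (?P>COMMENT) re-enters that group recursively.
// Inside a comment we accept: any char except "(", ")" or "\", OR a
// backslash-escaped char, OR another (nested) comment.
$comment = '/^(?P<COMMENT>\((?:[^()\\\\]|\\\\.|(?P>COMMENT))*\))$/';

foreach (['(a comment)', '(outer (inner) text)', '(unbalanced (oops)'] as $s) {
    echo $s, ' => ', preg_match($comment, $s), PHP_EOL;
}
```

Note the quadruple backslashes: PHP's single-quoted string turns `\\\\` into `\\`, which the regex engine then reads as one literal backslash.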
Jacques1 Posted July 22, 2014 (edited)

> So characters that do not have a special meaning in regex don't need the backslash to be literal

You generally must not prepend a backslash to literal characters. For example, “\s” is something completely different from a simple “s”.

> but even . is literal without the backslash if it's contained within [ ]

Yes.

> I'd love to get some handle on regex. This is the regex that came with the original version of the contact form I've been adapting:
> /^([a-zA-Z0-9])+([a-zA-Z0-9._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9._-]+)+$/

Like almost all home-made e-mail regexes, this is nonsense, both technically and semantically. Whoever wrote this doesn't really understand regexes and hasn't bothered to read the standards. So don't use it as a template or for learning.

> Are the ._- characters literal because they're inside [ ]?

An underscore never has a special meaning. And a hyphen within a character class is only taken literally if it's at the beginning or the end (or escaped as \-). Otherwise it signifies a range (as you've already seen).

> Doesn't the php email address validator have flaws? Maybe there's a newer one or maybe those flaws just don't matter. I've read so much waffle about email address validation, not sure where I am with it anymore. Certainly don't want to use something as far outside my understanding as a certain monstrous RFC regex I've seen.

Whatever flaws the PHP validator may have, they're nothing compared with the garbage that floats around on the Internet. So do use the built-in validator and avoid home-made regex hacks.

Regexes are generally overrated. I understand that they're fascinating for beginners, because they seem like an almighty text-processing tool. I used to feel the same. But after a while, you'll see the deficiencies and realize that they're the wrong tool for most jobs. A regex is great for simple patterns like a date. But it's just not powerful enough for anything more complex, like an e-mail address or even a full-blown language like HTML. That's where the abuse begins.

Edited July 22, 2014 by Jacques1
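The character-class rules Jacques1 describes can be checked directly. A small sketch with made-up single-character subjects: `.` is "any character" outside a class but a literal dot inside one, a hyphen between two characters forms a range while a trailing one is literal, and `\s` means whitespace rather than the letter "s".

```php
<?php
// Outside a class, "." is a metacharacter matching any single character.
$anyChar  = '/^.$/';
// Inside a class, "." is just a literal dot.
$dotClass = '/^[.]$/';
// A hyphen between two characters forms a range: a, b or c.
$range    = '/^[a-c]$/';
// A hyphen at the end (or start) of a class is a literal "-": a, c or -.
$literal  = '/^[ac-]$/';
// A backslash in front of a letter can change its meaning entirely: \s is whitespace.
$space    = '/^\s$/';

printf("'x': any=%d dotClass=%d\n", preg_match($anyChar, 'x'), preg_match($dotClass, 'x'));
printf("'b': range=%d literal=%d\n", preg_match($range, 'b'), preg_match($literal, 'b'));
printf("'-': range=%d literal=%d\n", preg_match($range, '-'), preg_match($literal, '-'));
printf("' ': \\s=%d s=%d\n", preg_match($space, ' '), preg_match('/^s$/', ' '));
```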
appobs Posted July 22, 2014 (author)

I'll be working on this contact form later tonight, so I'll try the PHP validator.

What about this one, purely for warning users that they copy/pasted the wrong data into the email "from" box?

^\S+@\S+$

It allows Gmail addresses with a + sign, which the previous one didn't. I'm still tempted to restrict the string length.

I'm wondering now if there's any point validating user-input email addresses at all for a contact form. Signups need proper user/email verification, so this kind of validation is pointless there too.

In any case, now I know that learning from other people's regexes can help with syntax but not necessarily with good practice! And I now have a good builder/tester that educates me and covers PCRE, JS and Python.

NB: This was the 'monstrous' one I read about before. Generated by a Perl script.

Thanks, everyone
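A minimal sketch of that loose "paste check" with the length restriction added. Two assumptions to flag: the function name `looks_like_email` is hypothetical, and 254 characters is the commonly cited practical maximum for an address, used here as a default rather than a hard rule. One PCRE detail is worth knowing: without the `D` modifier, `$` also matches just before a trailing newline, so `^\S+@\S+$` would accept "a@b\n".

```php
<?php
/**
 * Hypothetical helper: a deliberately loose sanity check for a contact form's
 * "from" field. It only catches obvious paste mistakes; it does not validate.
 */
function looks_like_email(string $input, int $maxLength = 254): bool
{
    if (strlen($input) > $maxLength) {
        return false;
    }
    // The D modifier makes "$" match only at the very end of the string;
    // without it, PCRE's "$" also matches just before a trailing "\n".
    return preg_match('/^\S+@\S+$/D', $input) === 1;
}

var_dump(looks_like_email('first.last+tag@gmail.com')); // bool(true)
var_dump(looks_like_email('John Smith'));               // bool(false)
var_dump(looks_like_email("a@b\n"));                    // bool(false)
```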
requinix Posted July 23, 2014

> What about this one, purely for warning the user they copy/pasted the wrong data in the email from box:
> ^\S+@\S+$
> Allows gmail addys with + sign which the previous one didn't. I'm still tempted to restrict string length.

Email addresses suck, as you've seen. Go only for the rudimentary structure, prompt twice to try to catch typos, and send a message to really confirm it. The only other thing you can do beforehand is look up the domain name and make sure it exists, but everything imaginable does these days, so you won't catch much.

> I'm wondering now if there's any point validating user input email addresses at all for a contact form. Signups need proper user/email verification so this kind of validation's pointless there too.

A strict check will only piss off the people with unusual but legitimate addresses. You know, the 0.001% who do it just so they can bitch about how validation forms don't accept it. Everyone else will just enter a made-up email like "foo@bar.com", and no regex can possibly catch that.

> NB: This was the 'monstrous' one I read about before. Generated by a Perl script.

"The regular expression does not cope with comments in email addresses"... Mine does.
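The "prompt twice" advice might look like this on the server side. A sketch only: the function name `confirmed_email` and the two-field form are hypothetical, and the structure check is the same rudimentary something@something pattern, leaving the real verification to the confirmation message.

```php
<?php
/**
 * Hypothetical helper for a form with an "email" and a "confirm email" field.
 * Returns the trimmed address if both entries match and have the rudimentary
 * something@something shape, or null if the user should be asked to fix it.
 */
function confirmed_email(string $email, string $confirm): ?string
{
    $email   = trim($email);
    $confirm = trim($confirm);

    if ($email !== $confirm) {
        return null; // a typo in one of the two fields
    }
    // Rudimentary structure only; the confirmation message does the real check.
    return preg_match('/^\S+@\S+$/D', $email) === 1 ? $email : null;
}
```

Trimming first means stray whitespace picked up by copy/paste doesn't count as a mismatch.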
appobs Posted July 23, 2014 (author)

Hmmm... email addresses do in fact suck...

I thought I'd send a confirmation message containing the email address they entered, so they have another chance to spot any mistake, and also containing the message they sent, so it doesn't look like I'm second-guessing them on knowing their own email address... Clever... Then I saw the mistake in that: if the address is mistyped, the confirmation never reaches them!

Maybe I do need a field for "your email address again". Irritating as it is, it's not against user expectations. Since this is a contact form, not a signup, I don't think asking the user to click a verification link in an email before their message is actually sent is in order... unless it's a lie and the message is sent anyway... I suppose ultimately it's about what seems more thorough and professional.
appobs Posted July 23, 2014 (author)

Edit: If I use checkdnsrr, will that be reliable, or will it wrongly reject user submissions?
Jacques1 Posted July 23, 2014 (edited)

> If I use checkdnsrr, will that be reliable or wrongly reject user submissions?

How exactly is that check supposed to help you? I mean, as you've already found out, you cannot prevent people from purposely giving you a fake e-mail address unless you force them through a confirmation procedure. So all of your checks are limited to catching mistakes. Formal validation obviously makes sense for that, but what do you expect to get from a DNS check? Are you afraid that people might accidentally enter something like “gmail.comm” instead of “gmail.com”? I find that rather far-fetched.

Edited July 23, 2014 by Jacques1
appobs Posted July 24, 2014 (author)

> Are you afraid that people might accidentally enter something like “gmail.comm” instead of “gmail.com”? I find that rather far-fetched.

Yep, I've seen it happen. The website owner does it regularly, ignoring the "annoying popup" (autosuggest, which incidentally goes away the moment you stray from the CORRECT spelling of the email, counter to positive reinforcement!). (I find it far-fetched too!)

> what do you expect to get from a DNS check?

Another percent chance of catching a mistake and, more importantly, a demonstration of thoroughness to the client. Is there a chance of a genuine domain failing this check? If not, I can use it, but if it *might* cause a problem, I can't.

On balance, so far, a combination of requiring the email to be entered twice and telling the user to expect a confirmation email is worth implementing: it's not against user expectations and doesn't actually disallow anything, so it can't make things worse. A DNS check *can* exclude things, so it adds that risk.

> you cannot prevent people from purposely giving you a fake e-mail address

Fakes don't matter here. The website owner loses nothing from someone deliberately submitting a fake, and anyone using this form *does* actually want to get in touch. Also, I prefer not to restrict the user: if they WANT to give a fake and it doesn't break the system in some way, why second-guess their motivations? I've done it myself, when I wanted to tell someone something but didn't want a response...
kicken Posted July 24, 2014 (edited)

If you want to implement a DNS check, you need to make sure you check for A/AAAA records as well as MX records. If a domain has no MX record, mail servers will fall back to the A record, so some domains might only have their A record set up and no MX records, if those would point to the same IPs anyway.

I would also suggest that if the only failure is the DNS check, you provide a way for the user to ignore that error and submit anyway. A temporary DNS failure on your system could cause the check to fail for legitimate domains, possibly for an extended period if the failure response is cached by a DNS server somewhere in the lookup chain. I've had experience on my mail server with a particular domain whose NS records were improperly configured, so about 50% of the time a DNS lookup would fail and email to that domain would not be sent right away. Sometimes it would end up hanging around in the mail queue for several days before the DNS lookup succeeded and the mail could be sent.

Edited July 24, 2014 by kicken
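A minimal sketch of the check kicken describes, using PHP's `checkdnsrr()`: the domain passes if it has an MX record or, failing that, an A or AAAA record. The function name `domain_accepts_mail` and the injectable `$lookup` parameter are hypothetical design choices, the latter so the logic can be exercised without network access.

```php
<?php
/**
 * Sketch: a domain passes if it has an MX record or, failing that, an A or
 * AAAA record (the fallback mail servers use when no MX record exists).
 * Treat a false result as a warning the user can override, not a rejection.
 *
 * $lookup defaults to PHP's checkdnsrr(); it is injectable (a hypothetical
 * design choice) so the logic can be tried without network access.
 */
function domain_accepts_mail(string $email, ?callable $lookup = null): bool
{
    $lookup ??= 'checkdnsrr';

    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no domain part to look up
    }
    $domain = substr($email, $at + 1);

    // Check MX first, then fall back to address records.
    foreach (['MX', 'A', 'AAAA'] as $type) {
        if ($lookup($domain, $type)) {
            return true;
        }
    }
    return false;
}
```

In a form handler, a `false` result would then trigger a "we couldn't find that domain, submit anyway?" prompt rather than a hard error, in line with the advice about temporary DNS failures.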