
Does this regex need the backslash before the dot? ^\S+@\S+\.\S+$


appobs


For a very 'loose' email address validator, does the following regex need the backslash before the dot?

 

^\S+@\S+\.\S+$

 

(Checks for anything@anything.anything but I'm gonna change that to anything.anything to make it even 'looser')

 

I tested with and without the backslash using rubular.com and it seems superfluous but I'd like a second opinion or two please.

 

ALSO: does rubular.com use the right engine to be correct for php? I hope so coz it's the first time I've been able to fully understand some regexes I've been blindly implementing for some time!

 

 

Many thanks in advance :)


For a very 'loose' email address validator, does the following regex need the backslash before the dot?

 

^\S+@\S+\.\S+$

If it has the backslash then it means a literal period. If it does not then it means any character.

 

So what would you think?
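To make the difference concrete, here is a minimal sketch. It uses Python's `re` module purely for illustration (an assumption; the thread itself is about PHP/PCRE), but `\S`, `.`, `\.`, `^` and `$` behave the same way in both engines for these inputs. The sample addresses and the `accepts` helper are made up.

```python
import re

ESCAPED = re.compile(r"^\S+@\S+\.\S+$")    # \. must match a literal dot
UNESCAPED = re.compile(r"^\S+@\S+.\S+$")   # .  matches any character except a newline

def accepts(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None

# One address with a dot after the @, one without.
for text in ["user@example.com", "user@examplecom"]:
    print(text, accepts(ESCAPED, text), accepts(UNESCAPED, text))
```

A well-formed address passes both patterns, which is probably why the backslash looked superfluous in the tester. Only an input with no dot after the @ tells the two apart: the unescaped version still accepts it.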

 

I tested with and without the backslash using rubular.com and it seems superfluous but I'd like a second opinion or two please.

Are you making sure to test with valid and invalid strings?

 

ALSO: does rubular.com use the right engine to be correct for php? I hope so coz it's the first time I've been able to fully understand some regexes I've been blindly implementing for some time!

You did see that it's called "a Ruby regular expression editor", right? There are many other options that specifically say they're good for PHP and/or PCRE (which is what PHP uses).

 

Ruby's regex syntax looks the same but I wouldn't rely on that.

Edited by requinix

snip

 

Tested with some invalid strings but not enough obviously!

So characters that do not have a special meaning in regex don't need the backslash to be literal but even . is literal without the backslash if it's contained within [ ]

 

??

 

 

 

I'd love to get some handle on regex. This is the regex that came with the original version of the contact form I've been adapting:

 

/^([a-zA-Z0-9])+([a-zA-Z0-9._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9._-]+)+$/

 

Are ._- and ._- literal because they're inside [ ]?


Side note: did you know that PHP has a built-in function for validating email addresses?

http://php.net/manual/en/filter.examples.validation.php

 

Yeah I know, since regex was used in the contact form I'm adapting it seems the right time to learn a bit about it.

 

Doesn't the php email address validator have flaws? Maybe there's a newer one or maybe those flaws just don't matter. I've read so much waffle about email address validation, not sure where I am with it anymore. Certainly don't want to use something as far outside my understanding as a certain monstrous RFC regex I've seen.


a certain monstrous RFC regex I've seen.

Probably not the one, but I love showing this off any time I get the chance.

/^(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P\(((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21-\x27\x2A-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))|(?P>COMMENT)))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\)))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+(\.(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+)*((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(
\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?)(\.(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\x22((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(([\x21\x23-\x5B\x5D-\x7E]|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F])|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\x22((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?))*)@(((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t]
)+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+(\.(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+)*((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?\[((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?([\x21-\x5A\x5E-\x7E]|([\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|(\\([\x21-\x7E]|[\x20\t])|\\(\0|[\x01-\x08\x0B\x0C\x0E-\x1F\x7F]|\n|\r)))))*(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?\]((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?|((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(\.((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?(([A-Za-z]|[0-9]|[!#$%&'*+\-\/=?^_`{|}~]))+((((((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?(?P>COMMENT))+(((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*))?|((([\x20\t])*\r\n)?([\x20\t])+|([\x20\t])+(\r\n([\x20\t])+)*)))?)*)$/
Programmatically generated so it can be cleaned up a bit, but still...

 

So characters that do not have a special meaning in regex don't need the backslash to be literal

 

You generally must not prepend a backslash to literal characters. For example, “\s” is obviously something completely different than a simple “s”.

 

 

 

but even . is literal without the backslash if it's contained within [ ]

 

Yes.
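Both points can be checked with a few one-character inputs. This is a small sketch using Python's `re` module as a stand-in (an assumption; PCRE treats these particular constructs the same way), and `whole_match` is a hypothetical helper name.

```python
import re

def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# A backslash in front of an ordinary letter changes its meaning.
print(whole_match(r"s", "s"))     # plain letter s
print(whole_match(r"\s", "s"))    # \s is the whitespace class, not the letter
print(whole_match(r"\s", " "))

# Outside a character class the dot is a wildcard; inside [ ] it is literal.
print(whole_match(r".", "x"))
print(whole_match(r"[.]", "x"))
print(whole_match(r"[.]", "."))
```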

 

 
 

I'd love to get some handle on regex. This is the regex that came with the original version of the contact form I've been adapting:

/^([a-zA-Z0-9])+([a-zA-Z0-9._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9._-]+)+$/

 

Like almost all home-made e-mail regexes, this is nonsense, both technically and semantically. Whoever wrote this doesn't really understand regexes and hasn't bothered to read the standards.

 

So don't use this as a template or for learning.
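Two concrete problems with that pattern are easy to demonstrate. The sketch below uses Python's `re` module for illustration (an assumption; the pattern uses nothing that differs from PCRE here), with the `/` delimiters dropped and made-up sample inputs.

```python
import re

# The contact-form pattern quoted above, without the PHP delimiters.
OLD_PATTERN = re.compile(
    r"^([a-zA-Z0-9])+([a-zA-Z0-9._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9._-]+)+$"
)

def old_accepts(text):
    """Return True if the old contact-form pattern matches the text."""
    return OLD_PATTERN.search(text) is not None

print(old_accepts("user@example.com"))    # ordinary address
print(old_accepts("a@bc"))                # no dot in the domain at all
print(old_accepts("user+tag@gmail.com"))  # plus sign in the local part
```

The pattern never requires a dot after the @, so `a@bc` passes, while an address containing `+` is rejected because that character is missing from every class in the local part.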

 

 

 

Are ._- and ._- literal because they're inside [ ]?

 

An underscore never has a special meaning. And a hyphen within a character class is only taken literally if it's at the beginning or the end. Otherwise it signifies a range (as you've already seen).
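The hyphen rule can be checked the same way. This is a minimal sketch in Python's `re` (an assumption; hyphen placement in a character class works the same in PCRE), and `in_class` is a hypothetical helper.

```python
import re

def in_class(char_class, char):
    """Return True if the single character is matched by the character class."""
    return re.fullmatch(char_class, char) is not None

# Between two characters the hyphen forms a range.
print(in_class(r"[a-c]", "b"))
print(in_class(r"[a-c]", "-"))

# At the start or end of the class it is a literal hyphen.
print(in_class(r"[ac-]", "-"))
print(in_class(r"[-ac]", "-"))
print(in_class(r"[ac-]", "b"))

# The class from the contact form: dot, underscore and hyphen are all literal.
print([in_class(r"[._-]", c) for c in "._-,"])
```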

 

 

 

Doesn't the php email address validator have flaws? Maybe there's a newer one or maybe those flaws just don't matter. I've read so much waffle about email address validation, not sure where I am with it anymore. Certainly don't want to use something as far outside my understanding as a certain monstrous RFC regex I've seen.

 

Whatever flaws the PHP validator may have: They're nothing against the garbage that floats around on the Internet. So do use the built-in validator and avoid home-made regex hacks.

 

Regexes are generally overrated. I understand that they're fascinating for beginners, because they seem like an almighty text processing tool. I used to feel the same. But after a while, you'll see the deficiencies and realize that they're the wrong tool for most jobs. A regex is great for simple patterns like a date or something. But it's just not powerful enough for anything more complex like an e-mail address or even a full-blown language like HTML. That's where the abuse begins.

Edited by Jacques1

  I'll be working on this contact form later on tonight so I'll try the php validator.


  What about this one, purely for warning the user they copy/pasted the wrong data in the email from box:


^\S+@\S+$


  Allows gmail addys with + sign which the previous one didn't. I'm still tempted to restrict string length.
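A rough sketch of that looser check with a length cap, written in Python's `re` for illustration (an assumption; the pattern behaves the same in PCRE for these inputs). `MAX_LENGTH` and `looks_like_email` are hypothetical names, and 254 is only a commonly quoted upper bound, not a value from this thread.

```python
import re

LOOSE = re.compile(r"^\S+@\S+$")
MAX_LENGTH = 254  # assumed cap; adjust or drop as needed

def looks_like_email(text):
    """Loose sanity check: short enough, no whitespace, and an @ with text on both sides."""
    return len(text) <= MAX_LENGTH and LOOSE.search(text) is not None

print(looks_like_email("user+tag@gmail.com"))   # plus sign is fine
print(looks_like_email("user @gmail.com"))      # whitespace is rejected
print(looks_like_email("pasted the wrong thing"))
```

This only catches the "wrong data pasted into the box" case. It says nothing about whether the address exists.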

  I'm wondering now if there's any point validating user input email addresses at all for a contact form. Signups need proper user/email verification so this kind of validation's pointless there too.



  In any case, now I know learning from other peoples' regexes can help with syntax but not necessarily good practice! And I have a good builder/tester now that educates me and covers PCRE, JS & Python




NB: This was the 'monstrous' one I read about before. Generated by a Perl script.




Thanks everyone :)






 


What about this one, purely for warning the user they copy/pasted the wrong data in the email from box:

 

^\S+@\S+$

 

  Allows gmail addys with + sign which the previous one didn't. I'm still tempted to restrict string length.

Email addresses suck, as you've seen. Go only for the rudimentary structure, prompt twice to try to catch typos, and send a message to really confirm it.

The only other thing you can do beforehand is to look up the domain name and make sure it exists, but everything imaginable does now so you won't catch much.

 

I'm wondering now if there's any point validating user input email addresses at all for a contact form. Signups need proper user/email verification so this kind of validation's pointless there too.

A strict check will only piss off the people with unusual but legitimate addresses. You know, the 0.001% who do it just so they can bitch about how validation forms don't accept it.

The other people will just enter in a made up email like "foo@bar.com" and no regex can possibly catch that.

 

NB: This was the 'monstrous' one I read about before. Generated by a Perl script.

"The regular expression does not cope with comments in email addresses"... Mine does.

Hmmm... email addresses do in fact suck...

 

I thought I'd send a confirmation message containing the email they entered - so they have another chance to spot any mistake - also containing the message they sent - so it doesn't look like I'm second guessing them on knowing their own email address.............. Clever.............

 

Then I saw the mistake in that!

 

 

Maybe I do need a field for "your email address again". Irritating as it is, it's not against user expectations.

 

Since this is a contact form, not a signup, I don't think asking the user to click a verification link in an email before their message is actually sent is in order... Unless it's a lie and the message is sent anyway...

 

Suppose ultimately it's about what seems more thorough and professional.


If I use checkdnsrr, will that be reliable or wrongly reject user submissions?

 

That check is supposed to help you how exactly?

 

I mean, as you've already found out, you cannot prevent people from purposely giving you a fake e-mail address unless you force them to go through a confirmation procedure. So all of your checks are limited to catching mistakes. Formal validation obviously makes sense for that, but what do you expect to get from a DNS check? Are you afraid that people might accidentally enter something like “gmail.comm” instead of “gmail.com”? I find that rather far-fetched.

Edited by Jacques1

Are you afraid that people might accidentally enter something like “gmail.comm” instead of “gmail.com”? I find that rather far-fetched.

 

Yep, seen it happen. The website owner does it regularly, ignoring the "annoying popup" (autosuggest - which incidentally goes away the moment you stray from the CORRECT spelling of the email - counter to positive reinforcement!).

 

(I find it far-fetched too!)

 

 

what do you expect to get from a DNS check?

 

Another percent chance of catching a mistake and, more importantly, demonstration of thoroughness to the client.

 

Is there a chance of a genuine domain failing this check? If not I can use it but if it *might* cause a prob, I can't.

 

On balance, so far a combination of requiring email entered twice and telling the user to expect a confirmation email is worth implementing as it's not against user expectations and doesn't actually disallow anything so can't make things worse. DNS check *can* exclude stuff so adds risk of that.

 

 

you cannot prevent people from purposely giving you a fake e-mail address

Fakes don't matter here. The website owner loses nothing from someone deliberately submitting a fake and anyone using this form *does* actually want to get in touch. Also, I prefer not to restrict the user - if they WANT to give a fake and it doesn't break the system in some way, why second guess their motivations? I've done it myself - want to tell someone something but don't want a response...


If you want to implement a DNS check, you need to check for A/AAAA records as well as MX records. If a domain has no MX record, mail servers fall back to the A record, so some domains might have only an A record set up and no MX records if both would point to the same IPs.

 

I would also suggest that if the only failure is the DNS check, perhaps provide a way for the user to ignore that error and submit anyway. A temporary failure of DNS on your system could cause the check to fail for legitimate domains, possibly for an extended period if the failure response is cached by a DNS server somewhere in the lookup chain.

 

I've had experience on my mail server with a particular domain whose NS records were improperly configured, so about 50% of the time a DNS lookup would fail and email to that domain would not be sent right away. Sometimes it would end up hanging around in the mail queue for several days before the DNS lookup was successful and the mail could be sent.
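The decision logic described in this post could be sketched roughly as follows. This is not a real DNS client: the `has_record` callable is a hypothetical stand-in for whatever lookup you use (in PHP that would be something like `checkdnsrr`), and it is assumed to return True or False and to raise `OSError` when the lookup itself fails. Python is used only for illustration.

```python
from typing import Callable

# Hypothetical resolver: (domain, record_type) -> True if such a record exists.
# It is assumed to raise OSError on a lookup failure (timeout, server error).
Resolver = Callable[[str, str], bool]

def dns_verdict(domain: str, has_record: Resolver) -> str:
    """Return "ok", "missing" or "unknown" for a mail domain."""
    try:
        # MX first, then fall back to A/AAAA as mail servers do.
        for record_type in ("MX", "A", "AAAA"):
            if has_record(domain, record_type):
                return "ok"
    except OSError:
        # The lookup failed, so nothing is known about the domain.
        return "unknown"
    return "missing"
```

A form using this would accept "ok", let the user confirm and submit anyway on "missing", and probably treat "unknown" as a pass, since a flaky lookup says nothing about the address.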

Edited by kicken
