gw1500se

regexp for Typical US address


I am struggling to come up with a regexp for verifying a typical US address line of the form:

City , ST zip

Since the city can contain spaces, periods and apostrophes, I need to allow for those in my regexp. The state should be two uppercase letters (I don't care whether it is really a correct abbreviation), followed by a zip code. Here is what I think it should be, and I am asking for verification.

[a-zA-Z \.'], [A-Z]{2} [0-9]{5}?

TIA.


There are probably also cities with hyphens in the name. So I wouldn't even try to validate the name: just make sure something is there, then a comma, the state and zip.


.* will allow the part before the comma to be empty. It should be .+ instead.

If this is supposed to match the entire address line then use ^ and $ anchors (if not somehow already implied) and add support for zip+4 codes.
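To illustrate the difference between .* and .+ mentioned above, here is a small sketch. It uses Python's re module purely for illustration (the thread does not say which engine is in use), and the two patterns and their names are assumptions made up for the example.

```python
import re

# Hypothetical patterns for illustration only: they differ in one quantifier.
star = re.compile(r".*, [A-Z]{2} [0-9]{5}")   # zero or more characters before the comma
plus = re.compile(r".+, [A-Z]{2} [0-9]{5}")   # at least one character before the comma

no_city = ", NY 12345"
with_city = "Albany, NY 12345"

star_accepts_empty_city = star.search(no_city) is not None
plus_accepts_empty_city = plus.search(no_city) is not None
plus_accepts_city = plus.search(with_city) is not None
```

With .* the line without a city is accepted; with .+ it is rejected, while a normal line still passes.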


There won't be any plus 4 zips. Not sure what you mean with the anchors.

.+ , [A-Z]{2} [0-9]{5}?

 


Without the anchors, your regex only checks that the string contains something that matches it, so there could be extra text after the zip code, for example. With the anchors (or at least the end-of-string $ anchor), you make sure the entire string matches and not just a part of it.
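A quick way to see the effect of the anchors is to run the same pattern with and without them. This is only a sketch in Python's re module (an assumption, since the thread does not name the engine); the sample line is invented.

```python
import re

# Same pattern, once without anchors and once with ^ and $.
unanchored = re.compile(r".+ , [A-Z]{2} [0-9]{5}")
anchored = re.compile(r"^.+ , [A-Z]{2} [0-9]{5}$")

# Invented sample: a valid-looking address followed by extra text.
line = "Albany , NY 12345 and more junk"

unanchored_ok = unanchored.search(line) is not None  # finds a matching part inside the line
anchored_ok = anchored.search(line) is not None      # requires the whole line to match
```

The unanchored pattern accepts the line because a matching substring exists; the anchored one rejects it because of the text after the zip code.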


Oh, I thought the ? did that. Does the ? make the zip optional? So it should be:

^.+[ ]*,[ ]*[A-Z]{2} [0-9]{5}$

I'm not sure that the first [ ]* is not redundant.
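On the question of what the ? does: directly after a {5} quantifier it does not make the digits optional. In Python's re module (used here as an assumption; PCRE-style engines behave the same way as far as I know), it only makes the quantifier lazy, which changes nothing for an exact count. To make the zip optional, the ? has to follow a group. A minimal sketch with made-up pattern names:

```python
import re

# {5}? is a lazy "exactly five", so five digits are still required.
lazy_five = re.compile(r"^[0-9]{5}?$")
# A ? after a group makes the whole group optional.
optional_five = re.compile(r"^(?:[0-9]{5})?$")

lazy_matches_zip = lazy_five.search("12345") is not None
lazy_matches_empty = lazy_five.search("") is not None
optional_matches_empty = optional_five.search("") is not None
optional_matches_short = optional_five.search("123") is not None
```

Here the lazy version still rejects an empty string, while the grouped version accepts either nothing or exactly five digits.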


It's not so much redundant as unnecessary.

In order, your regex will match:

  1. The beginning of the string
  2. At least one character
  3. Zero or more spaces
  4. A comma
  5. Zero or more spaces
  6. Two capital letters
  7. A space
  8. Five digits
  9. The end of the string
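The nine steps above can be written out one per line using a verbose pattern. This sketch assumes Python's re module with the re.VERBOSE flag; the variable name is made up. Because verbose mode ignores bare spaces in the pattern, the literal space in step 7 is written as [ ].

```python
import re

# The regex from the previous post, split into the nine steps listed above.
STEP_BY_STEP = re.compile(r"""
    ^           # 1. the beginning of the string
    .+          # 2. at least one character
    [ ]*        # 3. zero or more spaces
    ,           # 4. a comma
    [ ]*        # 5. zero or more spaces
    [A-Z]{2}    # 6. two capital letters
    [ ]         # 7. a space
    [0-9]{5}    # 8. five digits
    $           # 9. the end of the string
""", re.VERBOSE)

accepts_normal = STEP_BY_STEP.search("Albany , NY 12345") is not None
accepts_short_zip = STEP_BY_STEP.search("Albany, NY 1234") is not None
```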

Beginning of string + at least one character doesn't really do much because the engine will start matching at the beginning anyways and you're not requiring that the string start with anything in particular. In fact the ^.+[ ]* together are only saying that there must be at least one character before the comma.

You could simplify the whole thing to just

.,[ ]*[A-Z]{2} [0-9]{5}$

which will ensure there is a character before the comma, possible spaces after the comma, the state, a space, and the zip code.

The thing is, that still doesn't require there to be a city name: it could match the string " , NY 12345". The regex should require at least one non-space character before the comma, keeping in mind that spaces before or after the city name should be allowed.
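One possible way to require a real city name, offered as a sketch rather than the definitive answer: allow leading whitespace, then demand one character that is neither whitespace nor a comma, then anything up to the comma. The pattern, the function name is_valid_line and the sample strings are all assumptions for illustration, written for Python's re module.

```python
import re

# Assumed pattern, not taken from the thread: optional leading whitespace,
# one character that is neither whitespace nor a comma, the rest of the
# city, a comma, optional spaces, the state, a space and a five-digit zip.
CITY_STATE_ZIP = re.compile(r"^\s*[^\s,].*,[ ]*[A-Z]{2} [0-9]{5}$")

def is_valid_line(line: str) -> bool:
    """Return True if the line looks like 'City, ST 12345'."""
    return CITY_STATE_ZIP.search(line) is not None

samples = {
    "Albany, NY 12345": is_valid_line("Albany, NY 12345"),
    " St. Louis , MO 63101": is_valid_line(" St. Louis , MO 63101"),
    " , NY 12345": is_valid_line(" , NY 12345"),
    "Albany, NY 12345 USA": is_valid_line("Albany, NY 12345 USA"),
}
```

Note that in Python the $ anchor also matches just before a trailing newline, so strip the input first (or use \Z instead of $) if that matters.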
