
Need Suggestions: Street Address Regex


naffets77


I'm working on a method to extract street addresses from blocks of text. This is my first time using PHP and regular expressions, so I'd like to know whether there's a better way to do some of this, or whether anyone has tips on what I've got so far. Or maybe there's a better way to search for addresses altogether?

 

/* Regular expression: search for

(([a-z]+) ([a-z]+))              : Name - two words (first and last)
([0-9]{1,6})                     : House number - 1 to 6 digits long
.+                               : Street name - any number of characters
$streetSuffix                    : all known suffixes, case-insensitive, e.g. Dr | DR | Drive | ...
.+                               : City - any number of characters
$states                          : state abbreviations and full state names, e.g. CA | California | AL | ...
(([0-9]{5}-[0-9]{4})|([0-9]{5})) : 5- and 9-digit ZIP codes

*/

// $streetSuffix and $states hold the alternation strings described above;
// $textblock is the text being searched.
$regEx = "/ (([a-z]+) ([a-z]+)) ([0-9]{1,6}).+ (".$streetSuffix.").+, (".$states.") (([0-9]{5}-[0-9]{4})|([0-9]{5}))/i";
preg_match_all($regEx, $textblock, $returnArray);
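A minimal sketch of one way to build `$streetSuffix` and `$states` from PHP arrays instead of one long hand-typed string. The array contents and the `buildAlternation()` helper are hypothetical, for illustration only (the real lists would hold every suffix and state). `preg_quote()` escapes any regex metacharacters in an entry, and sorting longest-first means the group captures "Drive" rather than stopping at "Dr". Arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical sample lists; a real script would hold every suffix and state.
$suffixList = ['Drive', 'Dr', 'Street', 'St', 'Avenue', 'Ave', 'Road', 'Rd'];
$stateList  = ['California', 'CA', 'Alabama', 'AL'];

// Hypothetical helper: escape each entry and join the entries with "|".
function buildAlternation(array $items): string
{
    // Longest entries first, so "Drive" is tried before "Dr" and the
    // capture group holds the full word when the text says "Drive".
    usort($items, fn($a, $b) => strlen($b) - strlen($a));
    // preg_quote() makes characters such as "." match literally.
    return implode('|', array_map(fn($s) => preg_quote($s, '/'), $items));
}

$streetSuffix = buildAlternation($suffixList);
$states       = buildAlternation($stateList);

echo $streetSuffix, "\n";
```

The resulting strings drop straight into the existing `(".$streetSuffix.")` and `(".$states.")` groups; adding a suffix then only means adding one array element.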

 

One problem is capturing the name at the beginning, because it could be one to three words (first, middle, last), and I don't want to accidentally grab extra text before the name. Maybe it's possible to add a condition for when the second word is one character long (a middle initial)? Also, there might be a better way to handle the .+'s I use to pick up the city and the street name. Could I reasonably assume that a city name will never be longer than three or four words?
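A sketch of both ideas, assuming names and city names are written as capitalized words. The middle part of the name is optional and may be either a single initial (with or without a period) or a full word, so the name is two or three words long; the city is capped at four words with a `{0,3}` repetition. The short suffix and state lists and the sample sentence are made up for illustration, and the pattern drops the `/i` flag because case-insensitive matching would make the capital-letter checks meaningless.

```php
<?php
// Name: First [M. | Middle] Last, each word capitalized (assumption).
$name = '[A-Z][a-z]+(?: [A-Z]\.?| [A-Z][a-z]+)? [A-Z][a-z]+';

// City: one to four capitalized words instead of an open-ended ".+".
$city = '[A-Z][a-z]+(?: [A-Z][a-z]+){0,3}';

// Hypothetical short lists; longer alternatives come first.
$streetSuffix = 'Drive|Dr|Street|St|Avenue|Ave';
$states       = 'California|CA|Alabama|AL';

// No /i flag, so the capital-letter checks in $name and $city stay meaningful.
$regEx = "/\b(?<name>$name) (?<number>[0-9]{1,6}) (?<street>[A-Za-z ]+? (?:$streetSuffix))\.?,? (?<city>$city), (?<state>$states) (?<zip>[0-9]{5}(?:-[0-9]{4})?)\b/";

$text = "Please send it to John Q. Public 1234 Main St, Springfield, CA 90210 by Friday.";
preg_match_all($regEx, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['name'], ' / ', $m['street'], ' / ', $m['city'], "\n";
}
```

This still can't tell a name from a capitalized word right in front of it: in "Contact Mary Smith 12 Oak Dr, ..." the name group would start at "Contact" and treat "Mary" as a middle name.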

 

Anyway, any suggestions would be very helpful.

 

Thanks!

 

https://forums.phpfreaks.com/topic/82223-need-suggestions-street-address-regex/

The .+s should be lazy: .+?. Also, optimize the suffixes. For example, since you're using /i there's no need to look for "Dr" and "DR". You can also combine "Dr" and "Drive" into Dr(?:ive)?. A similar approach can be taken with the states.
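A small sketch contrasting greedy `.+` with lazy `.+?` on a made-up string that holds two addresses, using the compact `Dr(?:ive)?` style of alternation suggested above (`(?:...)` is a non-capturing group). The trailing `\b`, which stops a suffix like "St" from matching the start of a longer word such as "Stanley", is an addition for this example.

```php
<?php
// With /i, "Dr(?:ive)?" covers Dr, DR, dr, Drive, DRIVE, ...
$suffix = 'Dr(?:ive)?|St(?:reet)?|Ave(?:nue)?';

$text = "12 Oak Dr and 34 Elm Street";

// Greedy: ".+" first runs to the end of the text, then backtracks only as far
// as the LAST suffix it can find, so both addresses merge into one match.
preg_match_all("/([0-9]{1,6}) .+ ($suffix)\b/i", $text, $greedy);

// Lazy: ".+?" expands one character at a time and stops at the FIRST suffix,
// so each address becomes its own match.
preg_match_all("/([0-9]{1,6}) .+? ($suffix)\b/i", $text, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

Here `$greedy[0]` holds a single match, "12 Oak Dr and 34 Elm Street", while `$lazy[0]` holds "12 Oak Dr" and "34 Elm Street" separately.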

 

What's the context of the data, i.e., what is surrounding these addresses? Anything?

I don't know about the context, and I don't think there would be any way to use it. It's just blocks of text being searched for addresses; I don't actually know how it's going to be used. One example: you receive a message from someone who has put their address in it, and the script recognizes the address and lets you click on it to "add to address book", that sort of thing.

 

Thanks for the tips so far.
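For the "add to address book" use case, one possible shape is a function that returns each match as an associative array built from named groups. `extractAddresses()` is a hypothetical helper, not an existing API; its suffix and state lists are short samples, the name is left out given how hard names are to pin down, and the city is capped at four words following the assumption floated in the first post.

```php
<?php
// Hypothetical helper: return one associative array per address-like match.
function extractAddresses(string $text): array
{
    // Sample lists only; R(?:oa)?d covers both "Rd" and "Road".
    $suffix = 'Dr(?:ive)?|St(?:reet)?|Ave(?:nue)?|R(?:oa)?d';
    $states = 'California|CA|Alabama|AL|Texas|TX';

    $regEx = '/\b(?<number>[0-9]{1,6}) (?<street>[a-z0-9 ]+? (?:' . $suffix . '))\.?,? '
           . '(?<city>[a-z]+(?: [a-z]+){0,3}), (?<state>' . $states . ') '
           . '(?<zip>[0-9]{5}(?:-[0-9]{4})?)\b/i';

    preg_match_all($regEx, $text, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        // PREG_SET_ORDER stores every group under both a numeric and a named
        // key; keep only the string (named) keys.
        $entries[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    }
    return $entries;
}

$message = "Hi! My new place is 500 Oak Road, Austin, TX 78701-1234. Drop by sometime.";
foreach (extractAddresses($message) as $entry) {
    echo implode(' | ', $entry), "\n";
}
```

For the sample message this prints "500 | Oak Road | Austin | TX | 78701-1234", which a UI could turn into a clickable entry.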

You're wandering into some ambiguous territory. There are plenty of ways to approach address capture, e.g. (in Perl) Geo::StreetAddress::US and Lingua::EN::AddressParse, but names are a whole different story because of their many variations, especially when they're intermixed with other data.

Archived

This topic is now archived and is closed to further replies.
