I'm working on a method to extract street addresses from blocks of text. This is the first time I've used PHP and regex, so I'd like to know whether there's a better way to do some of this, or a better way to search for addresses altogether. Any tips on what I've got so far are welcome too.

 

/* Regular expression

Searches for:

([0-9]{1,6})   : House number - 1 to 6 digits long
.+             : Street name - any number of characters
$streetSuffix  : holds all known suffixes, case insensitive, e.g. Dr | DR | Drive | etc.
.+             : City - any number of characters
$states        : holds state abbreviations and full state names, e.g. CA | California | AL | etc.
(([0-9]{5}-[0-9]{4})|([0-9]{5})) : handles 5- and 9-digit zip codes

*/

$regEx = "/ (([a-z]+) ([a-z]+)) ([0-9]{1,6}).+ (".$streetSuffix.").+, (".$states.") (([0-9]{5}-[0-9]{4})|([0-9]{5}))/i";
preg_match_all($regEx, $textblock,$returnArray);

 

One problem I have is getting the name at the beginning, because it could be one to three strings (first, middle, last), and I don't want to accidentally grab extra information before the name. Maybe it's possible to add a condition for when the second string is one character long (a middle initial)? Also, there might be a better way to handle the .+'s that pick up the city and the street name, since I could reasonably assume a city name won't ever be longer than 3 - 4 words.

 

Anyway, any suggestions would be very helpful.

 

Thanks!

 


The .+s should be lazy: .+?. Also, optimize the suffixes. For example, since you're using /i there's no need to look for "Dr" and "DR". You can also combine "Dr" and "Drive" into Dr(?:ive)?. A similar approach can be taken with the states.
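To make that concrete, here is a rough sketch of what the pattern might look like with lazy quantifiers and merged suffixes. The suffix and state lists are hypothetical and heavily abbreviated, the sample text is made up, and capping the city at four words is only the assumption from the original question, not a rule about real city names. The name part is left out on purpose.

```php
<?php
// Hypothetical, abbreviated lists; real ones would be much longer.
// With /i, "Dr" and "DR" are the same, and Dr(?:ive)? covers "Drive" too.
$streetSuffix = 'St(?:reet)?|Ave(?:nue)?|Dr(?:ive)?|Rd|Road|Blvd|Boulevard';
$states       = 'CA|California|AL|Alabama|NY|New York';

// (\d{1,6})  house number
// (.+?)      street name, lazy so it stops at the first suffix it can
// city       1 to 4 words (assumption), instead of an open-ended .+
// zip        5 digits with an optional -4 extension
$regEx = '/\b(\d{1,6}) (.+?) (' . $streetSuffix . ')\.?,? '
       . '((?:[a-z.]+ ){0,3}[a-z.]+), (' . $states . ') '
       . '(\d{5}(?:-\d{4})?)\b/i';

// Made-up sample text for illustration only.
$textblock = 'Send it to John Q Smith 1234 Oak Tree Dr, San Jose, CA 95112-0001 by Friday. '
           . 'Or try 77 Elm Street, Mobile, Alabama 36601.';

// PREG_SET_ORDER groups the captures per address rather than per group.
preg_match_all($regEx, $textblock, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // number | street | suffix | city | state | zip
    echo "$m[1] | $m[2] | $m[3] | $m[4] | $m[5] | $m[6]\n";
}
```

Note that a lazy .+? stops at the first suffix it can reach, so a street such as "St Charles Ave" would probably be split in the wrong place; treat this as a starting point rather than a finished pattern.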

 

What's the context of the data, i.e., what is surrounding these addresses? Anything?

I don't know about the context, and I don't think there would be any way to use it. It's just blocks of text that get searched for addresses; I don't actually know how it's going to be used. One example might be a message from someone who has put their address in it: the script would recognize that it's an address and let you click on it to 'add to address book', that sort of thing.

 

Thanks for the tips so far

You're wandering into some ambiguous territory. There are plenty of ways to approach address capture, e.g., (in Perl) Geo::StreetAddress::US and Lingua::EN::AddressParse, but names are a whole different story because of their variations, especially when they're intermixed with other data.
