naffets77, posted December 18, 2007

I'm working on a method to extract street addresses from blocks of text. This is the first time I've used PHP/regex, so I'd like to know whether there's a better way to do any of this, or whether anyone has tips on what I have so far. Or is there a better way to search for addresses altogether?

```php
/* Regular expression search for:
   (([a-z]+) ([a-z]+))  : name, first and last word
   ([0-9]{1,6})         : house number, 1 to 6 digits long
   .+                   : street name, any number of characters
   $streetSuffix        : all known suffixes, case-insensitive, e.g. Dr | DR | Drive | ...
   .+                   : city, any number of characters
   $states              : state abbreviations and full state names, e.g. CA | California | AL | ...
   (([0-9]{5}-[0-9]{4})|([0-9]{5})) : 9- and 5-digit ZIP codes
*/
$regEx = "/ (([a-z]+) ([a-z]+)) ([0-9]{1,6}).+ (".$streetSuffix.").+, (".$states.") (([0-9]{5}-[0-9]{4})|([0-9]{5}))/i";
preg_match_all($regEx, $textblock, $returnArray);
```

One problem I have is capturing the name at the beginning, because it could be one to three words (first, middle, last), and I don't want to accidentally grab extra text before the name. Maybe I could use a conditional: if the second word is only one character long, treat it as a middle initial?

Also, there might be a better way to write the .+'s that pick up the street name and city. Could I reasonably assume that a city name will never be longer than about three or four words?

Anyway, any suggestions would be very helpful. Thanks!
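A minimal sketch of the two ideas in the question: a name that is two words plus an optional one-letter middle initial, and a city capped at four words instead of an open-ended .+. It assumes hypothetical short lists in place of the full $streetSuffix and $states alternations, and a made-up sample $textblock. It is not a general address parser: the name rule still accepts any two words that happen to sit right before a house number.

```php
<?php
// Hypothetical short lists standing in for the poster's full alternations.
$streetSuffix = 'Dr|Drive|St|Street|Ave|Avenue|Rd|Road';
$states       = 'CA|California|AL|Alabama|TX|Texas';

$regEx = '/\b'
    // 1: name = first word, optional one-letter middle initial, last word
    . '([a-z]+(?: [a-z]\.?)? [a-z]+)'
    // 2: house number, 1 to 6 digits
    . ' ([0-9]{1,6})'
    // 3: street = 1 to 3 words followed by a known suffix
    . ' ((?:[a-z0-9]+ ){1,3}(?:' . $streetSuffix . ')\b\.?)'
    // 4: city = 1 to 4 words instead of an open-ended .+
    . ',? ((?:[a-z]+ ){0,3}[a-z]+)'
    // 5: state, 6: 5-digit ZIP with an optional -4 extension
    . ', (' . $states . ') ([0-9]{5}(?:-[0-9]{4})?)\b/i';

// Made-up sample text for illustration only.
$textblock = 'Please mail John Q. Public 123 Main St, Springfield, CA 90210 soon. '
           . 'Ship to Jane Doe 42 Oak Avenue, Palo Alto, California 94301-1234.';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($regEx, $textblock, $returnArray, PREG_SET_ORDER);

foreach ($returnArray as $m) {
    // Skip $m[0] (the full match) and print the six captured fields.
    echo implode(' | ', array_slice($m, 1)), "\n";
}
// John Q. Public | 123 | Main St | Springfield | CA | 90210
// Jane Doe | 42 | Oak Avenue | Palo Alto | California | 94301-1234
```

The leading \b keeps a match from starting mid-word, and because the regex engine returns the leftmost match that succeeds, "mail John" is rejected (the next word, "Q.", is not a house number) before "John Q. Public" is tried.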
effigy, posted December 18, 2007

The .+'s should be lazy: .+?. Also, optimize the suffixes. For example, since you're using /i, there's no need to look for both "Dr" and "DR". You can also combine "Dr" and "Drive" into Dr(?:ive)?. A similar approach works for the states.

What's the context of the data, i.e., what surrounds these addresses? Anything?
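A short sketch of both suggestions, using hypothetical sample strings. The greedy .+ runs to the last " St" it can find, while the lazy .+? stops at the first one. With /i, each short/long suffix pair folds into a single optional group. The suffix list here is illustrative, not complete; the same folding should work for states, e.g. CA(?:lifornia)?.

```php
<?php
// Hypothetical sample text containing two addresses on one line.
$text = '10 Elm St, Reno, NV 89501 and 20 Pine St, Elko, NV 89801';

preg_match('/[0-9]+ .+ St\b/i', $text, $greedy); // .+ backtracks from the end to the LAST " St"
preg_match('/[0-9]+ .+? St\b/i', $text, $lazy);  // .+? grows one char at a time to the FIRST " St"

echo $greedy[0], "\n"; // 10 Elm St, Reno, NV 89501 and 20 Pine St
echo $lazy[0], "\n";   // 10 Elm St

// With /i, "Dr" already matches "DR" and "dr"; (?:...)? folds each short/long pair.
$streetSuffix = 'Dr(?:ive)?|St(?:reet)?|Ave(?:nue)?|R(?:oa)?d';
preg_match_all('/\b(?:' . $streetSuffix . ')\b/i',
               'Oak Dr, Elm DRIVE, Pine street, Main Ave', $suffixes);

echo implode(', ', $suffixes[0]), "\n"; // Dr, DRIVE, street, Ave
```

The \b after the alternation matters: without it, "Ave" could match the first three letters of "Avenue" and leave "nue" for the rest of the pattern to trip over.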
naffets77 (thread author), posted December 18, 2007

I don't know about the context, and I don't think there would be any way to use it: it's just blocks of text that get searched for addresses, and I don't know exactly how it's going to be used. One example: you receive a message from someone who has included their address, and the script recognizes it as an address and lets you click it to "add to address book", that sort of thing.

Thanks for the tips so far.
effigy, posted December 18, 2007

You're wandering into some ambiguous territory. There are plenty of ways to approach address capture, e.g., the Perl modules Geo::StreetAddress::US and Lingua::EN::AddressParse, but names are a whole different story because of their many variations, especially when they're intermixed with other data.