Scooby08 Posted June 29, 2012
I have the following regex:
/([pP]{1}(.*?)[oO]{1}(.*?))?([bB][oO]?[xX])(?![a-zA-Z])(\s+)?[#]?(\d+)?/i
I'm also trying to get it to match the following line:
P.O. 863
I'm open to any pointers on tidying it up as well. Thanks!
xyph Posted June 30, 2012
With the 'i' flag, checking for everything in both upper and lower case is redundant.
.josh Posted June 30, 2012
To add to xyph's post: with that i modifier, most of the character classes are pointless. Also, a quantifier of exactly 1 ({1}) is pointless. Also, that negative lookahead is mostly pointless, since you expect spaces next anyway. Also, you have a lot of stuff grouped individually with the 0-or-1 quantifier; since those groups are not nested, you will get some unexpected matches. Also, based on your regex, it looks like you are missing a lot of valid "post office box" formats. Also, your match-alls (the .*? groups) will give you unexpected matches.
I suggest you google "valid po box formats". There's a lot of info about what is and is not valid, and I even saw some regexes show up in the results. If there are only specific formats you want to allow, then please list examples of all of them.
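The redundancy points above can be checked directly. The following is a minimal sketch in Python (an assumption; the thread does not name a language), where the /i flag becomes re.IGNORECASE. TIDIED is a hypothetical cleanup that only removes the redundant pieces and leaves the pattern's behavior alone, so it still does not match "P.O. 863".

```python
import re

# The regex from the first post, verbatim (the /i flag becomes re.IGNORECASE).
ORIGINAL = re.compile(
    r"([pP]{1}(.*?)[oO]{1}(.*?))?([bB][oO]?[xX])(?![a-zA-Z])(\s+)?[#]?(\d+)?",
    re.IGNORECASE,
)

# Hypothetical tidied form: the doubled-case classes, the {1} quantifiers and
# the single-character class [#] are removed. Nothing else is changed.
TIDIED = re.compile(
    r"(p(.*?)o(.*?))?(bo?x)(?![a-z])(\s+)?#?(\d+)?",
    re.IGNORECASE,
)

SAMPLES = ["PO Box 3232", "p.o. box 345", "POBOX 513", "P.O. 863"]


def span_of(pattern, text):
    """Return the (start, end) of the first match, or None if there is none."""
    match = pattern.search(text)
    return match.span() if match else None


for text in SAMPLES:
    # Both patterns should report the same span for every sample.
    print(repr(text), span_of(ORIGINAL, text), span_of(TIDIED, text))
```

Because both patterns require a literal "box" (or "bx"), neither finds anything in "P.O. 863"; tidying alone does not add that format.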
Scooby08 Posted July 3, 2012
Hello Josh. I cannot give you all of the possible ways a P.O. Box address can show up, but I can tell you that each day I receive at least 2-4 more possibilities. Here are today's:
POB 533
pob 1598
P O Box 1805
Po bo, 444
Please help, as this will be the ultimate P.O. Box regex!!
.josh Posted July 3, 2012
I suggest you google "valid po box formats". There's a lot of info about what is and is not valid, and I even saw some regexes show up in the results.
Scooby08 Posted July 3, 2012
Hello again Josh. I don't want valid po box formats. I'm trying to filter out any address that has anything to do with a po box. I have been playing around some more and am pretty close to my goal. Here's the regex:
/(p(.*)?o(.*?)?)(\s+)?(bo?x)?(\s+)?[#]?(\d+)?/i
That now matches all of the following, except for the ones marked with explanations next to them:
po bx #45
po 45
POB 533
pob 1598
P O Box 1805
Po bo, 444
1555 box elder rd ** does not match and that's what we want
1555 pine box elder rd ** should not match but does because of the p in "pine"
p.o. box 345
45 W North Rd Box 12 ** want to match "Box 12"
PO Box 3232
5 red road po box 15
POST OFFICE BOX #21
Post Office Box 465
5 red road po box
PO Box 45
5748 randy RD POBOX 513
9387 dandy road p o box 513
p box 3
7114 mandy Rd Pobox 513
555 mangy rd po box
Thank you for replying Josh!
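The three marked cases can be reproduced with a small Python sketch (Python is an assumption; the helper name matched_text is made up for illustration). It shows what the second regex actually captures: for "1555 pine box elder rd" the match is only the fragment "pine bo", because the pattern needs nothing more than a "p" followed somewhere later by an "o".

```python
import re

# The second regex from the post above, verbatim (/i becomes re.IGNORECASE).
SECOND_ATTEMPT = re.compile(
    r"(p(.*)?o(.*?)?)(\s+)?(bo?x)?(\s+)?[#]?(\d+)?",
    re.IGNORECASE,
)


def matched_text(address):
    """Return the text the regex matched in `address`, or None if no match."""
    match = SECOND_ATTEMPT.search(address)
    return match.group(0) if match else None


for address in [
    "1555 box elder rd",        # no "p" anywhere, so nothing matches
    "1555 pine box elder rd",   # false positive: "p" in pine, "o" in box
    "45 W North Rd Box 12",     # missed: the pattern insists on a leading "p"
    "PO Box 3232",
]:
    print(repr(address), "->", repr(matched_text(address)))
```

Every part after the mandatory "p...o" is optional, which is why the pattern cannot tell "pine bo" apart from a real "PO".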
.josh Posted July 3, 2012
Friend, you are making this harder than it needs to be. PO Box matching is a common thing that many others have already tackled and figured out; no need to reinvent the wheel. If you are trying to filter out all po box type addresses, you would still use the same po box validation regexes:
if (is valid po box regex) {
    // throw it out
} else {
    // "good" address, do something
}
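The throw-it-out-if-it-matches idea can be sketched as follows in Python (an assumption; the thread does not name a language). PO_BOX is a hypothetical, deliberately simple pattern written for this illustration only. It is not one of the established regexes mentioned above, and it does not cover every format in the thread (bare "Box 12" and "po 45" are not handled).

```python
import re

# Hypothetical illustration pattern, NOT an established PO box regex.
# It recognises "PO Box", "P.O. Box", "P O Box", "POB", "POBOX" and
# "Post Office Box", anchored on word boundaries so that the "p" in
# "pine" cannot start a match.
PO_BOX = re.compile(
    r"\b(?:p\.?\s*o\.?\s*b(?:ox)?|post\s+office\s+box)\b",
    re.IGNORECASE,
)


def keep_non_po_box(addresses):
    """Return only the addresses that do not look like a PO box."""
    kept = []
    for address in addresses:
        if PO_BOX.search(address):
            continue  # matches the PO box pattern: throw it out
        kept.append(address)  # "good" address: keep it
    return kept


incoming = [
    "PO Box 3232",
    "p.o. box 345",
    "POB 533",
    "POST OFFICE BOX #21",
    "9387 dandy road p o box 513",
    "1555 pine box elder rd",
    "1555 box elder rd",
]
print(keep_non_po_box(incoming))
```

The filter only ever asks the positive question "does this look like a PO box?" and inverts the answer in code, so the regex itself never has to express "not a PO box".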
Scooby08 Posted July 3, 2012
Thanks, but that will not work in this case. I only want addresses that aren't po boxes, and I have no control over how they enter the po box address, so I have to work with what I have, and this is the only way. The above are all examples of po box addresses I have received.
.josh Posted July 3, 2012
Did you even look at the code example?
Pikachu2000 Posted July 3, 2012
Scooby08 wrote: "Thanks, but that will not work in this case. I only want addresses that aren't po boxes, and I have no control over how they enter the po box address, so I have to work with what I have, and this is the only way. The above are all examples of po box addresses I have received."
That's a shame. I guess you now have no choice but to go through them all manually and delete the ones you don't want.
Scooby08 Posted July 3, 2012
Yes I did, Josh. The code I need would be more like so:
if (is po box regex that is valid or not) {
    // throw it out
} else {
    // "good" address, do something
}
And yes, that's exactly right, Pikachu2000.
.josh Posted July 3, 2012
Scooby, either a "PO Box" address is valid or it is not. There's no in-between: if the US Post Office receives a letter with an invalidly formatted address, it's going to be returned to sender. The best you can do is compare against what is valid, toss the address if it is valid, and keep it if it is not in a valid PO Box format. In the 3 examples you listed:
1555 box elder rd ** does not match and that's what we want
1555 pine box elder rd ** should not match but does because of the p in "pine"
45 W North Rd Box 12 ** want to match "Box 12"
All 3 of those are valid non-"po box" address formats.
Scooby08 Posted July 3, 2012
You are correct Josh, except for the "Box 12" one. I am posting these to a third party, and they run a validation on these as well, and they are saying that it is a po box address. All I'm trying to do is create a custom filter that will work for this third party so I don't post them po box addresses, as they do not want them. It's really close to being where I need it. I am already using the code and it's working great, but the "Box 12" types still get by, and in the event that an address actually has a "p" in front of the word "box", it'll treat it as a po box when it really is not (1555 pine box elder rd). I really do appreciate the help Josh! Thank you
.josh Posted July 3, 2012
You say you want to only post addresses to a 3rd party that are not po box addresses. The solution is to use an established regex for matching valid po boxes, toss out the addresses that match, and send the ones that don't. This is exactly what you are trying to do, only you are insisting on reinventing the wheel with your own regex, which is failing for many reasons, some of which have already been pointed out to you. Honestly, I don't really know how much more I can help you, except to tell you to reread the advice already given.
Scooby08 Posted July 3, 2012
Thanks again Josh. I'll get 'er from here. By the way, if anybody can answer just the regex question of how to match "Box 12" and not match "1555 pine box elder rd", regardless of what I'm using it for, that would be most helpful!!
Pikachu2000 Posted July 3, 2012
All you can do is send them properly filtered data; the third party needs to use the right pattern too. If you send them a valid street address, and they say it's a PO box, then THEY have the problem, not you.
Scooby08 Posted July 3, 2012
This is true, Pikachu2000.
xyph Posted July 3, 2012
Scooby08 wrote: "Thanks again Josh. I'll get 'er from here. By the way, if anybody can answer just the regex question of how to match "Box 12" and not match "1555 pine box elder rd", regardless of what I'm using it for, that would be most helpful!!"
Of course, someone doing the work for you would be helpful...
Scooby08 Posted July 3, 2012
That's how I roll!
.josh Posted July 3, 2012
Scooby08 wrote: "Thanks again Josh. I'll get 'er from here. By the way, if anybody can answer just the regex question of how to match "Box 12" and not match "1555 pine box elder rd", regardless of what I'm using it for, that would be most helpful!!"
Okay look, here is a piece of regex that will match "Box 12" and not match "1555 pine box elder rd":
.*box(?!\s+[0-9]+$).*
This does not check anything before "box", nor does it match other valid (or invalid, depending on which way you want to go) formats. The problem is that with regex, you must consider what it is used for. You cannot write a valid regular expression unless you properly scope out its purpose. You can't just say "make it do this regardless of what I want it for", because you have to know what you want it for in order to make it work properly. Asking for things like this, and us helping you like this, is not helpful. We know what you want it for: you want to send some 3rd party only addresses that are not po box addresses. Please, for the love of God and country, stop making this harder on yourself than it needs to be. Or else, please give more explanation as to why you refuse to take the advice given and insist on trying to band-aid a regex that is not going to work anyway!
edit: Actually I had that backwards:
.*box(?!\s+[0-9]+$).* will match "1555 pine box elder rd"
.*box(?=\s+[0-9]+$).* will match "box 12"
It doesn't really matter which you use; you just have to reverse the condition. But that doesn't take away from the additional statements I made, though maybe it sheds light on what your problem is to begin with: I think you're basically trying to match a negative. Matching for the absence of something is much harder to do in the regex world. Even where possible, the regex involved is a lot harder to read and understand, and it is completely unnecessary, as all you have to do is reverse the condition it is in.
But again, your regex does not account for all (in)valid po box formats. Worse, it actually (dis)allows (in)valid po box formats because overall it is poorly written.
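The two lookahead patterns from the edit can be tried side by side. This is a minimal Python sketch (Python and the names below are assumptions made for illustration); re.IGNORECASE is added so that "Box 12" with a capital B behaves like "box 12". It also shows the "reverse the condition" point: match the positive case and negate the result in code.

```python
import re

# Positive lookahead: "box" must be followed by whitespace and digits
# that run to the end of the string.
BOX_THEN_NUMBER = re.compile(r".*box(?=\s+[0-9]+$).*", re.IGNORECASE)

# Negative lookahead: "box" must NOT be followed by that.
BOX_NOT_NUMBER = re.compile(r".*box(?!\s+[0-9]+$).*", re.IGNORECASE)


def is_box_number(address):
    """True when the address ends in 'box <digits>', e.g. 'Rd Box 12'."""
    return BOX_THEN_NUMBER.search(address) is not None


def keep_address(address):
    """Reverse the condition in code instead of matching a negative."""
    return not is_box_number(address)


for address in ["Box 12", "45 W North Rd Box 12", "1555 pine box elder rd"]:
    print(
        repr(address),
        "positive:", BOX_THEN_NUMBER.search(address) is not None,
        "negative:", BOX_NOT_NUMBER.search(address) is not None,
        "keep:", keep_address(address),
    )
```

As the post says, these patterns only look at what follows "box"; on their own they are not a complete PO box filter.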