Alter P.O. Box regex to match other variations


Scooby08

I have the following regex:

 

/([pP]{1}(.*?)[oO]{1}(.*?))?([bB][oO]?[xX])(?![a-zA-Z])(\s+)?[#]?(\d+)?/i

 

I'm trying to also get it to match the following line:

 

P.O. 863

 

I'm also open to any pointers on tidying it up.

 

Thanks!

Link to comment
Share on other sites

To add to xyph's post: with the i modifier, most of the character classes are pointless ([pP] is the same as p).

Also, a quantifier of exactly one ({1}) is pointless.

Also, that negative lookahead is mostly pointless, since you expect spaces next anyway.

Also, you have a lot of stuff grouped individually with the zero-or-one quantifier (?). Since the groups are not nested, you will get some unexpected matches.

Also, based on your regex, it looks like you are missing a lot of valid "post office box" formats.

Also, your match-alls (.*?) will give you unexpected matches.

 

I suggest you google "valid po box formats".  There's a lot of info about what is and is not valid, and I even saw some regexes show up in results.

 

Unless there are only specific formats you are wanting to allow, then please list all format examples of what you want to allow. 
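The tidy-up points above can be checked mechanically. The following is a minimal sketch in Python (an assumption; the thread does not say which language or regex engine is in use) that compares the pattern as posted with a version stripped of the redundant {1} quantifiers and case-paired classes. The name same_result and the sample list are illustrative only; the sample addresses are taken from this thread.

```python
import re

# The pattern exactly as posted (the /i modifier becomes re.IGNORECASE).
ORIGINAL = re.compile(
    r"([pP]{1}(.*?)[oO]{1}(.*?))?([bB][oO]?[xX])(?![a-zA-Z])(\s+)?[#]?(\d+)?",
    re.IGNORECASE,
)

# Same pattern with the redundant pieces removed: no {1}, no case-paired
# classes, and no single-character class around "#".
SIMPLIFIED = re.compile(
    r"(p(.*?)o(.*?))?(bo?x)(?![a-z])(\s+)?#?(\d+)?",
    re.IGNORECASE,
)

# Sample inputs copied from the thread; not a complete list of formats.
SAMPLES = [
    "p.o. box 345",
    "po bx #45",
    "5748 randy RD  POBOX 513",
    "P.O. 863",
    "1555 box elder rd",
]


def same_result(text: str) -> bool:
    """True when both patterns agree on whether, where, and what they match."""
    a, b = ORIGINAL.search(text), SIMPLIFIED.search(text)
    if a is None or b is None:
        return a is None and b is None
    return a.span() == b.span() and a.groups() == b.groups()


for sample in SAMPLES:
    print(f"{sample!r}: agree={same_result(sample)}")
```

The shorter pattern is only a readability change. It keeps every behavior of the original, including the ones criticized above, so it does not address the unexpected matches or the missing formats.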

Link to comment
Share on other sites

Hello Josh...

 

I cannot list all of the possible ways that a P.O. Box address can show up, but I can tell you that each day I receive at least 2-4 more possibilities. Here are today's:

 

POB 533

pob 1598

P O Box 1805

Po bo, 444

 

Please help as this will be the ultimate P.O. Box regex!!

Link to comment
Share on other sites

Hello again Josh..

 

I don't want valid po box formats.. I'm trying to filter out any address that has anything to do with a po box.. I have been playing around some more and am pretty close to what my goal is.. Here's the regex:

 

/(p(.*)?o(.*?)?)(\s+)?(bo?x)?(\s+)?[#]?(\d+)?/i

 

Now that matches all of the following except for the marked ones with explanations next to them..

 

po bx #45

po 45

POB 533

pob 1598

P  O Box 1805

Po bo, 444

1555 box elder rd        ** does not match and that's what we want

1555 pine box elder rd      ** should not match but does because of the p in "pine"

p.o. box 345

45 W North Rd Box 12        ** want to match "Box 12"

PO Box 3232

5 red road po box 15

POST OFFICE BOX #21

Post Office Box 465

5 red road po box

PO Box 45

5748 randy RD  POBOX 513

9387 dandy road p o box 513

p box 3

7114 mandy Rd Pobox 513

555 mangy rd po box

 

Thank you for replying Josh!
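A small experiment makes the "pine" false positive easier to see. The sketch below uses Python (an assumption; the thread does not name the language) and the second pattern copied as posted. The helper name matched_text is illustrative, and "12 Maple Grove" is a made-up address added only to show the effect on input with no "box" in it.

```python
import re
from typing import Optional

# Second pattern from the thread, copied as posted (/i becomes re.IGNORECASE).
LOOSE = re.compile(
    r"(p(.*)?o(.*?)?)(\s+)?(bo?x)?(\s+)?[#]?(\d+)?",
    re.IGNORECASE,
)


def matched_text(address: str) -> Optional[str]:
    """Return the substring the pattern matched, or None when it did not match."""
    m = LOOSE.search(address)
    return m.group(0) if m else None


ADDRESSES = [
    "1555 pine box elder rd",  # from the thread: unwanted match
    "1555 box elder rd",       # from the thread: no match
    "45 W North Rd Box 12",    # from the thread: wanted, but no match
    "12 Maple Grove",          # hypothetical address, no "box" at all
]

for address in ADDRESSES:
    print(f"{address!r} -> {matched_text(address)!r}")
```

Only the leading p, then anything, then o is mandatory in this pattern. Everything after that is optional, including the box part and the number, so any address with a "p" followed somewhere by an "o" is flagged, and any address without a "p" is never flagged.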

Link to comment
Share on other sites

Friend, you are making this harder than it needs to be.  PO Box matching is a common thing that many others have already tackled and figured out, no need to reinvent the wheel.  If you are trying to filter out all po box type addresses, you would still use the same po box validation regexes...

 

 

if (is valid po box regex) {
  // throw it out
} else {
  // "good" address, do something
}
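As a rough illustration of that filter, here is a minimal Python sketch (Python is an assumption; the thread does not name a language). PO_BOX is a deliberately small, hypothetical pattern written for this example. It is not one of the established regexes mentioned above and it covers only a handful of the formats from this thread. The function name keep_address is also illustrative.

```python
import re

# Hypothetical, deliberately small PO box pattern for illustration only.
# It accepts "P.O. Box", "PO Box", "P O Box", "POB", "POBOX" and
# "Post Office Box", anchored on word boundaries so that a stray "p"
# inside another word cannot start a match.
PO_BOX = re.compile(
    r"\b(?:p\.?\s*o\.?\s*b(?:ox)?|post\s+office\s+box)\b",
    re.IGNORECASE,
)


def keep_address(address: str) -> bool:
    """Keep the address only when it does NOT look like a PO box."""
    return PO_BOX.search(address) is None


# Sample addresses copied from this thread.
incoming = [
    "PO Box 45",
    "POB 533",
    "9387 dandy road p o box 513",
    "Post Office Box 465",
    "1555 pine box elder rd",
    "1555 box elder rd",
]

good = [a for a in incoming if keep_address(a)]
print(good)
```

The point of the sketch is the shape of the logic: match what a PO box looks like, then invert the result. The pattern itself would need to be replaced or extended to cover the other variations listed in this thread.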

Link to comment
Share on other sites

Thanks but that will not work in this case.. I only want addresses that aren't po boxes.. and I have no control as to how they enter the po box address so I have to work with what I have and this is the only way.. The above are all examples of po box addresses I have received..

Link to comment
Share on other sites

That's a shame. I guess you now have no choice but to go through them all manually and delete the ones you don't want.

Link to comment
Share on other sites

Scooby, either a "PO Box" address is valid or it is not. There's no in-between: if the US Post Office receives a letter with an invalidly formatted address, it's going to be returned to sender. The best you can do is compare against what is valid, toss the address if it is a validly formatted PO Box, and keep it if it is not.

 

In the 3 examples you listed:

 

1555 box elder rd        ** does not match and that's what we want
1555 pine box elder rd       ** should not match but does because of the p in "pine"
45 W North Rd Box 12        ** want to match "Box 12" 

 

All 3 of those are valid non-"po box" address formats. 

 

 

 

 

 

 

Link to comment
Share on other sites

You are correct Josh, except for the "Box 12" one. I am posting these to a third party and they run a validation on these as well, and they are saying that it is a po box address. All I'm trying to do is create a custom filter that will work for this third party so I don't post them po box addresses, as they do not want them. It's really close to being where I need it. I am already using the code and it's working great, but the "Box 12" types still get by, and in the event that an address actually has a "p" in front of the word "box", it'll treat it as a po box when it really is not (1555 pine box elder rd).

 

I really do appreciate the help Josh! Thank you

Link to comment
Share on other sites

You say you want to only post addresses to a 3rd party that are not po box addresses. The solution is to use an established regex for matching valid po boxes, toss the addresses that match, and send the ones that don't. This is exactly what you are trying to do, only you are insisting on reinventing the wheel with your own regex, which is failing for many reasons, some of which have already been pointed out to you. Honestly, I don't really know how much more I can help you, except to tell you to reread the advice already given.

Link to comment
Share on other sites

Thanks again Josh.. I'll get er' from here..

 

By the way, if anybody can answer just the regex question of how to match "Box 12" and not match "1555 pine box elder rd", regardless of what I'm using it for, that would be most helpful!!

Link to comment
Share on other sites

Of course, someone doing the work for you would be helpful...

Link to comment
Share on other sites

Okay look, here is a piece of regex that will match  "Box 12" and not match "1555 pine box elder rd"

 

.*box(?!\s+[0-9]+$).*

 

 

This does not match anything before "box", nor does it match other valid (or invalid, depending on which way you want to go) formats. The problem is that with regex, you must take into account what it is used for. You cannot write a valid regular expression unless you properly scope out its purpose. You can't just say "make it do this regardless of what I want it for", because you have to know what you want it for in order to make it work properly. Asking for things like this, and us helping you like this, is not helpful.

 

We know what you want it for.  You want to be able to only send to some 3rd party, addresses that are not po box addresses.  Please for the love of God and country, stop making this harder on yourself than it needs to be.  Or else, please give more explanation as to why you refuse to take the advice given and insist on trying to bandaid up a regex that is not going to work anyway!

 

edit: Actually I had that backwards:

 

.*box(?!\s+[0-9]+$).* will match "1555 pine box elder rd"

 

.*box(?=\s+[0-9]+$).* will match "box 12"

 

It doesn't really matter which you use, just have to reverse condition.  But that doesn't take away from the additional statements I made, though maybe it sheds light on what your problem is to begin with: I think you're basically trying to match a negative.  Matching for the absence of something is much harder to do in the regex world.  Even where possible, it is a lot harder to read/understand the regex involved, and is completely unnecessary, as all you have to do is reverse the condition it is in.  But again, your regex does not account for all (in)valid po box formats.  Worse, it actually (dis)allows (in)valid po box formats because overall it is poorly written.
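The two lookahead variants from the edit can be compared side by side. This is a minimal sketch in Python (an assumption; the thread does not name a language) with the i modifier added so that "Box" and "box" are treated alike, which the patterns as written above do not do on their own. The names BOX_THEN_NUMBER, BOX_NOT_NUMBER and ends_with_box_number are illustrative.

```python
import re

# Positive lookahead: "box" must be followed by whitespace and a trailing number.
BOX_THEN_NUMBER = re.compile(r".*box(?=\s+[0-9]+$).*", re.IGNORECASE)

# Negative lookahead: "box" must NOT be followed by whitespace and a trailing number.
BOX_NOT_NUMBER = re.compile(r".*box(?!\s+[0-9]+$).*", re.IGNORECASE)


def ends_with_box_number(address: str) -> bool:
    """True for addresses such as "... Box 12" that end in box plus a number."""
    return BOX_THEN_NUMBER.search(address) is not None


# Sample addresses copied from this thread.
for address in ["box 12", "45 W North Rd Box 12", "1555 pine box elder rd"]:
    print(
        f"{address!r}: "
        f"positive={BOX_THEN_NUMBER.search(address) is not None}, "
        f"negative={BOX_NOT_NUMBER.search(address) is not None}"
    )

# "Reversing the condition": keep an address when the positive form does not match.
keep = [a for a in ["45 W North Rd Box 12", "1555 pine box elder rd"]
        if not ends_with_box_number(a)]
print(keep)
```

Using the positive form and negating the result in code keeps the regex readable. Both patterns only inspect what follows "box", so they say nothing about the other PO box formats discussed in this thread.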

 

 

 

Link to comment
Share on other sites
