[RegEx] parsing an email header for a domain match

rbrown · March 6, 2008

I'm trying to match the "Return-path" and "From" headers against each other: I need to compare their root domains and return true if they match.

The program needs a regex with a back reference.

The from field may or may not have ' "whatever they put here" ' before the email address.

Otherwise if I was doing this in php I would have been done by now.

I have this from someone else who is MIA at the moment:

^Return-path:[\w\d]*(@[\d\w.-]*)>.*^From: [^\m\r]*\1>

But it returns true if one of them is a subdomain of the root domain

And if it has ' "whatever they put here" ' before the email address.

I need it to ignore the subdomain and anything before the from email address and just compare the domains.

The email addresses in the example below are from a spammer.

Thanks,

Return-path: <fluffy2fluff@yahoo.com>

Envelope-to: shan@thedomain.com

Delivery-date: Wed, 05 Mar 2008 21:55:08 -0500

Received: from ahk178.neoplus.adsl.tpnet.pl ([83.25.192.178]:2290)

by bay.dnsprotect.com with esmtp (Exim 4.68)

(envelope-from <fluffy2fluff@yahoo.com>)

id 1JX6GA-00074O-Ln

for shannon@thedomain.com; Wed, 05 Mar 2008 21:55:08 -0500

Received: from [83.25.192.178] by c.mx.mail.yahoo.com; Thu, 7 Mar 2008 03:55:24 +0100

From: "werwerwer" <fluffy2fluff@yahoo.com>

To: <shan@thedomain.com>

Subject: We can ship yourmedications overnight FREE

Date: Thu, 7 Mar 2008 03:55:24 +0100

MIME-Version: 1.0

Content-Type: multipart/alternative;

boundary="----=_NextPart_000_0006_01C88007.12A8D500"

X-Mailer: Microsoft Office Outlook, Build 11.0.5510

Thread-Index: Aca6Q81E0F265S3BK1FI1Q4S99N60M==

X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

Message-ID: <01c88007$12a8d500$b2c01953@fluffy2fluff>

X-Spam-Status: No, score=

X-Spam-Score:

X-Spam-Bar:

X-Spam-Flag: NO

effigy · March 6, 2008

/(?:Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>/
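
As a rough illustration of what this pattern captures, here is a minimal sketch using Python's `re` module. Python is an assumption made for the example (the thread's target engine is a different one), and `sample` is an abbreviated copy of the header from the first post. The pattern is applied without the `/` delimiters and pulls one domain label out of each header; it does not compare the two labels by itself.

```python
import re

# effigy's pattern without the / delimiters (Python takes a bare pattern string).
PATTERN = re.compile(r"(?:Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>")

# Abbreviated copy of the sample header from the first post.
sample = (
    "Return-path: <fluffy2fluff@yahoo.com>\n"
    "Envelope-to: shan@thedomain.com\n"
    'From: "werwerwer" <fluffy2fluff@yahoo.com>\n'
    "To: <shan@thedomain.com>\n"
)

# One match per header; group 1 is the label just before the final ".xxx>".
labels = PATTERN.findall(sample)
print(labels)  # ['yahoo', 'yahoo']

# The comparison has to happen outside the pattern in this form.
print(len(labels) == 2 and labels[0] == labels[1])  # True
```

Because each header is a separate match, a tool that only reports "matched / not matched" cannot use this form to decide whether the two domains agree; that needs either code around it or a single pattern with a back reference.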

rbrown · March 6, 2008

In RegexBuddy it finds the domains, but whether the domains match or not, RegexBuddy says:

"The regular expression does not match the test subject"

For both ways.

So I tried it in the program for the fun of it, and there it returns an error.

It says:

Error in regular expression '(?:Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>' at position 19: Unrecognized modifier

I tried messing with it but I can't get it.

Thanks for your help...

effigy · March 6, 2008

What language is this in? Where did the delimiters go? Can we see some code?

rbrown · March 6, 2008

I'm using this for a spam filter in Mailwasher pro.

So there isn't really any source code.

You can download a "free" version to play with

The free version is limited only by the number of email accounts you can run through it.

I have over 100 so I bought it.

http://firetrust.com/download/mailwasher-pro

They finally answered me but basically left me to my own devices to figure it out.

They have done this in the past with other questions and I'm not a real happy camper with them.

And I just found out they use this program to test the regexes.

I'm not a regex wizard...

This is what they said to use:

For the package used in MW, the tester is here:

http://www.regexpstudio.com/Downloads/TestRExp.zip

The documentation for the package is here: http://www.regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html

I'm thinking my next project will be a server-side PHP version of MWP...

But will allow more flexibility for the filters...

effigy · March 6, 2008

Does it support PREG? What about modifiers? Without code to process the results, you might try this:

/Return-path:[^@]+@.*?([^.]+)\..{2,3}>.+?From:[^@]+@.*?(\1)\..{2,3}>/s
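
To make the behaviour of this back-reference version easier to see, here is a minimal sketch in Python's `re` module. Python, the helper name `same_domain_label`, and the shortened two-line headers (including `example.org` and `boo.com`) are all assumptions for illustration; `re.DOTALL` stands in for the `/s` modifier. Other engines may behave differently.

```python
import re

# The pattern above without / delimiters; re.DOTALL plays the role of /s,
# letting "." cross the line breaks between the two headers.
PATTERN = re.compile(
    r"Return-path:[^@]+@.*?([^.]+)\..{2,3}>.+?From:[^@]+@.*?(\1)\..{2,3}>",
    re.DOTALL,
)

def same_domain_label(headers: str) -> bool:
    """Hypothetical helper: True if the pattern matches the header text."""
    return PATTERN.search(headers) is not None

matching = 'Return-path: <a@yahoo.com>\nFrom: "x" <b@yahoo.com>\n'
different = 'Return-path: <a@yahoo.com>\nFrom: "x" <b@example.org>\n'
partial = 'Return-path: <a@yahoo.com>\nFrom: "x" <b@boo.com>\n'

print(same_domain_label(matching))   # True
print(same_domain_label(different))  # False
print(same_domain_label(partial))    # True: "oo" is captured, not a whole label
```

The third case is worth noting: because `.*?` may consume the start of a label, the captured group can be just the tail of one (here `oo`), so `yahoo.com` and `boo.com` are treated as equal in this sketch.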

rbrown · March 6, 2008

Does it support PREG? no

What about modifiers? yes

The TRegExpr program I gave you the link for has a tester, but unlike RegexBuddy it doesn't help me figure out what should go in what order or tell me what the pattern is doing.

It assumes you know what you are doing.

I tried what you sent: no errors, but it didn't match anything...

Here is what it says about modifiers:

==============================================

Modifiers

Modifiers are for changing behaviour of TRegExpr.

There are many ways to set up modifiers.

Any of these modifiers may be embedded within the regular expression itself using the (?...) construct.

Also, You can assign to appropriate TRegExpr properties (ModifierX for example to change /x, or ModifierStr to change all modifiers together). The default values for new instances of TRegExpr object defined in global variables, for example global variable RegExprModifierX defines value of new TRegExpr instance ModifierX property.

i

Do case-insensitive pattern matching (using installed in you system locale settings), see also InvertCase.

m

Treat string as multiple lines. That is, change "^" and "$" from matching at only the very start or end of the string to the start or end of any line anywhere within the string, see also Line separators.

s

Treat string as single line. That is, change "." to match any character whatsoever, even a line separators (see also Line separators), which it normally would not match.

g

Non standard modifier. Switching it Off You'll switch all following operators into non-greedy mode (by default this modifier is On). So, if modifier /g is Off then '+' works as '+?', '*' as '*?' and so on

x

Extend your pattern's legibility by permitting whitespace and comments (see explanation below).

r

Non-standard modifier. If is set then range а-я additional include russian letter 'ё', А-Я additional include 'Ё', and а-Я include all russian symbols.

Sorry for foreign users, but it's set by default. If you want switch if off by default - set false to global variable RegExprModifierR.

The modifier /x itself needs a little more explanation. It tells the TRegExpr to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, for example:

(

(abc) # comment 1

| # You can use spaces to format r.e. - TRegExpr ignores it

(efg) # comment 2

)

This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making regular expressions text more readable.
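
For comparison, here is a small sketch of the same idea in Python's `re` module, whose `re.VERBOSE` flag works along the same lines as the `/x` modifier described above. Python is an assumption here (the quoted documentation is for TRegExpr), and the names `PATTERN` and `SPACED` are made up for the example.

```python
import re

# The documentation's example, compiled with re.VERBOSE (Python's analogue of /x).
PATTERN = re.compile(
    r"""
    (
      (abc)   # comment 1
      |       # whitespace used for layout is ignored
      (efg)   # comment 2
    )
    """,
    re.VERBOSE,
)

# A literal space must be escaped (or put in a character class) in verbose mode.
SPACED = re.compile(r"abc\ efg  # matches 'abc efg'", re.VERBOSE)

print(PATTERN.fullmatch("abc").group(2))        # abc
print(PATTERN.fullmatch("efg").group(3))        # efg
print(SPACED.fullmatch("abc efg") is not None)  # True
print(SPACED.fullmatch("abcefg") is not None)   # False
```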

effigy · March 7, 2008

This worked for me in the tester with /s checked:

Return-path:[^@]+@([^.]+\.)*([^.]+)\..{2,3}>.+?From:[^@]+@([^.]+\.)*(\2)\..{2,3}>
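
As a hedged illustration of how this final pattern behaves, here is a minimal sketch in Python's `re` module. Python, the helper name `domains_match`, and the shortened two-line headers (with made-up addresses) are assumptions for the example; `re.DOTALL` stands in for checking `/s` in the tester.

```python
import re

# The pattern above, split across two string literals for readability.
# ([^.]+\.)* skips any leading subdomain labels; group 2 is the label
# before the final ".xx>" or ".xxx>", and \2 requires it again after From:.
PATTERN = re.compile(
    r"Return-path:[^@]+@([^.]+\.)*([^.]+)\..{2,3}>"
    r".+?From:[^@]+@([^.]+\.)*(\2)\..{2,3}>",
    re.DOTALL,
)

def domains_match(headers: str) -> bool:
    """Hypothetical helper: True if both headers share the same root label."""
    return PATTERN.search(headers) is not None

same = 'Return-path: <a@yahoo.com>\nFrom: "werwerwer" <b@yahoo.com>\n'
subdomain = 'Return-path: <a@mail.yahoo.com>\nFrom: <b@yahoo.com>\n'
different = 'Return-path: <a@yahoo.com>\nFrom: "x" <b@example.org>\n'
partial = 'Return-path: <a@yahoo.com>\nFrom: "x" <b@boo.com>\n'

print(domains_match(same))       # True
print(domains_match(subdomain))  # True: the "mail." label is skipped
print(domains_match(different))  # False
print(domains_match(partial))    # False: whole labels are compared
```

One limitation to keep in mind: `.{2,3}` assumes the final label is two or three characters long, so for addresses such as `a@foo.co.uk` and `b@bar.co.uk` the label compared is `co`, and this sketch reports them as matching.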

rbrown · March 8, 2008

I had to clean out my mailboxes and now I'm waiting for emails that fit the criteria.

So I'll let you know as soon as I run all the tests.

They come in waves of every other day so as soon as they rehit the servers I'll let you know.

Thanks,

rbrown · March 13, 2008

That works! Thank you for your help.
