
[RegEx] parsing an email header for a domain match



I'm trying to match the "Return-path" header against the "From" header: I need to compare the root domains of the two addresses and return true if they match.

The program needs a regex with a back reference.

The from field may or may not have ' "whatever they put here" ' before the email address.

 

Otherwise if I was doing this in php I would have been done by now.

 

I have this from someone else who is MIA at the moment:

^Return-path:[\w\d]*(@[\d\w.-]*)>.*^From: [^\m\r]*\1>

 

But it returns true if one of them is a subdomain of the root domain

And if it has ' "whatever they put here" ' before the email address.

 

I need it to ignore the subdomain and anything before the from email address and just compare the domains.

 

The email addresses in the example below are from a spammer.

Thanks,

 

 

Return-path: <fluffy2fluff@yahoo.com>

Envelope-to: shan@thedomain.com

Delivery-date: Wed, 05 Mar 2008 21:55:08 -0500

Received: from ahk178.neoplus.adsl.tpnet.pl ([83.25.192.178]:2290)

        by bay.dnsprotect.com with esmtp (Exim 4.68)

        (envelope-from <fluffy2fluff@yahoo.com>)

        id 1JX6GA-00074O-Ln

        for shannon@thedomain.com; Wed, 05 Mar 2008 21:55:08 -0500

Received: from [83.25.192.178] by c.mx.mail.yahoo.com; Thu, 7 Mar 2008 03:55:24 +0100

From: "werwerwer" <fluffy2fluff@yahoo.com>

To: <shan@thedomain.com>

Subject: We can ship yourmedications overnight FREE

Date: Thu, 7 Mar 2008 03:55:24 +0100

MIME-Version: 1.0

Content-Type: multipart/alternative;

        boundary="----=_NextPart_000_0006_01C88007.12A8D500"

X-Mailer: Microsoft Office Outlook, Build 11.0.5510

Thread-Index: Aca6Q81E0F265S3BK1FI1Q4S99N60M==

X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

Message-ID: <01c88007$12a8d500$b2c01953@fluffy2fluff>

X-Spam-Status: No, score=

X-Spam-Score:

X-Spam-Bar:

X-Spam-Flag: NO
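For anyone following along, here is a minimal sketch of the comparison described above, written as a single regex with a back reference and checked in Python. It is not verified against Mailwasher Pro or TRegExpr. The name `same_root_domain` is made up for illustration, and "root domain" is assumed to mean the last two dot-separated labels, which is wrong for suffixes such as `co.uk`. Only plain capturing groups are used, so the root domain ends up in group 2 and the back reference is `\2`.

```python
import re

# Flags: i = ignore case, m = ^ matches at each line start, s = . matches line breaks.
# Group 1 (optional) swallows any subdomain prefix such as "mail.".
# Group 2 is the assumed root domain: the last two labels before ">".
SAME_ROOT = re.compile(
    r'(?ims)^Return-path:[^\r\n@]*@([^\r\n>]*\.)?([\w-]+\.[\w-]+)>'
    # Skip ahead to the From: line, ignore any display name, and require
    # "@" or "." right before the root domain so "notyahoo.com" is rejected.
    r'.*^From:[^\r\n]*[@.]\2>'
)

def same_root_domain(headers: str) -> bool:
    """Return True if Return-path and From share the same (assumed) root domain."""
    return SAME_ROOT.search(headers) is not None

sample = (
    'Return-path: <fluffy2fluff@yahoo.com>\n'
    'Envelope-to: shan@thedomain.com\n'
    'From: "werwerwer" <fluffy2fluff@yahoo.com>\n'
    'To: <shan@thedomain.com>\n'
)
print(same_root_domain(sample))  # True
```

One known gap: a quoted display name that itself contains something like `x@yahoo.com>` would also satisfy the pattern, so treat this as a starting point rather than a finished filter.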

 

 

 

 

 


In RegexBuddy it finds the domains, but whether the domains match or not, RegexBuddy says:

"The regular expression does not match the test subject"

In both cases.

 

 

So I tried it in the program for the fun of it, and when I put it in the program it returns an error.

 

It says:

Error in regular expression '(?:Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>' at postion 19: Unreconized modifier

 

I tried messing with it but I can't get it.

 

Thanks for your help...
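A possible reading of that error, offered as an assumption rather than a confirmed diagnosis: the TRegExpr syntax page describes `(?...)` only as a way to embed modifiers, so the engine may be parsing `(?:Return-path|From)` as a modifier list and giving up when it reaches the closing parenthesis. If so, replacing the non-capturing group with a plain capturing group avoids the construct, at the cost of shifting the group numbers by one. The sketch below checks that equivalence in Python only; the variable names are made up for illustration.

```python
import re

# The pattern from the error message, using a non-capturing group.
original = r'(?:Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>'
# Same pattern with a plain capturing group instead of (?:...).
# The domain label moves from group 1 to group 2.
portable = r'(Return-path|From):[^@]+@.*?([^.]+)\..{2,3}>'

line = 'From: "werwerwer" <fluffy2fluff@yahoo.com>'

m_original = re.search(original, line)
m_portable = re.search(portable, line)

print(m_original.group(1))  # yahoo
print(m_portable.group(1))  # From
print(m_portable.group(2))  # yahoo
```

Note that either version only captures one label from one header line at a time; it does not compare Return-path against From, which would still need a back reference.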


I'm using this for a spam filter in Mailwasher pro.

So there isn't really any source code.

You can download a "free" version to play with.

The free version is limited only by the number of email accounts you can run through it.

I have over 100 so I bought it.

http://firetrust.com/download/mailwasher-pro

 

They finally answered me but basically left me to my own devices to figure it out.

They have done this in the past with other questions and I'm not a real happy camper with them.

 

And I just found out they use this program to test the regexes.

I'm not a regex wizard...

 

This is what they said to use:

For the package used in MW, the tester is here:

http://www.regexpstudio.com/Downloads/TestRExp.zip

 

The documentation for the package is here: http://www.regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html

 

 

I'm thinking my next project will be a server-side PHP version of MWP...

But will allow more flexibility for the filters...

 


Does it support PREG? no

What about modifiers? yes

The TRegExpr program I gave you the link for has a tester, but it doesn't help me figure out what should go in what order, or tell me what it is doing the way RegexBuddy does.

It assumes you know what you are doing.

 

I tried what you sent. No errors, but it didn't match anything...

 

Here is what it says about modifiers:

 

==============================================

Modifiers

 

 

 

Modifiers are for changing behaviour of TRegExpr.

 

There are many ways to set up modifiers.

Any of these modifiers may be embedded within the regular expression itself using the (?...) construct.

Also, You can assign to appropriate TRegExpr properties (ModifierX for example to change /x, or ModifierStr to change all modifiers together). The default values for new instances of TRegExpr object defined in global variables, for example global variable RegExprModifierX defines value of new TRegExpr instance ModifierX property.

 

i

Do case-insensitive pattern matching (using installed in you system locale settings), see also InvertCase. 

 

 

m

Treat string as multiple lines. That is, change "^" and "$" from matching at only the very start or end of the string to the start or end of any line anywhere within the string, see also Line separators.

 

 

s

Treat string as single line. That is, change "." to match any character whatsoever, even line separators (see also Line separators), which it normally would not match.

 

 

g

Non standard modifier. Switching it Off You'll switch all following operators into non-greedy mode (by default this modifier is On). So, if modifier /g is Off then '+' works as '+?', '*' as '*?' and so on 

 

 

x

Extend your pattern's legibility by permitting whitespace and comments (see explanation below). 

 

 

r

Non-standard modifier. If it is set, then range а-я additionally includes the Russian letter 'ё', А-Я additionally includes 'Ё', and а-Я includes all Russian symbols.

 

Sorry for foreign users, but it's set by default. If you want switch if off by default - set false to global variable RegExprModifierR. 

 

 

 

 

 

 

The modifier /x itself needs a little more explanation. It tells the TRegExpr to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, for example:

 

 

(abc) # comment 1
  |  # You can use spaces to format r.e. - TRegExpr ignores it
(efg) # comment 2

 

 

 

 

This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making regular expressions text more readable.
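To make the i, m, s and x modifiers above concrete, here is a small sketch using Python's `re` module, which accepts the same letters inline at the start of a pattern. This only illustrates the general behaviour; TRegExpr may differ in details such as which characters count as line separators, and the g and r modifiers are TRegExpr-specific with no counterpart shown here. The addresses are placeholders.

```python
import re

# Placeholder two-line header block for illustration only.
headers = 'Return-path: <a@example.com>\nFrom: <b@example.com>\n'

# (?i): case-insensitive matching.
found_i = re.search(r'(?i)return-PATH:', headers) is not None

# (?m): ^ matches at the start of every line, not just the start of the string.
with_m = re.search(r'(?m)^From:', headers) is not None
without_m = re.search(r'^From:', headers) is not None

# (?s): . also matches line breaks, so .* can span from one header to the next.
with_s = re.search(r'(?s)Return-path:.*From:', headers) is not None
without_s = re.search(r'Return-path:.*From:', headers) is not None

# (?x): unescaped whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r'''(?x)
    (abc)  # comment 1
      |    # spaces used only for layout
    (efg)  # comment 2
''')

print(found_i, with_m, without_m, with_s, without_s)
print(verbose.fullmatch('efg') is not None)
```

For the Return-path versus From comparison, m and s are the two that matter: m lets `^From:` anchor on its own line, and s lets `.*` cross the lines in between.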

 

 

 

 


I had to clean out my mailboxes and now I'm waiting for emails that fit the criteria.

So I'll let you know as soon as I run all the tests.

They come in waves of every other day so as soon as they rehit the servers I'll let you know.

 

Thanks,
