Delimited Text


mbeals

I'm writing a script to pull data out of incoming e-mails that follow a standard format.

If I have a line that reads:

Name: Foo Bar<br />

I need to extract just the "Foo Bar" part.

I'm using this regex: /Name:\s.+\W/, which keys off the "Name" and the "<". It works fine, except that it returns "Name: Foo Bar", and I don't want the leading "Name: ".

How do I structure it so that it matches that leading label but does not return it? I also suspect it is returning the < character, but it isn't showing up in any of the output (CLI or web).
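
The usual way to do this is a capturing group. preg_match() fills its third argument with the whole match at index 0 and each parenthesised group from index 1 onward, so you read the value from index 1 and the label never reaches your output. On the < suspicion: on a line like the one above, the greedy .+ in Name:\s.+\W backs off only as far as the final >, so the match most likely includes the whole <br />, which a browser renders as a line break instead of showing it. A minimal sketch of two ways to leave out both the label and the tag, assuming the line is already in a string:

```php
<?php
// Sample line from the e-mail, as quoted in the post above.
$line = 'Name: Foo Bar<br />';

// Option 1: wrap the wanted text in a capturing group.
// $m[0] holds the whole match and $m[1] holds only the group.
// The lazy .*? stops at the first '<', so "<br />" is left out.
preg_match('/Name:\s*(.*?)</', $line, $m);
echo $m[1], "\n"; // Foo Bar

// Option 2: \K drops everything matched so far from the reported match,
// and the lookahead (?=<) requires a '<' next without consuming it.
preg_match('/Name:\s*\K.*?(?=<)/', $line, $k);
echo $k[0], "\n"; // Foo Bar
```

The lazy .*? is the important part: a greedy .* in the same position would run to the last < on the line rather than the first.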

Okay, new unforeseen issue.

When one of the fields has no text, it looks like:

Alt Phone:<br />

The regex I'm using is dropping down to the next line and grabbing that full string.

So for this text block:

Alt Phone:<br />

Name: Foo Bar<br />

The regex is returning:

Name: Foo Bar
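
A likely cause, assuming the text being searched has a bare line break right after "Alt Phone:": \s matches any whitespace, including \n and \r, so \s* (or \s+) steps over the line break and (.*) then captures the whole next line. Restricting the gap to spaces and tabs keeps the match on one line. A minimal sketch with a hypothetical $contents string:

```php
<?php
// Hypothetical plain-text body: the Alt Phone field is empty.
$contents = "Alt Phone:\nName: Foo Bar\n";

// \s* also matches "\n", so the match runs onto the next line.
preg_match('/Phone:\s*(.*)/', $contents, $bad);
var_dump($bad[1]); // string(13) "Name: Foo Bar"

// [ \t]* only matches spaces and tabs, and '.' does not match "\n"
// (no /s modifier), so an empty field yields an empty capture.
preg_match('/Phone:[ \t]*(.*)/', $contents, $good);
var_dump($good[1]); // string(0) ""
```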

Why not post your full loop code? If you are pulling from an external file, you could easily get around this by reading the file line by line. If you are reading this from a variable, be sure that you don't have the s (dot-all) modifier turned on in your regex, since that lets . match newlines.
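
A minimal sketch of the line-by-line approach, assuming the message body is already in a string. It splits on \R (any line ending) and tests each line on its own, so no pattern can see the next field. The $contents value and the alt_phone/name keys are hypothetical:

```php
<?php
// Hypothetical e-mail body held in a string.
$contents = "Alt Phone:<br />\nName: Foo Bar<br />\n";

$fields = [];
// \R matches any line ending (\n, \r\n or \r), so each element is one line.
foreach (preg_split('/\R/', $contents) as $line) {
    // Each pattern only ever sees a single line, so it cannot spill
    // into the next field. [^<]* stops at the "<br />".
    if (preg_match('/^Alt Phone:\s*([^<]*)/', $line, $m)) {
        $fields['alt_phone'] = trim($m[1]);
    } elseif (preg_match('/^Name:\s*([^<]*)/', $line, $m)) {
        $fields['name'] = trim($m[1]);
    }
}
print_r($fields);
```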

I'm using the mailparse extension, so the script opens the /var/mail/$user mail file and then runs a regex search.

The exact code in question is:

preg_match_all('/Phone:\s*(.*)/', $contents, $altphone);
preg_match_all('/Name:\s+(.+)/', $contents, $names);

I'd prefer not to read it line by line, and to just let mailparse handle the input side of things.
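
If you would rather keep matching against the whole $contents string, one option is to keep the same two preg_match_all() calls but use patterns that cannot cross a line boundary: \h matches only horizontal whitespace (spaces and tabs), and the class [^<\r\n]* stops at the < of <br /> or at the end of the line, whichever comes first. A sketch, with $contents as a hypothetical stand-in for the body you get from mailparse:

```php
<?php
// Hypothetical stand-in for the message body returned by mailparse.
$contents = "Alt Phone:<br />\nName: Foo Bar<br />\n";

// \h matches spaces/tabs only (never a line break), and [^<\r\n]*
// stops at the "<br />" or at the end of the line, whichever comes first.
preg_match_all('/Phone:\h*([^<\r\n]*)/', $contents, $altphone);
preg_match_all('/Name:\h*([^<\r\n]*)/', $contents, $names);

// Index 1 holds the captured values, one per occurrence in the message.
var_dump($altphone[1]); // one empty string for the blank field
var_dump($names[1]);    // one element: "Foo Bar"
```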

I'm sorry, I forgot to use the code tags, so a big piece of information was left out.

The source file looks like this:

Alt Phone:<br />
Name: Foo Bar<br />

Not like:

Alt Phone:
Name: Foo Bar

So I'm trying to pull out everything between the : and the <.
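
Taking "everything between the : and the <" literally, a sketch that pulls every Label: value<br /> pair out in one pass, assuming each field sits on its own line as in the sample above. With the m modifier, ^ anchors at the start of each line, and PREG_SET_ORDER groups each match's label and value together. The $contents string is hypothetical:

```php
<?php
// Hypothetical body in the "Label: value<br />" format shown above.
$contents = "Alt Phone:<br />\nName: Foo Bar<br />\n";

// ^ with the /m modifier anchors each match at the start of a line.
// ([^:\r\n]+)  the label, up to the first ':'
// \h*          optional spaces/tabs after the colon (never a line break)
// ([^<\r\n]*)  the value, up to the '<' of "<br />" (may be empty)
preg_match_all('/^([^:\r\n]+):\h*([^<\r\n]*)/m', $contents, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $match) {
    $fields[$match[1]] = $match[2];
}
// $fields now has "Alt Phone" (empty) and "Name" ("Foo Bar").
print_r($fields);
```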

How about /Name:\x20+([^<]*)/?
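
One plausible reason for the result reported next, assuming the text actually being searched (after mailparse decoding) has no literal < after the name: a negated class such as [^<]* also matches newlines, so when there is no < on the line it keeps going to the next < anywhere in the message, or to the very end. Adding \r and \n to the class keeps the capture on one line. A sketch with a hypothetical decoded body:

```php
<?php
// Hypothetical decoded body with no "<br />" tags left in it.
$contents = "Name: Foo Bar\nAlt Phone:\n";

// [^<]* happily crosses "\n", so it captures the rest of the message.
preg_match('/Name:\x20+([^<]*)/', $contents, $m);
var_dump($m[1]); // "Foo Bar", the newline, and everything after it

// Excluding line breaks from the class stops the capture at end of line.
preg_match('/Name:\x20+([^<\r\n]*)/', $contents, $m2);
var_dump($m2[1]); // string(7) "Foo Bar"
```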

That ends up pulling in the entire remainder of the email.

I think I've resolved it. I'm just using /Phone:(.*)/

It does capture the leading space when there is data, but that's not a big deal.
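
If the leading space ever does matter, a sketch of two ways to drop it, assuming a plain-text body like the hypothetical one below (555-0100 is just a sample number): put \h* (spaces and tabs only) before the group, or trim the captured values afterwards. \h* is used rather than \s* so that an empty field still cannot reach onto the next line.

```php
<?php
// Hypothetical plain-text body: one filled-in field, one empty field.
$contents = "Phone: 555-0100\nAlt Phone:\n";

// The original pattern keeps the space that follows the colon.
preg_match_all('/Phone:(.*)/', $contents, $raw);
// $raw[1] is [" 555-0100", ""]

// \h* consumes spaces/tabs after the colon but never a line break,
// so empty fields still come back as "" instead of the next line.
preg_match_all('/Phone:\h*(.*)/', $contents, $clean);
// $clean[1] is ["555-0100", ""]

// Alternatively, keep the original pattern and trim afterwards.
$trimmed = array_map('trim', $raw[1]);
```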

Thanks for the help.
