find lines that don't match RegEx?


Solved by Jacques1

One email address per line, thousands of lines. I was hoping to just use a text editor to do the work, but I may have to write a PHP loop to do it.

Here's the regex:

^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$

 

I tested the RegEx, and it works great. It returns thousands of lines of matched email addresses. The problem is, I only want it to find the lines that don't match. How do you make a RegEx return the unmatched lines?

It's as simple as adding an exclamation mark.

You didn't supply your code, but I imagine it looks like this:

if( preg_match_all ("/^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$/", $line) )

Add the ! (exclamation mark) before preg_match_all:

if( ! preg_match_all ("/^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$/", $line) )

 

Or, if you have something against exclamation marks, you can compare the result with 0. preg_match_all() returns the number of matches (0 when the line does not match) and FALSE only when an error occurs, so a comparison against TRUE or FALSE will not pick out the unmatched lines.

if( preg_match_all ("/^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$/", $line) === 0 )

How is the content provided? Is it a text file, a form post, or what? If it doesn't come from some programmatic output that you can rely upon 100%, then you should account for variability. If it is user-entered, there may be a comma or semicolon between records. It would not be hard to create a process to catch those scenarios.

 

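Also, the RegEx you have will not work for some email addresses. It produces false negatives (it rejects valid addresses that contain a plus sign, uppercase letters, a subdomain, or a top-level domain longer than three letters) and false positives (it accepts some malformed addresses, such as one that starts with a dot).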
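
To make both points concrete, here is a minimal sketch that collects the lines the pattern rejects. The helper name unmatched_lines() and the sample addresses are made up for illustration; the pattern is the one from the original post. It uses preg_match() rather than preg_match_all(), since a single yes/no answer per line is enough.

```php
<?php

// The pattern from the original post, unchanged.
const EMAIL_PATTERN = '/^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$/';

/**
 * Hypothetical helper: returns the lines that do NOT match the pattern.
 */
function unmatched_lines(array $lines): array
{
    $unmatched = [];
    foreach ($lines as $line) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match(EMAIL_PATTERN, $line) !== 1) {
            $unmatched[] = $line;
        }
    }
    return $unmatched;
}

// Made-up sample data.
$sample = [
    'alice@example.com',       // accepted by the pattern
    'bob@example.info',        // valid, but the TLD has four letters
    'carol+lists@example.com', // valid, but "+" is not in the character class
    'not an email',            // invalid
];

print_r(unmatched_lines($sample));
```

The second and third sample addresses are valid, yet they end up in the "unmatched" list together with the line that really is invalid. Those are the false negatives.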

It's just a text file. No comma separated list or anything complicated like that. Just one email address per line.

 

If Zane is right, it looks like I will have to depend upon PHP to do the work and not RegEx. Not a problem, I was just hoping that RegEx would have some kind of solution.

There is one thing you can try.

 

 

1. Use a RegEx to find valid records as you are now

2. Use the file() function on the text file to read the contents of the file into an array (each line is an element in the array).

3. Use array_diff() between the two arrays to get all the lines that do not contain valid email addresses.
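
The three steps above could be sketched roughly like this. The function name invalid_lines_from_file() and the temporary demo file are assumptions for illustration, and preg_grep() is used here as one way to do step 1. Note that file() is called with FILE_IGNORE_NEW_LINES so that both arrays hold lines without trailing newlines.

```php
<?php

/**
 * Hypothetical helper: returns the lines of $path that do not match $pattern.
 */
function invalid_lines_from_file(string $path, string $pattern): array
{
    // Step 2: one array element per line, without newlines or blank lines.
    $all = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($all === false) {
        throw new RuntimeException("Could not read $path");
    }

    // Step 1: keep only the lines the regex accepts.
    $valid = preg_grep($pattern, $all);

    // Step 3: whatever is in $all but not in $valid did not match.
    return array_values(array_diff($all, $valid));
}

// Demo with a throwaway file (made-up data).
$demo_path = tempnam(sys_get_temp_dir(), 'emails');
file_put_contents($demo_path, "alice@example.com\nnot an email\nbob@example.org\n");

$pattern = '/^([a-z0-9-_\.]+@[a-z0-9-_]+\.[a-z]{2,3})$/';
$invalid = invalid_lines_from_file($demo_path, $pattern);
print_r($invalid);

unlink($demo_path);
```

If the detour through array_diff() is not needed, preg_grep() also accepts the PREG_GREP_INVERT flag, which returns the non-matching elements directly. Either way, the whole file is held in memory.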

  • Solution

I would neither use a regex nor load the entire file content into an array.

 

E-mail regexes always suffer from the same two problems: They're wrong, and they're a (bad) reinvention of the wheel. The problem of validating e-mail addresses has been solved already: filter_var().
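
As a quick illustration, filter_var() with FILTER_VALIDATE_EMAIL returns the address when it is valid and false otherwise. The sample addresses below are made up; this is only a sketch of how the return value can be checked.

```php
<?php

// Made-up sample data.
$candidates = [
    'alice@example.com',
    'bob@example.info',
    'carol+lists@example.com',
    'not an email',
];

$results = [];
foreach ($candidates as $candidate) {
    // filter_var() returns the filtered string on success and false on failure.
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump($results);
```

The first three addresses are reported as valid, including the four-letter TLD and the plus sign that the hand-written regex rejects.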

 

And iterating over the lines of a file can be done ad hoc with fgets(). This is more efficient and won't run into memory issues:

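<?php

$list_path = '/path/to/file';

$list = fopen($list_path, 'r');
if ($list === false)
{
    exit('Could not open ' . $list_path);
}

// Read one line at a time so the whole file never has to fit in memory.
// Compare against false explicitly: a last line containing only "0" is falsy.
while (($line = fgets($list)) !== false)
{
    // filter_var() returns false for anything that is not a valid address.
    if (!filter_var(trim($line), FILTER_VALIDATE_EMAIL))
    {
        // Invalid lines can contain anything, so escape them for HTML output.
        echo nl2br(htmlspecialchars($line));
    }
}
fclose($list);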