
preg_match not matching strings across multiple lines



Hi all,

I'm trying to pattern-match a street address in a paragraph. My RegEx matches the address when it is on a single line, but I can't make it work for an address that spans multiple lines.

Here is the code:

$address = "I need a Regular expression to find a complete postal address for example Jack John 
1100 Glendon, Los Angeles, CA, 90024 United States . Thanx in advance i need it very urgently ";

$pattern1 = "/((\d+)[\-, ]*(\w+[\- ]*\w+[\- ]*\w+))[\-, ]*(\w+[\- ]?\w+)[\-, ]*([A-Z]{2})[\-, ]*(\d{5})/";

if ( preg_match( $pattern1, $address, $matches ) == 1 ) {
    echo "Match found";
}

The above outputs "Match found".

But I need to modify this RegEx to work for the paragraph below.

"I need a Regular expression to find a complete postal address for example

Jack John

1100 Glendon,

Los Angeles,

CA, 90024

United States

Thanx in advance i need it very urgently "

I tried the following RegEx, but it does not work:

$pattern1 = "/((\d+)[\-, ]*(\w+[\- ]*\w+[\- ]*\w+))\s(\w+[\- ]?\w+)[\-, ]*([A-Z]{2})[\-, ]*(\d{5})/s";
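
A small sketch of why adding /s is unlikely to help: in PCRE the s (DOTALL) modifier only changes what "." matches, and the pattern above contains no ".". The string below is a hypothetical two-line fragment of the address, not taken from the post.

```php
<?php
// Hypothetical two-line string, standing in for part of the address.
$text = "1100 Glendon,\nLos Angeles";

// "." does not match "\n" by default...
$dotPlain = preg_match('/Glendon,.Los/', $text);
// ...but with the s (DOTALL) modifier it does.
$dotAll = preg_match('/Glendon,.Los/s', $text);
// A character class that does not list a line break never matches one,
// whether or not /s is set.
$classWithS = preg_match('/Glendon,[\-, ]*Los/s', $text);
// \s includes "\n" (and "\r"), so widening the class lets it match.
$classWithWs = preg_match('/Glendon,[\-,\s]*Los/', $text);

// Prints int(0), int(1), int(0), int(1) on separate lines.
var_dump($dotPlain, $dotAll, $classWithS, $classWithWs);
```

In other words, the line breaks have to be allowed explicitly in the character classes that sit between the address parts.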

 

Kindly help me with this.

P.S. I'm not very familiar with the advanced topics of RegEx; I arrived at this RegEx only with the help of Google.

Regards,

ursvmg

<?php
$address = "I need a Regular expression to find a complete postal address for example Jack John 
1100 Glendon, Los Angeles, CA, 90024 United States . Thanx in advance i need it very urgently ";

$pattern1 = "/((\d+)[\-, ]*(\w+[\- ]*\w+[\- ]*\w+))[\-, ]*(\w+[\- ]?\w+)[\-, ]*([A-Z]{2})[\-, ]*(\d{5})/";

if ( preg_match( $pattern1, $address, $matches ) == 1 ) {
    echo "<pre>";
    print_r($matches);
    echo "</pre>";
}

?>

Your pattern makes extensive use of the \w escape sequence. This stands for 'non-vertical whitespace' characters, so it will never match vertical whitespace. In the sections between these, you only allow '-', ',' and ' '. You will need to decide where you wish to allow line breaks and add the appropriate characters ('\r' and '\n') there.

Salathe just pointed out that I misrepresented the definition of \w in my last post: \w of course represents word characters (letters, digits and underscore), not non-vertical whitespace. The point still stands, though: it will not match \r and \n characters.
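
Following that suggestion, a minimal sketch that widens the separator classes between the address parts from [\-, ] to [\-,\s]; \s covers spaces, tabs, \r and \n. It assumes the line breaks only occur between address parts, as in the sample paragraph, and it drops the /s modifier because the pattern has no "." for it to affect.

```php
<?php
// Same pattern as in the original post, but every separator class
// [\-, ] between the address parts is widened to [\-,\s].
// The separators inside the street and city names keep [\- ],
// so those groups cannot run onto the next line.
$pattern = '/((\d+)[\-,\s]*(\w+[\- ]*\w+[\- ]*\w+))[\-,\s]*(\w+[\- ]?\w+)[\-,\s]*([A-Z]{2})[\-,\s]*(\d{5})/';

// The multi-line paragraph from the question, with blank lines between parts.
$address = "I need a Regular expression to find a complete postal address for example\n\n"
    . "Jack John\n\n"
    . "1100 Glendon,\n\n"
    . "Los Angeles,\n\n"
    . "CA, 90024\n\n"
    . "United States\n\n"
    . "Thanx in advance i need it very urgently ";

if (preg_match($pattern, $address, $matches) === 1) {
    echo "Match found\n";
    print_r($matches);
}
```

With Windows line endings the same pattern should still work, since \s also matches \r.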

When I viewed the HTML source, it looked something like this:

"I need a Regular expression to find a complete postal address for example

<br>

Jack John

<br>

1100 Glendon,

<br>

Los Angeles,

<br>

CA, 90024

<br>

United States

<br>

Thanx in advance i need it very urgently "

So I worked around it by removing all the HTML tags with the strip_tags() function. Now I have a continuous piece of text to match against, and it works fine.
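
A minimal sketch of this workaround, assuming the page source looks like the sample above. Collapsing the whitespace that strip_tags() leaves behind is an extra step not mentioned in the post; it only matters if newlines survive between the address lines, as they do in this hypothetical sample.

```php
<?php
// The original single-line pattern from the first post, unchanged.
$pattern = '/((\d+)[\-, ]*(\w+[\- ]*\w+[\- ]*\w+))[\-, ]*(\w+[\- ]?\w+)[\-, ]*([A-Z]{2})[\-, ]*(\d{5})/';

// Hypothetical HTML input: address lines separated by <br> tags and newlines.
$html = "for example\n<br>\nJack John\n<br>\n1100 Glendon,\n<br>\nLos Angeles,\n<br>\nCA, 90024\n<br>\nUnited States\n<br>\nThanx in advance";

// 1. Remove the tags.
$text = strip_tags($html);
// 2. Extra step: collapse any run of whitespace (including newlines) to one space.
$text = preg_replace('/\s+/', ' ', $text);

if (preg_match($pattern, $text, $matches) === 1) {
    echo $matches[0], "\n"; // 1100 Glendon, Los Angeles, CA, 90024
}
```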

 

Please let me know if there is an easier way to do it.

Regards,

ursvmg

Ahh, if it is <br> tags that are splitting the data, then strip_tags is probably as good as anything. If they are in particular places, you could add them to the pattern if you wanted, but if you don't need the tags, I don't see a downside to using strip_tags.
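
One way to add the tags to the pattern, sketched under the assumption that they only ever appear between address parts: build a separator that accepts <br> tags as well as the separator characters. The $sep variable and the sample HTML are illustrative, not from the thread.

```php
<?php
// Hypothetical separator: any mix of "-", ",", whitespace and <br> tags
// (<br>, <br/> or <br />, lower case only) between address parts.
$sep = '(?:[\-,\s]|<br\s*\/?>)*';

$pattern = '/((\d+)' . $sep . '(\w+[\- ]*\w+[\- ]*\w+))' . $sep
    . '(\w+[\- ]?\w+)' . $sep . '([A-Z]{2})' . $sep . '(\d{5})/';

// Hypothetical HTML input modelled on the page source shown above.
$html = "for example\n<br>\nJack John\n<br>\n1100 Glendon,\n<br>\nLos Angeles,\n<br>\nCA, 90024\n<br>\nUnited States\n<br>\nThanx in advance";

if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[0] still contains the <br> tags; the capture groups do not.
    printf("%s, %s, %s %s\n", $matches[1], $matches[4], $matches[5], $matches[6]);
}
```

The trade-off is readability: the pattern grows noticeably, which is why strip_tags is the simpler choice when the tags are not needed.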
