
lookahead not so greedy


johnrdorazio


I need to match n occurrences of "." in a string following specific rules.

The number of matches found by the regex should equal the exact number of "." characters in the string.

 

The rules are:

1) every "." must be preceded by a number from 1 to 999 with no leading zero: one digit from 1-9, optionally followed by up to two more digits from 0-9

2) every "." must be followed by a number from 1 to 999 with no leading zero: one digit from 1-9, optionally followed by up to two more digits from 0-9

 

Sample valid strings are:

1) "1.3.5.7.9"

2) "21.23.25.27.29"

 

Sample invalid strings are:

1) "01.5.7.9" (first digit of the number preceding the dot cannot be a zero)

2) "1.05.7.9" (first digit of the number preceding or following the dot cannot be a zero)

3) "1.5.7777.9999" (each number can have at most three digits: one digit from 1-9 followed by up to two digits from 0-9; here we have four-digit numbers)

 

I'm using preg_match_all to match all occurrences of the dot in the string, and I'm using lookaheads because the matched digits are shared between the dots (the same number that precedes one dot follows another). The problem is that when I have two- or three-digit numbers, I get more matches than I need.

 

Here is my pattern: /(?=([1-9][0-9]{0,2}\.[1-9][0-9]{0,2}))/

Here is my test subject: "Mt1,1-15.17.19"

 

There are only two dots, but I am getting four matches:

 

array (
  0 => 
  array (
    0 => '',
    1 => '',
    2 => '',
    3 => '',
  ),
  1 => 
  array (
    0 => '15.17',
    1 => '5.17',
    2 => '17.19',
    3 => '7.19',
  ),
)
 
I would like to get only two matches ('15.17' and '17.19'), as many matches as there are dots that fall within the rules but no more.
<?php
$string = "Mt1,15.17.19";
$pattern = "/(?=([1-9][0-9]{0,2}\.[1-9][0-9]{0,2}))/";
// Run the regex once and count the literal dots once, then compare.
$valid = preg_match_all($pattern, $string);
$dots = substr_count($string, ".");
if ($valid != $dots) {
  echo "There are ".$dots." dots in this string, but there are ".$valid." valid occurrences of the dot!";
}
?>

Since there can't logically be more valid occurrences than actual occurrences, for the time being I'm just using a "less than" operator instead of a "not equals" operator.

But I would like to learn how to write a regex pattern that matches the exact number of valid instances.
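
The extra matches are easier to see when each capture is printed with its start position. The sketch below reuses the pattern and test subject from this post and adds only the PREG_OFFSET_CAPTURE flag; the variable names are illustrative. Because a lookahead consumes no characters, the engine retries at every position in the string, including positions in the middle of a number, which seems to be where '5.17' and '7.19' come from.

```php
<?php
// Subject and pattern taken from the post above.
$subject = "Mt1,1-15.17.19";
$pattern = '/(?=([1-9][0-9]{0,2}\.[1-9][0-9]{0,2}))/';

// PREG_OFFSET_CAPTURE turns each capture into [matched text, byte offset].
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// $matches[1] holds the captures of group 1, in the order they were found.
foreach ($matches[1] as $capture) {
    echo "offset " . $capture[1] . ": " . $capture[0] . "\n";
}
// Prints:
// offset 6: 15.17
// offset 7: 5.17
// offset 9: 17.19
// offset 10: 7.19
```

Offsets 7 and 10 each start one character into a two-digit number, so the unwanted matches are exactly the ones that begin right after another digit.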

 

 


  • 2 weeks later...

Sorry I only got back to this now; I just saw your reply today. However, your suggestion does not seem to work. I tried it on https://www.functions-online.com/preg_match_all.html but the result is still 4:

 

result:

4

 

$matches:

array (
  0 => 
  array (
    0 => '',
    1 => '',
    2 => '',
    3 => '',
  ),
  1 => 
  array (
    0 => '15.17',
    1 => '5.17',
    2 => '17.19',
    3 => '7.19',
  ),
)

Never mind, I got it. The negation was in the middle of the lookbehind; I just had to switch those two characters around:

 

/(?<![0-9])(?=([1-9][0-9]{0,2}\.[1-9][0-9]{0,2}))/

 

Now I get two occurrences:

 

result:

2

 

$matches:

array (
  0 => 
  array (
    0 => '',
    1 => '',
  ),
  1 => 
  array (
    0 => '15.17',
    1 => '17.19',
  ),
)
 
Thanks a lot for the tip!
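
For anyone landing here later, this is a minimal sketch of how the corrected pattern could be checked against the sample strings from the first post. The constant DOT_PATTERN and the helper allDotsValid are hypothetical names made up for this example; only the pattern itself and the comparison with substr_count come from the thread.

```php
<?php
// Pattern from the post above: the negative lookbehind rejects any start
// position that is immediately preceded by a digit.
const DOT_PATTERN = '/(?<![0-9])(?=([1-9][0-9]{0,2}\.[1-9][0-9]{0,2}))/';

// Hypothetical helper: true when the number of pattern matches equals
// the number of literal dots in $s.
function allDotsValid(string $s): bool
{
    return preg_match_all(DOT_PATTERN, $s) === substr_count($s, '.');
}

// Sample strings from the first post (two valid, three invalid).
$samples = ['1.3.5.7.9', '21.23.25.27.29', '01.5.7.9', '1.05.7.9', '1.5.7777.9999'];
foreach ($samples as $s) {
    echo $s, ' => ', allDotsValid($s) ? 'valid' : 'invalid', "\n";
}
```

One caveat, as far as I can tell: the lookahead does not check what follows the last number, so a string such as "1.5.7.9999" would still count as valid, because "7.999" satisfies the pattern. Appending (?![0-9]) inside the lookahead, right after the final [0-9]{0,2}, should close that gap, but treat that as an untested suggestion rather than part of the original fix.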
