
How is it possible to match Something Or Nothing


RuleBritannia
Go to solution Solved by .josh,


I want to match something first and only then fall back to nothing. I know '?' will check for nothing or something, but when I use it, it accepts the first match being nothing, so it won't bother to match something after. Is there a way to make it try to match something first, and only then nothing, so that the something is included in the result?

 

Other explanation

 

1st match = finish bothering

 

? = this will accept in order of nothing, then something             (match : nothing)

+? = this will accept in order something, but NOT nothing         (match : fail)

*? = this will accept in order nothing, then something                (match : nothing)

 

Overall, something is never checked to even bother a return, as the regex is happy with nothing.

 

How is this even possible to achieve?

Edited by RuleBritannia

So your question is about how to make it try to match something first before it tries to match nothing? The answer is one of:

- Use ? (which may not do the job depending on what else you have in your regex, as happened in your other thread),

- Force the engine to backtrack (by sending it too far into the string after which it must backtrack to match the pattern successfully), or

- That's not how most/all regex engines work (because they generally want to match as quickly as possible)

 

Maybe you have a specific example?


I agree with requinix.. you're not making a whole lot of sense here, and providing actual example(s) of what you want to happen would be a lot more helpful.

 

But it sounds like what you want is already how the engine behaves. It sounds like you want it to match something, and only match nothing if it has to (to fulfill the rest of the pattern). Well that's exactly how it already works with the * quantifier, and the ? when used as a quantifier (not a non-greedy flag).

 

Examples with 0 or more * quantifier:

 

$string = "foobar";
preg_match('~(\w*)~',$string,$m);
// $m[1] contains "foobar"

preg_match('~(\w*)(\w)~',$string,$m);
// $m[1] contains "fooba"
// $m[2] contains "r";

preg_match('~(\w*)(oobar)~',$string,$m);
// $m[1] contains "f"
// $m[2] contains "oobar";

preg_match('~(\w*)(foobar)~',$string,$m);
// $m[1] contains ""
// $m[2] contains "foobar";
Same thing with the ? as a 0 or 1 quantifier:

 

$string = "foobar";
preg_match('~(\w?)~',$string,$m);
// $m[1] == "f"

preg_match('~(\w?)(\w)~',$string,$m);
// $m[1] == "f"
// $m[2] == "o";

preg_match('~(\w?)(oobar)~',$string,$m);
// $m[1] == "f"
// $m[2] == "oobar";

preg_match('~(\w?)(foobar)~',$string,$m);
// $m[1] == ""
// $m[2] == "foobar";
The "summary" definition of a quantifier, "matches n or more", can be a bit confusing when you don't read the fine print under it.

By default, quantifiers are in a "greedy" state, and will consume everything they can. They will only give up what they have consumed if there is more to the pattern and that pattern cannot be satisfied unless they give something up. And they only give up exactly what they have to. So by default, it's more of a reverse situation, where it matches "everything it can or less".

 

Also, it's worth pointing out that ? means 2 different things, depending on the context. If it is after a pattern to be matched (single char, char class, group, etc.), then it is a quantifier, meaning "match 0 or 1" of the thing before it. If it comes after another quantifier, it becomes the "make this quantifier non-greedy" flag.

 

What does that mean? Well it (sort of) reverses the "greedy" behavior of the quantifier. Instead of eating everything in sight and barfing up what it has to afterwards, it asks the rest of the pattern if it needs the next character and if it doesn't, then it consumes it. Then it moves on to the next, wash rinse and repeat until it comes to the first character its pattern can't match. So it's more of a "look before you leap" approach, effectively making it more accurate to the "match [n] or more" description.
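To make the two roles of ? concrete, here is a minimal sketch. The subject strings and variable names are made up for illustration; only preg_match itself is assumed.

```php
<?php
// Illustration only: the subject strings and variable names are made up.

// 1) ? directly after a token is the "0 or 1" quantifier (greedy: takes 1 if it can).
preg_match('~a?~', 'aaa', $optional);      // $optional[0] is "a"

// 2) ? after another quantifier makes that quantifier lazy.
preg_match('~a+~', 'aaa', $greedyPlus);    // $greedyPlus[0] is "aaa"
preg_match('~a+?~', 'aaa', $lazyPlus);     // $lazyPlus[0] is "a"

// 3) Both at once: ?? is the 0-or-1 quantifier made lazy, so it prefers 0.
preg_match('~a??~', 'aaa', $lazyOptional); // $lazyOptional[0] is ""

// 4) Greedy vs. lazy when something has to match afterwards.
$subject = 'say "a" and "b"';
preg_match('~"(.*)"~', $subject, $greedy); // $greedy[1] is: a" and "b
preg_match('~"(.*?)"~', $subject, $lazy);  // $lazy[1] is: a
```

In case 4 the greedy version runs to the end of the string and gives characters back until the last quote fits, while the lazy version stops at the first closing quote it can reach.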

 

So why isn't non-greedy matching the default? Long story short, it boils down to efficiency. In most cases it's faster to make the pattern consume everything it can and then back-track, than to keep looking ahead at every character. But if you need to, you can make it the default by using the U modifier. This will reverse the behavior. For example, .* will be non-greedy and .*? will be greedy. Seriously though, there is rarely a case where this will actually increase efficiency. 99.99% of the time it's just swapping syntax, which is only "useful" if you're trying for the "shortest possible pattern" award. If you are in a position of thinking it might help, then you should already be calling yourself a regex-master. If you are not a regex-master, then at best you will just wind up confusing yourself (and others) trying to understand your pattern. I probably shouldn't have even mentioned it. Do yourself a favor and just delete this paragraph from your memory.
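For anyone who wants to see the swap once before forgetting it, a small sketch follows. The subject 'aXbXb' and the variable names are invented for the demonstration.

```php
<?php
// Made-up subject with two possible stopping points for "b".
$subject = 'aXbXb';

preg_match('~a.*b~', $subject, $default);      // default: .* is greedy
preg_match('~a.*b~U', $subject, $flipped);     // U modifier: .* becomes lazy
preg_match('~a.*?b~U', $subject, $flippedBack); // U modifier: .*? becomes greedy

// $default[0]     is "aXbXb"
// $flipped[0]     is "aXb"
// $flippedBack[0] is "aXbXb"
```

Nothing new can be matched this way; the modifier only swaps which spelling means greedy and which means lazy.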


Hello

 

Thanks for the in-depth replies, both of you. However, I seem to have already understood what you have posted, apart from the U modifier.

I must be doing something the wrong way.

Here is my example, which either of you may remember from the previous post, but this time I have just appended one word.

$string = 'I HAVE HAD ENOUGH  THIS HEADACHE OK';

preg_match("/(I HAVE HAD ENOUGH).*?(THIS HEADACHE)?/i",$string,$match);

echo '<pre>';
var_dump($match);
echo '</pre>';

match will consist of

array(2) {
  [0]=>
  string(17) "I HAVE HAD ENOUGH"
  [1]=>
  string(17) "I HAVE HAD ENOUGH"
}

The regex seems to see that because (THIS HEADACHE)? is optional, it is happy to accept nothing before checking. Or, if I am wrong and it IS checking (THIS HEADACHE), then when it finds a match it does not return it in the results. This is why I came to my conclusion that it is happy with nothing over something.

 

If however I remove the ? after (THIS HEADACHE)?, The match is correct.

array(3) {
  [0]=>
  string(32) "I HAVE HAD ENOUGH  THIS HEADACHE"
  [1]=>
  string(17) "I HAVE HAD ENOUGH"
  [2]=>
  string(13) "THIS HEADACHE"
}

Adding the optional ? to the group seems to say: if it's there, find it but don't bother returning it, who cares; if it's not there, then there is nothing to return anyway, who cares. I can't understand why it would work like this.

 

I'm also now going to re-read your post, josh, as it's so in-depth.

 

Thanks in advance

Edited by RuleBritannia

I seem to have found one impractical way of doing it, completely against code brevity

$string = 'I HAVE HAD ENOUGH OF THIS HEADACHE OK';

preg_match("/((I HAVE HAD ENOUGH).*?(THIS HEADACHE))|((I HAVE HAD ENOUGH).*?(THIS HEADACHE)?)/i",$string,$match);

echo '<pre>';
var_dump($match);
echo '</pre>';

This will return the (THIS HEADACHE) in the match first if it's there.

 

result

array(4) {
  [0]=>
  string(34) "I HAVE HAD ENOUGH OF THIS HEADACHE"
  [1]=>
  string(34) "I HAVE HAD ENOUGH OF THIS HEADACHE"
  [2]=>
  string(17) "I HAVE HAD ENOUGH"
  [3]=>
  string(13) "THIS HEADACHE"
}

Or, if the subgroup doesn't match (for example, when HEADACHE is misspelled in the string), the OR statement checks the whole main group again with the ? parameter on the last subgroup, happily accepting nothing

 

result

array(6) {
  [0]=>
  string(17) "I HAVE HAD ENOUGH"
  [1]=>
  string(0) ""
  [2]=>
  string(0) ""
  [3]=>
  string(0) ""
  [4]=>
  string(17) "I HAVE HAD ENOUGH"
  [5]=>
  string(17) "I HAVE HAD ENOUGH"
}

There must be a correct way to do this; surely this is not the most pleasant way to achieve such a small thing.

 

Thanks in advance

Edited by RuleBritannia

Okay, so this:

 

preg_match('~(foo)(.*?)(bar)?~','foo bar',$m);
will give you this:

 

Array
(
    [0] => foo
    [1] => foo
    [2] => 
)
[0] is the full pattern match

[1] is from (foo)

[2] is from (.*?)

 

Okay so (.*?) matches nothing, not even the space, because it is non-greedy and nothing after it in the pattern forces it to take more. So, the pointer is currently at the space between "foo bar". (bar)? is greedy, but won't match, because "b" != " ". Since (bar)? is optional, it settles for nothing, and since there's nothing else to match for, that's all you get.
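One way to see where the pointer ends up is to ask preg_match for offsets. This is only a sketch; the PREG_OFFSET_CAPTURE flag is a standard preg_match flag, and the variable name $m follows the examples above.

```php
<?php
// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
preg_match('~(foo)(.*?)(bar)?~', 'foo bar', $m, PREG_OFFSET_CAPTURE);

// $m[0] is ['foo', 0]: the overall match covers offsets 0..2 and stops before the space.
// $m[2] is ['', 3]:    the lazy group matched the empty string at offset 3.
// $m[3] is not set:    a trailing group that took no part in the match is dropped.
$stoppedAt = $m[0][1] + strlen($m[0][0]); // 3, the position of the space
```

The missing $m[3] is why the result array is shorter than the number of groups in the pattern.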

 

Now from your other thread, where a $ was thrown onto the end:

 

preg_match('~(foo)(.*?)(bar)?$~','foo bar',$m);
will give you this:

 

Array
(
    [0] => foo bar
    [1] => foo
    [2] =>  
    [3] => bar
)
Aha, now this is where things get interesting. Since there is something after the (bar)? that HAS to match (the $), the engine now has to backtrack to match for it. Why? because the next characters after "foo" are " bar" not end-of-string. So it can't just match "foo" as if "foo" is at the end of the string. Now there are 2 ways this can happen:

 

1) match a space " " with (.*?) and "bar" from (bar)?

2) match " bar" with (.*?) (in which case (bar)? doesn't match for anything)

 

Well the regex engine goes for option #1 because it's the path of least resistance. Remember, (.*?) is still lazy, so it's only going to match what it has to. Since (bar)? doesn't match for a space, (.*?) MUST match the space, in order to satisfy the $ match. And since (bar)? is greedy and can fill in the rest of the blanks, that is used.

 

More accurately, what happens is that the engine backtracks to fill in the blanks between "foo" and end of string. On the first attempt (.*?) takes nothing and (bar)? takes nothing, but then the $ fails, because the pointer is at the space, not at end of string. So the engine backs up to the most recent choice it can still change: (.*?) grudgingly takes one more character, the space. From there (bar)? gets another try, matches "bar", and the $ is satisfied. So in the first example, [2] is an empty string, but in the 2nd example, [2] is a space " "
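The effect of the trailing $ can be sketched by running the same pattern over a few subjects. The subjects 'foo' and 'foo baz' are made up here to show the cases where (bar)? has nothing to grab.

```php
<?php
// Same pattern as above, tried against three subjects.
$pattern = '~(foo)(.*?)(bar)?$~';
$results = [];
foreach (['foo', 'foo bar', 'foo baz'] as $subject) {
    preg_match($pattern, $subject, $m);
    $results[$subject] = $m;
}

// 'foo'     => ['foo', 'foo', '']              nothing left to fill in
// 'foo bar' => ['foo bar', 'foo', ' ', 'bar']  (.*?) takes the space, (bar)? takes "bar"
// 'foo baz' => ['foo baz', 'foo', ' baz']      (.*?) has to take everything up to the end
```

In the last case (bar)? never finds a "bar" to match, so the lazy group is forced to grow one character at a time until the $ can succeed.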

 

 

I know, it's pretty confusing, and takes some getting used to, and to fully understand it, you have to dive into how the engine works, not just what the symbols do. There's a reason regex has a steep learning curve and few people dare to climb that mountain!


Hello Josh

 

Thanks again for your detailed explanation, but it doesn't work for the example I gave here.

In this thread, I noted at the top that I appended to the string from the previous post. Your working example only works for the previous example, not this example

 

Current example

'I HAVE HAD ENOUGH OF THIS HEADACHE OK';

We want

'I HAVE HAD ENOUGH' and, if it's there, 'THIS HEADACHE'

 

So lets apply your method

preg_match('~(I HAVE HAD ENOUGH)(.*?)(THIS HEADACHE)?$~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);

var_dump($m);

Now, when we run this, it will return

array(2) {
  [0]=>
  string(37) "I HAVE HAD ENOUGH OF THIS HEADACHE OK"
  [1]=>
  string(17) "I HAVE HAD ENOUGH"
}

the match does not display 'THIS HEADACHE' subgroup.

 

But if we do this.

preg_match('~(I HAVE HAD ENOUGH).*?(THIS HEADACHE)~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);

Don't give this subgroup the option to pick and choose what it wants to return: the ? is removed, so the group is required.

I also had to remove the $ to match the main result.

 

This returns.

array(3) {
  [0]=>
  string(34) "I HAVE HAD ENOUGH OF THIS HEADACHE"
  [1]=>
  string(17) "I HAVE HAD ENOUGH"
  [2]=>
  string(13) "THIS HEADACHE"
}

This result here is correct, but the regex is not: the subgroup 'THIS HEADACHE' must only be returned if it's there. But of course, when we give it that option with ?, the subgroup is not returned in the result even when it is there, so we won't know whether it was there or not (unless we do other string searching and manipulation, which shouldn't have to be done)

 

Sorry if I have not made full sense here, I have been working on this pattern for over 8 hours now.

 

Thanks

Edited by RuleBritannia

  • Solution

Okay, maybe you are trying to "simplify" things for me, and there's more code at play that's somehow not right?

 

Because your first code snippet:

 

preg_match('~(I HAVE HAD ENOUGH)(.*?)(THIS HEADACHE)?$~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);
var_dump($m);
Returns this:

 

Array
(
    [0] => I HAVE HAD ENOUGH OF THIS HEADACHE OK
    [1] => I HAVE HAD ENOUGH
    [2] =>  OF THIS HEADACHE OK
)
So firstly, the returned results don't match what you've posted. 2nd, (THIS HEADACHE)? will not match, even with the $ on the end. Why? Because your string doesn't end in "THIS HEADACHE". It ends in "THIS HEADACHE OK". However, you will see that [2] shows the rest of the string because the (.*?) will match the rest of the string to satisfy matching the $ on the end.

 

If you want to make it optionally match for "THIS HEADACHE" when it is not the end of the string, you will need to group the .*? and (THIS HEADACHE)? together:

 

preg_match('~(I HAVE HAD ENOUGH)(.*?(THIS HEADACHE))?~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);
var_dump($m);
This will match:

 

Array
(
    [0] => I HAVE HAD ENOUGH OF THIS HEADACHE
    [1] => I HAVE HAD ENOUGH
    [2] =>  OF THIS HEADACHE
    [3] => THIS HEADACHE
)
Actually, here is another example of where greedy vs. non-greedy comes into play. With this example, using (.*) instead of (.*?) will give you the same result. However, the (.*) will be more efficient. HOWEVER, if your string is more complex and has more than one occurrence of "THIS HEADACHE", (.*) will consume up to the last occurrence of it, whereas (.*?) will match up to the first occurrence of it.
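A quick sketch of that difference, using a made-up subject in which "THIS HEADACHE" appears twice:

```php
<?php
// Hypothetical subject with two occurrences of the optional phrase.
$subject = 'I HAVE HAD ENOUGH OF THIS HEADACHE AND THIS HEADACHE OK';

preg_match('~(I HAVE HAD ENOUGH)(.*?(THIS HEADACHE))?~', $subject, $lazy);
preg_match('~(I HAVE HAD ENOUGH)(.*(THIS HEADACHE))?~', $subject, $greedy);

// $lazy[0]   is "I HAVE HAD ENOUGH OF THIS HEADACHE"                    (first occurrence)
// $greedy[0] is "I HAVE HAD ENOUGH OF THIS HEADACHE AND THIS HEADACHE"  (last occurrence)
// In both cases [3] is "THIS HEADACHE"; only the text in front of it differs.
```

Which one is right depends on whether the first or the last occurrence should end the match.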

 

Also, you can make the outer parens a non-capture group to cut down on the returned results.

 

Example:

 

preg_match('~(I HAVE HAD ENOUGH)(?:.*?(THIS HEADACHE))?~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);
var_dump($m);
will return

 

Array
(
    [0] => I HAVE HAD ENOUGH OF THIS HEADACHE
    [1] => I HAVE HAD ENOUGH
    [2] => THIS HEADACHE
)
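To tell afterwards whether the optional part was there, the pattern above could be wrapped as in the sketch below. The function name matchWithOptional and the array keys are hypothetical, not from the thread, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the required phrase and, if present, the optional one.
// Returns null when even the required phrase is missing.
function matchWithOptional(string $subject): ?array
{
    $pattern = '~(I HAVE HAD ENOUGH)(?:.*?(THIS HEADACHE))?~';
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null; // required phrase not found
    }
    // A trailing group that took no part in the match is simply absent from $m.
    return ['required' => $m[1], 'optional' => $m[2] ?? null];
}

$with    = matchWithOptional('I HAVE HAD ENOUGH OF THIS HEADACHE OK');
$without = matchWithOptional('I HAVE HAD ENOUGH OF THIS HEAACHE OK');
$none    = matchWithOptional('NOTHING TO SEE HERE');
```

Here $with['optional'] is "THIS HEADACHE", $without['optional'] is null, and $none is null, so no extra string searching is needed to know which case occurred.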

Hello Josh

 

I see my first result was not correct. I re-edited the script to run different results to post so you could see, and it seems I ran it without the () around .*, my fault.

 

It seems you have found the answer I was looking for all along.

preg_match('~(I HAVE HAD ENOUGH)(?:.*?(THIS HEADACHE))?~','I HAVE HAD ENOUGH OF THIS HEADACHE OK',$m);
var_dump($m);

This is working correctly, THANKS A LOT!

9 hours working on this now.

 

When I get paid I will try to send a donation to the first link in your sig.

 

Thanks Requinix also for the help + previous correct answer.

 

