
(.*?) - How does it work?


Mycotheologist


I was just wondering why (.*?) works the way it does. For example, if I do preg_match('/WORD1(.*?)WORD2/', $subject); it will match anything in between WORD1 and WORD2. I know that . is a wildcard, so it matches any character. From what I've read, * matches zero or more of the preceding character, so that makes it match whole strings rather than a single character. It's the ? that confuses me: from what I've read, ? matches zero or one of the preceding character. What purpose does that serve, then? What would happen if I omitted the ?


The question mark after * makes it ungreedy. If you leave it out, it will match everything between the first WORD1 and the last WORD2. If there are multiple occurrences of WORD2, the ungreedy version will only match up to the first WORD2 after WORD1.

 

$subject = 'WORD1foobar stands for FTP Operation Over Big Address Records.WORD2Explanation of these and more acronyms can be found atWORD2';

preg_match('!WORD1(.*?)WORD2!', $subject, $matches);
print_r($matches); // ungreedy: $matches[1] is "foobar stands for FTP Operation Over Big Address Records."

preg_match('!WORD1(.*)WORD2!', $subject, $matches);
print_r($matches); // greedy: $matches[1] is "foobar stands for FTP Operation Over Big Address Records.WORD2Explanation of these and more acronyms can be found at"
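To make the two meanings of ? from the question concrete, here is a small sketch, assuming PHP 7 or later; the subject string and the $patterns list are made up for illustration. A ? after an ordinary token means "optional" (zero or one), while a ? after a quantifier such as +, * or {m,n} makes that quantifier lazy:

```php
<?php
// Illustrative subject (hypothetical): "colour" has no "a", then four a's.
$subject = 'colour aaaa';

// Pattern => what the "?" (or its absence) is doing in that pattern.
$patterns = [
    '/colou?r/' => '"?" after a plain token: the "u" is optional',
    '/a+/'      => 'greedy one-or-more',
    '/a+?/'     => 'lazy one-or-more',
    '/a{2,4}/'  => 'greedy two-to-four',
    '/a{2,4}?/' => 'lazy two-to-four',
];

$results = [];
foreach ($patterns as $pattern => $description) {
    preg_match($pattern, $subject, $m);
    // $m[0] holds the text matched by the whole pattern.
    $results[$pattern] = $m[0];
    printf("%-10s %-6s (%s)\n", $pattern, $m[0], $description);
}
```

The greedy forms take as many a's as they are allowed (all four), while the lazy forms stop at the minimum the quantifier permits (one for +?, two for {2,4}?).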


Mycotheologist, let me explain what ungreedy means. A ? placed after a quantifier (normally .*) makes that quantifier non-greedy.

 

If this is your text:

 

WORD1 wohhahah WORD2 tatata WORD2

 

If you omit the ?, the .* is greedy and will "eat up" all characters from WORD1 to the last WORD2 found, so the match is the whole string:

 

WORD1 wohhahah WORD2 tatata WORD2

 

If you include the ?, it will be non-greedy ("nice") and only eat up characters until it first reaches a WORD2, so the match is:

 

WORD1 wohhahah WORD2

 

Does that make sense?
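The same example can be run directly in PHP. This is a minimal sketch; the capture group around the dot-star is added here (following the pattern in the original question) so you can see exactly what each version captures:

```php
<?php
$text = 'WORD1 wohhahah WORD2 tatata WORD2';

// Greedy: .* runs to the end, then backs off until the LAST WORD2 fits.
preg_match('/WORD1(.*)WORD2/', $text, $greedy);

// Lazy: .*? grows one character at a time until the FIRST WORD2 fits.
preg_match('/WORD1(.*?)WORD2/', $text, $lazy);

echo $greedy[0], "\n";        // WORD1 wohhahah WORD2 tatata WORD2
echo '[', $greedy[1], "]\n";  // [ wohhahah WORD2 tatata ]
echo $lazy[0], "\n";          // WORD1 wohhahah WORD2
echo '[', $lazy[1], "]\n";    // [ wohhahah ]
```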



I find it least confusing to explain what you are telling the regex engine to do.

 

Dot-star (.*) tells the regex engine: "Match any character, zero or more times, as many times as possible". The dot-star will bulldoze its way to the end of the subject (or to the first newline, since . does not match newlines unless the s modifier is set). Then, if needed to allow a match, it will backtrack, one character at a time.

 

Dot-star-question-mark (.*?) tells the regex engine: "Match any character, zero or more times, as few times as possible". The engine will start out by matching zero characters; then, because the rest of the pattern cannot match yet (WORD2 has not been found), it will match one more character, then one more, and so on.
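The two strategies above can be imitated by hand to see the order in which lengths are tried. This is only a sketch and not how PCRE is implemented internally: greedyBetween() and lazyBetween() are hypothetical helpers that handle just the shape WORD1(.*)WORD2 / WORD1(.*?)WORD2 with literal delimiters, and they assume the subject contains no newlines (so . could match every character).

```php
<?php
// Hypothetical helper: mimics WORD1(.*)WORD2, trying the LONGEST capture first.
function greedyBetween(string $subject, string $open, string $close): ?string
{
    $start = strpos($subject, $open);
    if ($start === false) {
        return null;
    }
    $from = $start + strlen($open);
    // Dot-star first grabs everything up to the end of the subject...
    for ($len = strlen($subject) - $from; $len >= 0; $len--) {
        // ...then gives back one character at a time until $close fits after it.
        if (substr($subject, $from + $len, strlen($close)) === $close) {
            return substr($subject, $from, $len);
        }
    }
    return null;
}

// Hypothetical helper: mimics WORD1(.*?)WORD2, trying the SHORTEST capture first.
function lazyBetween(string $subject, string $open, string $close): ?string
{
    $start = strpos($subject, $open);
    if ($start === false) {
        return null;
    }
    $from = $start + strlen($open);
    // Lazy dot-star starts with zero characters and takes one more each time
    // the rest of the pattern ($close) fails to match at the current position.
    for ($len = 0; $from + $len <= strlen($subject); $len++) {
        if (substr($subject, $from + $len, strlen($close)) === $close) {
            return substr($subject, $from, $len);
        }
    }
    return null;
}

$subject = 'WORD1 wohhahah WORD2 tatata WORD2';
preg_match('/WORD1(.*)WORD2/', $subject, $g);
preg_match('/WORD1(.*?)WORD2/', $subject, $l);

var_dump(greedyBetween($subject, 'WORD1', 'WORD2') === $g[1]); // bool(true)
var_dump(lazyBetween($subject, 'WORD1', 'WORD2') === $l[1]);   // bool(true)
```

Both loops stop at the first length that lets the rest of the pattern succeed; the only difference is the direction in which they search.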

 

For more details, you may like to check out my tut about the degrees of regex greed, and Jan's page on repetition.

 

This is a subtle but crucial concept to grasp, so please don't hesitate to ask for clarification.

 

 


<a href="website.com">Click</a> or just visit <a href="example.com">my example page</a>

 

Let's say you want to retrieve all the a tags, of which there are two in this example. The greedy version (without ?) will match the entire string, because .* runs on to the last </a> instead of stopping at the first one. The lazy version (with ?) stops matching as early as it can, and will therefore make two matches (use preg_match_all() to collect both, since preg_match() stops after the first match).

 

Example 1:

Regex: /<a.*>.*<\/a>/

Matches (one match, the entire string): <a href="website.com">Click</a> or just visit <a href="example.com">my example page</a>

 

Example 2:

Regex: /<a.*?>.*?<\/a>/

Matches (two separate matches):
<a href="website.com">Click</a>
<a href="example.com">my example page</a>
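Here is a minimal sketch of both regexes run with preg_match_all(), which keeps searching after the first match and returns the number of full matches. It uses the sample HTML above and is meant only to illustrate greediness; for real-world HTML, a parser such as PHP's DOMDocument is usually more robust than a regex.

```php
<?php
$html = '<a href="website.com">Click</a> or just visit <a href="example.com">my example page</a>';

// Greedy: .* runs past the first </a>, so the whole string is one match.
$greedyCount = preg_match_all('/<a.*>.*<\/a>/', $html, $greedy);

// Lazy: each .*? stops at the first > or </a> it can, giving one match per tag.
$lazyCount = preg_match_all('/<a.*?>.*?<\/a>/', $html, $lazy);

echo $greedyCount, "\n"; // 1
print_r($greedy[0]);     // a single element: the entire string

echo $lazyCount, "\n";   // 2
print_r($lazy[0]);       // one element per <a>...</a> tag
```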
