Advanced Regex problem. First vs last match!

2 replies to this topic

#1 Anidazen

  • Members
  • Advanced Member
  • 79 posts

Posted 13 July 2006 - 01:01 AM


When trying to run regular expressions, it seems sometimes the expression doesn't want to pick the first instance as it should.

Example would be: "/blah.*?([0-9]*)/is" on the following:
blah 25 blahblah blah 62 blah blah blahblah 33

In this example, it would return either 62 or 33, but not 25, which would match if there were nothing after it. This is incredibly frustrating and is, of course, breaking my script. How can I fix this? Why does this happen?

Thanks in advance.

#2 effigy

  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 13 July 2006 - 01:23 AM

Your regex did not return anything for me. How about this?


#3 Wildbug

  • Members
  • Advanced Member
  • 1,149 posts

Posted 13 July 2006 - 07:57 PM

I'm surprised that expression returned any numbers at all for that example.

The reason it's not doing what you want is the asterisk after the number class ([0-9]*).  What your entire expression means is "find 'blah', followed by zero or more of anything without being greedy, followed by zero or more digits."  But both of those "zero or more" quantifiers are satisfied by matching nothing at all, so the match ends at the zero-length position just beyond "blah" and the capture group comes back empty.  To the regular expression engine it's as if there were an invisible character between the "h" in "blah" and the space following it.  This is analogous to the word boundary assertion (\b), where a match occurs between a word character and a non-word character.
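To make the zero-length match concrete, here is a minimal sketch in Python rather than the original poster's environment. It assumes the /is modifiers correspond to `re.IGNORECASE | re.DOTALL`; the variable names are only for illustration.

```python
import re

text = "blah 25 blahblah blah 62 blah blah blahblah 33"

# The original pattern; /is is assumed to map to IGNORECASE | DOTALL.
original = re.compile(r"blah.*?([0-9]*)", re.IGNORECASE | re.DOTALL)

m = original.search(text)
matched = m.group(0)    # the whole match
captured = m.group(1)   # what the number group grabbed
span = m.span(1)        # where the number group matched

print(repr(matched), repr(captured), span)
```

The whole match is just "blah", the captured group is an empty string, and its span is (4, 4): a zero-width position right after the "h", before the space.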

Effigy's suggestion will work fine for you, unless you require that there be a space between "blah" and the trailing number, in which case you'd need to use "\s+" instead of "\s*".
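Effigy's exact pattern is not shown in the thread, so the two patterns below are hypothetical stand-ins of the kind described: optional or required whitespace after "blah", then one or more digits. This is a rough Python sketch, not the original suggestion.

```python
import re

text = "blah 25 blahblah blah 62 blah blah blahblah 33"

# Hypothetical stand-ins for the suggested fix; "+" demands at least one digit.
optional_space = re.compile(r"blah\s*([0-9]+)", re.IGNORECASE)
required_space = re.compile(r"blah\s+([0-9]+)", re.IGNORECASE)

first = optional_space.search(text).group(1)   # first number directly after a "blah"
all_numbers = optional_space.findall(text)     # every number directly after a "blah"

# The two variants differ only when nothing separates "blah" from the digits.
no_gap_optional = optional_space.search("blah25")
no_gap_required = required_space.search("blah25")

print(first, all_numbers, no_gap_optional is not None, no_gap_required is None)
```

Here `first` is "25" and `all_numbers` is ['25', '62', '33']. On the input "blah25" the `\s*` variant still matches, while the `\s+` variant finds nothing.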
