Regular Expressions - last match not next match?


2 replies to this topic

#1 Anidazen

Anidazen
  • Members
  • Advanced Member
  • 79 posts

Posted 08 April 2006 - 01:13 PM

I've got some regular expressions, but they seem to be returning the last match, not the first match. How can I stop this behaviour? Even preg_match_all() only returns one match - and it's the later one.

(I am certain the expression should pick up the first match btw).

Here are the two things I'm trying to pull data from:


Web fare Adult Reg Fare 9.99 GBP
Wed, 19 Apr 06
Flight FR 663 07:55 Depart Birmingham (BHX)
09:00 Arrive Dublin (DUB)


Web fare Adult Reg Fare 9.99 GBP
Wed, 19 Apr 06
Flight FR 673 19:45 Depart Birmingham (BHX)
20:45 Arrive Dublin (DUB)



The two blocks are structurally identical. Here's the expression I'm using:

preg_match_all('/([0-9]*.[0-9]*) GBP.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*).*([0-9][0-9]\:[0-9][0-9])/s', $airhtml1, $matches);

The string has already been passed through strip_tags().

#2 Anidazen


Posted 08 April 2006 - 01:25 PM

More information if it helps (as I am well and truly stumped).

This is exactly how the server sees the string after it's been through strip_tags():


Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)
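A minimal sketch of what may be going on, assuming the greedy `.*` is the cause: with the /s modifier, `.*` can run across the whole string, so the match starts at the first fare, swallows everything to the end, and backtracks only as far as the last day name and the last time. That would give a single match spanning both flights. Making both quantifiers lazy (`.*?`) is one way to get one match per flight. The names `$days`, `$months`, `$greedy` and `$lazy` are just for illustration, and the month list is copied as posted (note it has no May).

```php
<?php
// The stripped string exactly as posted above.
$airhtml1 = 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)';

// Illustrative helpers; the alternations are copied from the posted expression.
$days   = 'Mon|Tue|Wed|Thu|Fri|Sat|Sun';
$months = 'Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec';

// Posted expression: both ".*" are greedy.
$greedy = '/([0-9]*.[0-9]*) GBP.*(' . $days . '), ([0-9]*) (' . $months . ') ([0-9]*).*([0-9][0-9]\:[0-9][0-9])/s';

// Same expression with both ".*" made lazy (".*?").
$lazy = '/([0-9]*.[0-9]*) GBP.*?(' . $days . '), ([0-9]*) (' . $months . ') ([0-9]*).*?([0-9][0-9]\:[0-9][0-9])/s';

$greedyCount = preg_match_all($greedy, $airhtml1, $g);
$lazyCount   = preg_match_all($lazy, $airhtml1, $l);

// Group 6 is the captured time.
echo "greedy: $greedyCount match(es), times: " . implode(', ', $g[6]) . "\n";
echo "lazy:   $lazyCount match(es), times: " . implode(', ', $l[6]) . "\n";
```

On the string above, the greedy version should report one match with the time 20:45, and the lazy version two matches with 07:55 and 19:45.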

#3 Anidazen


Posted 08 April 2006 - 01:54 PM

Yet more information. I've been debugging this really intensely, trying everything.

I changed the expression to be stricter:

preg_match_all('/([0-9]*\.[0-9]*) GBP.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*)\s*Flight\s*FR ([0-9]*)\s*([0-9][0-9]\:[0-9][0-9])/s', $airhtml1, $matches);

Replacing the .* parts with whitespace-only matching (which required me to ereg_replace out the &nbsp; things first). This will still always match the rightmost possible match, not the first. When I changed the bold part (the ([0-9]*) flight number group after FR) to the actual number, it returned the first match (that being the only possible match). This is beyond baffling and beyond frustrating. I hope somebody can help me.



It only ever returns one match. Lol - so according to this, PHP is matching [0-9]* with 673 but not 663.
Likewise, if the [0-9][0-9]\:[0-9][0-9] is replaced with 07:55, it matches the first one, as it should. When it's turned back into the regex, just like with the flight number, it matches only the rightmost one. Notice I am using preg_match_all(), which really should be returning both matches each time. The wrong-match problem was the same with preg_match().
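A hedged sketch for the stricter expression: as posted, it still contains one `.*` directly after GBP, and, assuming greedy matching is the culprit, that single `.*` is enough to skip ahead to the last day name in the string. Swapping it for `\s*` (or `.*?`) should let preg_match_all() find both flights. `$tail`, `$posted` and `$strict` are illustrative names, and the string is assumed to contain plain spaces with the &nbsp; entities already removed.

```php
<?php
// The stripped string as posted earlier in the thread.
$airhtml1 = 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)';

$days   = 'Mon|Tue|Wed|Thu|Fri|Sat|Sun';
$months = 'Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec'; // as posted

// Everything after "GBP<gap>" is identical in both variants.
$tail = '(' . $days . '), ([0-9]*) (' . $months . ') ([0-9]*)\s*Flight\s*FR ([0-9]*)\s*([0-9][0-9]\:[0-9][0-9])/s';

// As posted: a greedy ".*" is left between GBP and the day name.
$posted = '/([0-9]*\.[0-9]*) GBP.*' . $tail;

// Variant: only whitespace allowed between GBP and the day name.
$strict = '/([0-9]*\.[0-9]*) GBP\s*' . $tail;

$postedCount = preg_match_all($posted, $airhtml1, $p);
$strictCount = preg_match_all($strict, $airhtml1, $s);

// Group 6 is the flight number, group 7 the departure time.
echo "posted: $postedCount match(es), flights: " . implode(', ', $p[6]) . "\n";
echo "strict: $strictCount match(es), flights: " . implode(', ', $s[6]) . "\n";
```

With the posted string, the first pattern should give a single match for flight 673, and the second should give two matches, 663 at 07:55 and 673 at 19:45.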



Edit yet again: As you can see from the frequent extra-info posts and edits, I am trying everything I can, and I've run out of ideas. When I copied and pasted the string into a new script and ran the same expression, I got the same weird-ass results - so it's not an issue with anything else in the script.

Maybe if somebody does want to try and help me solve this, they can see if they get the same result. The string to copy and paste was posted above, along with the expression I'm using. :S I really don't understand how this can be happening.




Is this some major PHP bug or something? :(



