

Anidazen

Regular Expressions - last match not next match?


I've got some regular expressions, but they seem to be returning the last match, not the first match. How can I stop this behaviour? Even preg_match_all() returns only one match, and it's the later one.

(I am certain the expression should pick up the first match btw).

Here are the two things I'm trying to pull data from:


Web fare Adult Reg Fare 9.99 GBP
Wed, 19 Apr 06
Flight FR 663 07:55 Depart Birmingham (BHX)
09:00 Arrive Dublin (DUB)


Web fare Adult Reg Fare 9.99 GBP
Wed, 19 Apr 06
Flight FR 673 19:45 Depart Birmingham (BHX)
20:45 Arrive Dublin (DUB)



The two blocks are structurally identical. Here's the expression I'm using:

preg_match_all('/([0-9]*.[0-9]*) GBP.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*).*([0-9][0-9]\:[0-9][0-9])/s', $airhtml1, $matches);

The string has been strip_tags().

More information if it helps (as I am well and truly stumped).

This is how the server sees the string, exactly, after it's been strip_tags();


Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)
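For anyone who wants to reproduce this, here is a minimal standalone sketch. It assumes the flattened string above is stored in $airhtml1 and uses the first expression exactly as posted (note that its month list has no May). On this input it should report a single match whose captured time is 20:45, which is consistent with each greedy .* running to the end of the string and backtracking to the last place the rest of the pattern fits, rather than with a fault in preg_match_all() itself.

```php
<?php
// Flattened input, copied from the post above.
$airhtml1 = 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) '
          . 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)';

// The first expression as posted, unchanged: greedy .* twice, /s so the dot also matches newlines.
$pattern = '/([0-9]*.[0-9]*) GBP.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*).*([0-9][0-9]\:[0-9][0-9])/s';

$count = preg_match_all($pattern, $airhtml1, $matches);

echo "matches found: $count\n";
echo "fare captured: {$matches[1][0]}\n";
echo "time captured: {$matches[6][0]}\n";
// The whole match starts at the first fare and ends at the last time in the string,
// so it has swallowed both blocks and left nothing for a second match.
echo "whole match: {$matches[0][0]}\n";
```

The match still begins at the first "9.99 GBP"; only the later groups come from the second block, which is why it looks as if the last match is being returned.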

Even more information. I've been debugging this really intensely, trying everything.

I changed the expression to be stricter:

preg_match_all('/([0-9]*\.[0-9]*) GBP.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*)\s*Flight\s*FR ([0-9]*)\s*([0-9][0-9]\:[0-9][0-9])/s', $airhtml1, $matches);

I replaced the .* parts with whitespace-only matches (which required me to ereg_replace out the &nbsp; entities first). It will still always match the rightmost possible match, not the first. When I change the flight-number group, the ([0-9]*) after "FR", to the actual number, it returns the first match (that being the only possible match). This is beyond baffling and beyond frustrating. I hope somebody can help me.



It only ever returns one match. Lol - so according to this, PHP is matching [0-9]* with 673 but not 663.
Likewise, if [0-9][0-9]\:[0-9][0-9] is substituted with 07:55, it matches the first one, as it should. When it's turned back into the regex, just like the flight number, it matches only the rightmost one. Notice I am using preg_match_all(), which really should be returning both matches each time. The wrong-match problem was the same with preg_match().



Edit yet again: As you can see from the frequent extra-info posts and edits, I am trying everything I can, and I've run out of ideas. When I copied and pasted the string into a new script and ran the same expression, I got the same weird-ass results, so it's not an issue with anything else in the script.

Maybe if somebody does want to try and help me solve this they can see if they get the same result. The string to copy and paste was posted above, along with the expression I'm using. :S. I really don't understand how this can be happening.




Is this some major PHP bug or something? :(
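This is probably not a PHP bug. A likely explanation, offered as a sketch rather than a confirmed diagnosis, is that the stricter expression still contains one greedy .* right after "GBP". With the /s flag that .* can cross the whole string, so it skips ahead to the last weekday it can find, and the single match ends up carrying flight 673. The variant below is hypothetical: it is the stricter expression with that one .* made lazy (.*?), nothing else changed (the month list still has no May, as posted), and PREG_SET_ORDER added only to make the output easier to read.

```php
<?php
// Flattened input, copied from the earlier post in this thread.
$airhtml1 = 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 663 07:55 Depart Birmingham (BHX) 09:00 Arrive Dublin (DUB) '
          . 'Web fare Adult Reg Fare 9.99 GBP Wed, 19 Apr 06 Flight FR 673 19:45 Depart Birmingham (BHX) 20:45 Arrive Dublin (DUB)';

// Hypothetical variant of the stricter expression: ".*" after GBP replaced by the lazy ".*?",
// so it stops at the nearest weekday instead of the farthest one.
$pattern = '/([0-9]*\.[0-9]*) GBP.*?(Mon|Tue|Wed|Thu|Fri|Sat|Sun), ([0-9]*) (Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([0-9]*)\s*Flight\s*FR ([0-9]*)\s*([0-9][0-9]\:[0-9][0-9])/s';

// PREG_SET_ORDER groups the captures per match: $matches[0] is the first flight, $matches[1] the second.
$count = preg_match_all($pattern, $airhtml1, $matches, PREG_SET_ORDER);

echo "matches found: $count\n";
foreach ($matches as $m) {
    // $m[1] = fare, $m[6] = flight number, $m[7] = departure time
    echo "FR {$m[6]} departs {$m[7]} for {$m[1]} GBP\n";
}
```

Because each match now ends at its own departure time, preg_match_all() has room to find the second block on the next pass. Replacing the remaining .* with \s* should behave the same way on this input.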
