tejama Posted March 5, 2012

I'm trying to write code that scrapes a weather page to get today's forecast using preg_match_all. Below is the code I'm using:

```php
function get_weather() {
    $html = file_get_contents("http://weatheroffice.gc.ca/city/pages/nl-24_metric_e.html");
    preg_match_all("/<dt>Today<\/dt><dd>(.*)<\/dd>/sm", $html, $forecast, PREG_SET_ORDER);
    echo $forecast[0][0];
}
```

The piece of HTML I'm trying to scrape is:

```html
<dt>Today</dt>
<dd>Freezing rain mixed with ice pellets changing to rain and ending near noon then cloudy. Wind becoming northeast 30 km/h gusting to 50 this morning then southwest 50 gusting to 70 near noon. High plus 4.</dd>
```

Basically, the problem is that preg_match_all returns no matches. Can anyone point out where I'm going wrong in my regex?

Thanks in advance
AyKay47 Posted March 5, 2012

You are not accounting for the spaces/newline after the closing </dt> tag and before the opening <dd> tag. Adding \s* between them fixes it:

```php
$html = "<dt>Today</dt>
<dd>Freezing rain mixed with ice pellets changing to rain and ending near noon then cloudy. Wind becoming northeast 30 km/h gusting to 50 this morning then southwest 50 gusting to 70 near noon. High plus 4.</dd>";
preg_match_all("/<dt>Today<\/dt>\s*<dd>([^<]+)<\/dd>/i", $html, $forecast, PREG_SET_ORDER);
echo "<pre>";
print_r($forecast);
echo "</pre>";
```

Results:

```
Array
(
    [0] => Array
        (
            [0] => Today Freezing rain mixed with ice pellets changing to rain and ending near noon then cloudy. Wind becoming northeast 30 km/h gusting to 50 this morning then southwest 50 gusting to 70 near noon. High plus 4.
            [1] => Freezing rain mixed with ice pellets changing to rain and ending near noon then cloudy. Wind becoming northeast 30 km/h gusting to 50 this morning then southwest 50 gusting to 70 near noon. High plus 4.
        )

)
```

[0] is the full match (the browser renders its <dt>/<dd> tags as markup, so only the text shows); [1] is just the captured forecast.
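A self-contained sketch comparing the original pattern with the fixed one on a trimmed sample of the page markup. The "Tonight" entry is invented for illustration, and first_capture() is a hypothetical helper, not a PHP built-in. It uses preg_match rather than preg_match_all because only the first match is needed. It also shows why a greedy (.*) combined with the s flag is risky on a full page.

```php
<?php
// Trimmed sample markup (assumed, modelled on the question). The "\n" between
// </dt> and <dd> is the whitespace the original pattern did not allow for.
$html = "<dt>Today</dt>\n<dd>Freezing rain mixed with ice pellets changing to rain and ending near noon then cloudy. High plus 4.</dd>\n"
      . "<dt>Tonight</dt>\n<dd>Rain ending this evening. Low plus 1.</dd>";

// Original pattern: expects <dd> immediately after </dt>.
$original = '/<dt>Today<\/dt><dd>(.*)<\/dd>/sm';

// Fixed pattern from the answer: \s* absorbs any spaces/newlines, and
// [^<]+ stops at the first '<', so it cannot run past the closing </dd>.
$fixed = '/<dt>Today<\/dt>\s*<dd>([^<]+)<\/dd>/i';

// Hypothetical helper: returns capture group 1 of the first match, or null.
function first_capture(string $pattern, string $subject): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $subject, $m) === 1) {
        return $m[1]; // $m[0] would be the whole match, tags included
    }
    return null;
}

$before = first_capture($original, $html); // null: the newline breaks the match
$after  = first_capture($fixed, $html);

// With the s flag, a greedy (.*) also crosses newlines and tags, so it runs
// to the LAST </dd> in the subject instead of the first one.
$greedy = first_capture('/<dt>Today<\/dt>\s*<dd>(.*)<\/dd>/s', $html);

// A lazy (.*?) stops at the first </dd>, like [^<]+ does here.
$lazy = first_capture('/<dt>Today<\/dt>\s*<dd>(.*?)<\/dd>/s', $html);

var_dump($before);
echo $after, "\n";
```

On a real page with several <dt>/<dd> pairs, the greedy version would swallow everything up to the last forecast. That is why the negated class [^<]+ (or a lazy .*?) is the safer choice.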
tejama Posted March 5, 2012

Awesome! I wasn't 100% sure how to handle whitespace in regexes, so I figured that was part of the issue. Thanks a bunch for the help!
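On handling whitespace in regexes: a small sketch that runs three patterns against a few separators that might appear between </dt> and <dd> in fetched HTML. The subjects are made up for illustration. \s matches a single whitespace character (space, tab, carriage return, newline and similar). \s* allows zero or more of them, and \s+ requires at least one.

```php
<?php
// Assumed separators that might sit between </dt> and <dd>.
$variants = [
    'none'     => "<dt>Today</dt><dd>x</dd>",
    'space'    => "<dt>Today</dt> <dd>x</dd>",
    'newline'  => "<dt>Today</dt>\n<dd>x</dd>",
    'crlf+tab' => "<dt>Today</dt>\r\n\t<dd>x</dd>",
];

$patterns = [
    'literal' => '/<dt>Today<\/dt><dd>/',      // no whitespace allowed
    'star'    => '/<dt>Today<\/dt>\s*<dd>/',   // zero or more whitespace chars
    'plus'    => '/<dt>Today<\/dt>\s+<dd>/',   // at least one whitespace char
];

// Record whether each pattern matches each subject.
$results = [];
foreach ($patterns as $pname => $pattern) {
    foreach ($variants as $vname => $subject) {
        $results[$pname][$vname] = preg_match($pattern, $subject) === 1;
    }
}

print_r($results);
```

\s* is usually the safest choice when scraping, because the page's formatting (none, a space, or a Windows-style line break) can change without notice.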