Jerzxu Posted October 31, 2011

So I have a problem in which I have to get a webpage using file_get_contents, strip out the HTML, then parse it with regular expressions to get the current snow conditions for that area. With my test cases, which are just copies of the text but with modified values for the amount of snow, it works. But when it goes to get the actual live value, which is currently 0 cm (it hasn't changed, so I don't know if it works on a different value), it returns an empty array. I have been looking across the web for what could be causing this, and have tried many different cases, but nothing seems to work.

The regex is: /(24H|24 H).*?(\d+?)(cm| cm)/si

The text is: Nakiska Snow Report - Official Nakiska Ski Resort Snow Conditions var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-4125530-4']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); snow report webcam weather mobile Please enable javascripts to view this page properly // Snow Report Tickets & Passes Getting Here Trail Maps Photo Gallery Snow School Rental Equipment Childcare Events Calendar Hot Deals & Packages News News Archive Media Centre Contact EmploymentSite Map at_attach("menu_parent", "menu_child", "click", "y", "pointer"); Employment Contact Us Javascript DHTML Drop Down Menu Powered by dhtml-menu-builder.com BOOK YOUR VACATION ONLINE BELOW OR CALL 1-800-258-7669 View all packages HOME > CONDITIONS > SNOW REPORT NAKISKA SNOW REPORT Saturday, October 29, 2011 2:02:10 PM Expected High Snow Runs Lifts Maximum 24 h 5°C 0 cm 4 2 New snow 24h 48h 7 Days YTD 0 cm 0 cm 0 cm 0 cm 0 in 0 in 0 in 0 in Snow Pack Upper Lower 50 cm 0 cm 20 in 0 in Weather High Low 5 °C 0 °C 41 °F 32 °F Snow Conditions Groomed Corduroy Groomed Trails 4 Run of the Day Mapmaker Activities & Events Great Opening Day with Excellent Skiing! Operation is from 10:00 AM until 3:30 PM Skiing and riding will be off the Gold Express on Upper and Middle Mapmaker , Big Bear and Morley Flats. Only Intermediate and Advanced terrain available Skiers and Riders will upload and download on Olympic Chair. We recommend only Intermediate and Advanced skiers and riders participate for this season weekend opener. RCR Ski More Card and Nakiska I Ski Card are Now on Sale!! Call 1 800 258 7669 to purchase yours and for further details The Monster Glades have been Expanded Winter Sports School Multi week programs are filling fast. Book today on line. Book your Winter Vacation with RCR reservations before November 30 and SAVE!!! Call 1-800-258-7669 to book your vacation! Calgary's Closest Mountain Resort. Only 45 minutes west of Calgary. Enjoy 2,555 vertical feet of the best Fall-Line Skiing in the Alberta Rockies! 
You can be riding while the rest are driving! Ski/Snowboard Nakiska this winter at Calgary's Closest Mountain Best Skiing Value in the Rockies! Additional Links - PDF Version of the Snow Report - Weather - Snow Conditions Defined Follow us on facebook Featured Photo Opening Day Turns Conditions noted are current as of the time of the report and subject to change. Surface conditions will vary with skier use and weather. Ski and Ride at your own risk. Home : Privacy Policy : Employment : Media : About Us : Contact : Site Map Nakiska Mountain Resort (403) 591-7777 Toll Free: 1-800-258-7669 E-mail: information@skinakiska.com ©2010 All Rights Reserved. /* */ /* */

My expected output: 24 h 5°C 0 cm

Any help would be appreciated.
xyph Posted October 31, 2011

We can only make a RegEx that matches your exact example. You'd have to give us specific rules to follow for a more accurate solution. In your example, you will get two different matches with the RegEx you've provided.
Jerzxu Posted October 31, 2011

"We can only make a RegEx that matches your exact example. You'd have to give us specific rules to follow for a more accurate solution."

Well, basically I am just trying to get the number before the centimeters (in this case 0) that appears directly after 24 h. The reasoning: we are getting snow reports for the last 24 hours, and generally they read 24 h, then some characters, then 0 cm, etc. So essentially I need the first occurrence of a number followed by cm that appears after 24h or 24 h.

"In your example, you will get two different matches with the RegEx you've provided."

Which is true, and that's what I should expect, but I actually get no matches at all. I'm using file_get_contents and strip_tags to alter the text before it hits preg_match, but I can't see those modifying the text enough to produce nothing when a test case on the same text, with 10 cm instead, matches properly.
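For reference, the rule described above (the first number followed by cm after a 24h or 24 h label) can be sketched in PHP roughly as follows. The function name first_snow_cm and the sample strings are made up for illustration, and the sketch assumes the number and the unit are separated by ordinary whitespace, which may not hold for the live page.

```php
<?php
// Hypothetical helper, not code from the thread: returns the first
// "<number> cm" value that follows a "24h" / "24 h" label, or null.
function first_snow_cm(string $text): ?int
{
    // 24\s?h : "24h" or "24 h" (case-insensitive via the i flag)
    // .*?    : as little as possible of whatever comes next (s flag lets . cross newlines)
    // (\d+)  : the value we want, captured as group 1
    // \s*cm  : optional whitespace, then the unit
    if (preg_match('/24\s?h.*?(\d+)\s*cm/si', $text, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Sample strings modelled on the excerpt posted earlier in the thread.
var_dump(first_snow_cm('Maximum 24 h 5°C 0 cm 4 2'));
var_dump(first_snow_cm('New snow 24h 48h 7 Days YTD 12 cm'));
```

Capturing only the digits, rather than the whole label-to-unit span, keeps the caller from having to pick the number back out of the match.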
xyph Posted October 31, 2011

You could try this one, but without seeing the source I can only tell you that it works for the example you posted. It allows free spacing between conditions (\s+):

/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i
Jerzxu Posted October 31, 2011

"You could try this one, but without seeing the source I can only tell you it works for your example posted, and allows fee spacing between conditions (\s+) /24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i"

The website it's coming from is here, if that helps: http://www.skinakiska.com/conditions/snow-report.aspx

Basically: file_get_contents from that URL into a variable, then strip_tags, then preg_match. Some PHP code:

$page = file_get_contents('http://www.skinakiska.com/conditions/snow-report.aspx');
$content = strip_tags($page);
preg_match('/(24H|24 H).*?(\d+?)(cm| cm)/si', $content, $matches);
print_r($matches);

I should actually say that my expected output is the number next to the cm, rather than what I said before. This RegEx is used for a bunch of different locations, so it's put into a loop, but for now I just need this one to work. The difference with some of the other resorts is that they use 24 Hours, etc., which would just mean the first group would be 24 Hours|24H and so on.
Jerzxu Posted October 31, 2011

"You could try this one, but without seeing the source I can only tell you it works for your example posted, and allows fee spacing between conditions (\s+) /24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i"

Strangely, that didn't match anything.
xyph Posted October 31, 2011

It does in your example. That's as far as I can go with the data you've provided.
Jerzxu Posted October 31, 2011

"It does in your example. That's as far as I can go with the data you've provided."

http://ekoverse.com/tests/test.php

Doesn't seem to. Code for that page:

<?php
$page = file_get_contents('http://www.skinakiska.com/conditions/snow-report.aspx');
$content = strip_tags($page);
$pattern = '/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i';
preg_match($pattern, $content, $matches);
print "Matches: <br/>";
print_r($matches);
print "<br/><br/>";
print "Pattern: <br/>";
print $pattern;
print "<br/><br/>";
print "Content: <br/>";
print $content;
?>

I'm guessing it doesn't work because of the degree symbol. I almost got what I needed with this RegEx:

/((24 Hr|24 Hrs|24 Hour|24 Hours|24H|24 H).*?(\d+)?(cm| cm))/si

Except the value I need, the 0, isn't appearing. Possibly due to a conflict of digits? The reason I need the RegEx to be like that is that there are numerous other websites that I will be looping through to get the same sort of data, so the expression can't be just for this site.
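One possible reading of the missing 0: because (\d+)? is optional, the pattern can still succeed with that group left empty whenever something other than a plain space sits between the digit and cm. The sketch below assumes, purely for illustration, that the live markup puts an &nbsp; entity there (strip_tags removes tags but leaves entities in place). The helper name snow_groups and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: runs the pattern from the post above and returns
// the capture groups so they can be compared side by side.
function snow_groups(string $text): array
{
    $pattern = '/((24 Hr|24 Hrs|24 Hour|24 Hours|24H|24 H).*?(\d+)?(cm| cm))/si';
    preg_match($pattern, $text, $groups);
    return $groups;
}

// Plain-space test case, like the copied text used for testing.
$plain = snow_groups('24 h 5°C 0 cm');

// Assumed live markup: the entity survives strip_tags() as literal text.
$live = snow_groups(strip_tags('<td>24 h</td> <td>5&deg;C</td> <td>0&nbsp;cm</td>'));

var_dump($plain[3]); // the digit group is filled when a plain space precedes "cm"
var_dump($live[3]);  // the digit group is empty: "0" is followed by "&nbsp;", not "cm"
```

In the second case the lazy .*? simply swallows the 0 and the entity, and the match completes on the bare cm with nothing captured.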
Jerzxu Posted October 31, 2011

Update: So I resolved the issue. I went into the source code of the page I was fetching and replaced all the random HTML characters like &nbsp; with regular spaces. This fixed everything, and now my original RegEx works.
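The fix described above might look something like the following. This is a minimal sketch, assuming the characters in question were non-breaking spaces (written as &nbsp; or &#160;, or present as the UTF-8 bytes for U+00A0). The helper name normalize_spaces and the sample markup are hypothetical, and the actual replacement used in the thread was not posted.

```php
<?php
// Hypothetical helper: strip the tags, then turn non-breaking spaces
// (entity forms and the raw UTF-8 byte sequence) into ordinary spaces.
function normalize_spaces(string $html): string
{
    $text = strip_tags($html);
    return str_replace(['&nbsp;', '&#160;', "\xC2\xA0"], ' ', $text);
}

// Made-up markup standing in for the fetched page.
$html = '<td>24 h</td> <td>5&deg;C</td> <td>0&nbsp;cm</td>';

// The original RegEx from the first post, applied to the normalized text.
preg_match('/(24H|24 H).*?(\d+?)(cm| cm)/si', normalize_spaces($html), $matches);
print_r($matches);
```

Normalizing before matching keeps the pattern itself generic, which matters if the same expression is looped over several resort sites.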