
Regex not picking up 0 within text


Jerzxu


So I have a problem: I need to fetch a webpage using file_get_contents, strip out the HTML, then parse it with regular expressions to get the current snow conditions for that area.

 

With my test cases, which are just copies of the text with modified snow amounts, it works. But when it fetches the actual live value, which is currently 0 cm (it hasn't changed, so I don't know if it works with a different value), it returns an empty array. I have been looking across the web for what could be causing this and have tried many different cases, but nothing seems to work.

 

The regex is:

/(24H|24 H).*?(\d+?)(cm| cm)/si

 

The text is:

Nakiska Snow Report - Official Nakiska Ski Resort Snow Conditions var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-4125530-4']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })();  snow report webcam weather mobile    Please enable javascripts to view this page properly // Snow Report Tickets & Passes Getting Here Trail Maps Photo Gallery Snow School Rental Equipment Childcare Events Calendar Hot Deals & Packages News News Archive Media Centre Contact EmploymentSite Map at_attach("menu_parent", "menu_child", "click", "y", "pointer");  Employment Contact Us Javascript DHTML Drop Down Menu Powered by dhtml-menu-builder.com  BOOK YOUR VACATION ONLINE BELOW OR CALL 1-800-258-7669  View all packages      HOME > CONDITIONS > SNOW REPORT  NAKISKA SNOW REPORT  Saturday, October 29, 2011 2:02:10 PM Expected High Snow Runs Lifts Maximum 24 h 5°C  0 cm 4 2 New snow 24h 48h 7 Days YTD 0 cm 0 cm 0 cm 0 cm 0 in 0 in 0 in 0 in Snow Pack Upper Lower 50 cm 0 cm 20 in 0 in Weather High Low 5 °C 0 °C 41 °F 32 °F Snow Conditions Groomed Corduroy Groomed Trails 4 Run of the Day Mapmaker Activities & Events Great Opening Day with Excellent Skiing! Operation is from 10:00 AM until 3:30 PM Skiing and riding will be off the Gold Express on Upper and Middle Mapmaker , Big Bear and Morley Flats. Only Intermediate and Advanced terrain available Skiers and Riders will upload and download on Olympic Chair. We recommend only Intermediate and Advanced skiers and riders participate for this season weekend opener. RCR Ski More Card and Nakiska I Ski Card are Now on Sale!! 
Call 1 800 258 7669 to purchase yours and for further details The Monster Glades have been Expanded Winter Sports School Multi week programs are filling fast. Book today on line. Book your Winter Vacation with RCR reservations before November 30 and SAVE!!! Call 1-800-258-7669 to book your vacation! Calgary's Closest Mountain Resort. Only 45 minutes west of Calgary. Enjoy 2,555 vertical feet of the best Fall-Line Skiing in the Alberta Rockies! You can be riding while the rest are driving! Ski/Snowboard Nakiska this winter at Calgary's Closest Mountain Best Skiing Value in the Rockies! Additional Links - PDF Version of the Snow Report - Weather - Snow Conditions Defined  Follow us on facebook        Featured Photo Opening Day Turns    Conditions noted are current as of the time of the report and subject to change. Surface conditions will vary with skier use and weather. Ski and Ride at your own risk.  Home : Privacy Policy : Employment : Media : About Us : Contact : Site Map  Nakiska Mountain Resort (403) 591-7777 Toll Free: 1-800-258-7669 E-mail: information@skinakiska.com ©2010 All Rights Reserved. /* */ /* */

 

My expected output:

24 h 5°C  0 cm

 

Any help would be appreciated.


We can only make a RegEx that matches your exact example.

 

You'd have to give us specific rules to follow for a more accurate solution.

 

In your example, you will get two different matches with the RegEx you've provided.


We can only make a RegEx that matches your exact example.

 

You'd have to give us specific rules to follow for a more accurate solution.

 

Well basically I am just trying to get the number value before the centimeters (in this case 0) that appears directly after 24 h. The reasoning: We are getting snow reports for the last 24 hours and generally they are 24 h some characters, then 0 cm etc.

 

So essentially I need the first occurrence of some number cm that appears after 24h or 24 h.
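The rule stated above ("the first number followed by cm after 24h or 24 h") can be sketched as a small function. This is only an illustration: the name first_cm_after_24h is hypothetical, and it assumes the text has already been cleaned so that ordinary spaces separate the number from "cm".

```php
<?php
// Hypothetical helper: returns the first "<number> cm" value that follows
// "24h" or "24 h" (case-insensitive), or null when nothing matches.
function first_cm_after_24h(string $text): ?int
{
    // 24\s?h   : "24h" or "24 h"
    // .*?      : as little as possible (the s flag lets . cross newlines)
    // (\d+)    : the number we want
    // \s?cm    : "cm" with an optional single whitespace character before it
    if (preg_match('/24\s?h.*?(\d+)\s?cm/is', $text, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

$value = first_cm_after_24h('Maximum 24 h 5°C  0 cm 4 2');
// Compare against null explicitly: a reading of 0 is falsy in PHP.
echo $value === null ? "no match\n" : "$value cm\n";
```

Returning null for "no match" keeps a genuine reading of 0 distinguishable from a failed match, which matters when the live value is 0 cm.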

 

In your example, you will get two different matches with the RegEx you've provided.

 

Which is true, that's what I should expect, but I actually get no matches at all. I'm using preg_match to match and file_get_contents and strip_tags to alter the text before it hits preg_match, but I can't see those modifying the output to result in nothing when a test case on the same text, with 10 cm instead, results in proper matching.
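When a copied test case matches but the live text does not, the two strings may differ in bytes that look identical on screen. One way to check, sketched here with a hypothetical helper named hex_before_cm, is to dump the bytes around the first "cm" with bin2hex and compare the live text to the test case.

```php
<?php
// Hypothetical helper: returns the hex dump of up to $before bytes that
// precede the first "cm" in $text, plus the "cm" itself.
function hex_before_cm(string $text, int $before = 8): string
{
    $pos = stripos($text, 'cm');
    if ($pos === false) {
        return '';
    }
    $start = max(0, $pos - $before);
    return bin2hex(substr($text, $start, $pos - $start + 2));
}

// A plain space is byte 20; a UTF-8 non-breaking space is bytes c2 a0.
echo hex_before_cm("0 cm", 2), "\n";
echo hex_before_cm("0\u{00A0}cm", 3), "\n";
```

Running this on both the live $content and the hand-made test string would show whether the character before "cm" is really a plain space. It inspects only the first "cm" in the text, so it is a rough diagnostic rather than a full comparison.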


You could try this one, but without seeing the source I can only tell you that it works for the example you posted and allows free spacing between conditions (\s+).

 

/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i

 

The website it's coming from is here: http://www.skinakiska.com/conditions/snow-report.aspx

If that helps. Basically file_get_contents from that url put it into a var, then strip_tags, then do the preg_match.

 

Some PHP Code:

$page = file_get_contents('http://www.skinakiska.com/conditions/snow-report.aspx');
$content = strip_tags($page);
preg_match('/(24H|24 H).*?(\d+?)(cm| cm)/si', $content, $matches);
print_r($matches);

 

I should actually say that my expected output is the number next to the cm, rather than what I said before. This RegEx is used for a bunch of different locations, so it's put into a loop, but for now I just need this one to work. The difference with some of the other resorts is that they use 24 Hours, etc., which would just mean the first grouping would have 24 Hours|24H etc.
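Since the label differs per resort, the first grouping could be generated from a list instead of being edited by hand. The sketch below is one way to do that; build_snow_pattern and the label list are hypothetical, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: builds a case-insensitive pattern that captures the
// first "<number> cm" following any of the given labels.
function build_snow_pattern(array $labels): string
{
    // preg_quote escapes regex metacharacters (and the "/" delimiter)
    // so each label is matched literally.
    $quoted = array_map(fn($label) => preg_quote($label, '/'), $labels);
    return '/(?:' . implode('|', $quoted) . ').*?(\d+)\s?cm/is';
}

$pattern = build_snow_pattern(['24 Hours', '24 Hrs', '24 H', '24H']);

// Made-up samples standing in for different resorts' text.
foreach (['Last 24 Hours: 12 cm', 'Maximum 24 h 5°C  0 cm'] as $text) {
    if (preg_match($pattern, $text, $m) === 1) {
        echo $m[1], "\n";
    }
}
```

The number is always in $m[1] because the label alternation is a non-capturing group, so adding labels does not shift the group index.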

 


It does in your example. That's as far as I can go with the data you've provided.

 

http://ekoverse.com/tests/test.php

 

Doesn't seem to. Code for that page:

<?php

$page = file_get_contents('http://www.skinakiska.com/conditions/snow-report.aspx');
$content = strip_tags($page);
preg_match('/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i', $content, $matches);

$pattern = '/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i';

print "Matches: <br/>";
print_r($matches);
print "<br/><br/>";
print "Pattern: <br/>";
print $pattern;
print "<br/><br/>";
print "Content: <br/>";
print $content;

?>

Guessing it doesn't work due to the degrees symbol issue.
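One possible cause of a "degrees symbol issue": strip_tags removes tags but does not decode HTML entities, so if the page source writes the symbol as &deg; (an assumption, not verified against the live page), the literal ° in the pattern never matches. The sketch below uses a made-up HTML fragment to show the difference html_entity_decode makes.

```php
<?php
$pattern = '/24(?:h| h)\s+\d+°(?:C|F)\s+\d+\s+cm/i';

// Made-up fragment; assumes the degree sign is written as an entity.
$raw = strip_tags('<td>24 h</td> <td>5&deg;C</td> <td>0 cm</td>');
// $raw still contains the text "&deg;" because strip_tags leaves entities alone.

// Decode entities into UTF-8 characters before matching.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$rawHit     = preg_match($pattern, $raw);      // entity still present
$decodedHit = preg_match($pattern, $decoded);  // entity decoded to °

var_dump($rawHit, $decodedHit);
```

This only helps if the PHP source file and the decoded text use the same encoding (UTF-8 here); a page served in another encoding would need converting first.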

 

I almost got what I needed by doing this RegEx:

/((24 Hr|24 Hrs|24 Hour|24 Hours|24H|24 H).*?(\d+)?(cm| cm))/si

 

Except the value I need, the 0, isn't appearing. Possibly due to a conflict of digits?
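A possible explanation for the missing digit, assuming the character between the number and "cm" in the live text is not a plain space: because (\d+)? is optional, the pattern can still succeed by matching "cm" on its own and leaving the digit group empty. The two sample strings below are made up and differ only in that one character.

```php
<?php
$pattern = '/((24 Hr|24 Hrs|24 Hour|24 Hours|24H|24 H).*?(\d+)?(cm| cm))/si';

// Assumed inputs: identical except for the character between "0" and "cm".
$plain = "24 h 5°C  0 cm";
$nbsp  = "24 h 5°C  0\u{00A0}cm";   // U+00A0, non-breaking space

preg_match($pattern, $plain, $a);
preg_match($pattern, $nbsp, $b);

var_dump($a[3]);  // the digits, captured directly before " cm"
var_dump($b[3]);  // empty: "0" is not directly followed by "cm" or " cm",
                  // so the optional group is skipped and only "cm" matches
```

Making the digit group mandatory again, as in (\d+), turns this silent empty capture back into a failed match, which is easier to notice.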

 

The reason I need the RegEx to be like that is because there are numerous other websites that I will be looping through to get the same sort of data, so the expression can't be just for this site.


Update:

 

So I resolved the issue. I went into the source of the page I was fetching and replaced the stray HTML entities, such as &nbsp; (non-breaking space), with regular spaces. That fixed everything, and my original RegEx now works.
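A minimal sketch of that kind of cleanup, assuming the offending characters are &nbsp; entities or UTF-8 non-breaking spaces (the post does not list exactly which ones were replaced). The function name nbsp_to_space and the HTML fragment are made up for illustration.

```php
<?php
// Hypothetical helper: replaces the literal "&nbsp;" entity and the UTF-8
// non-breaking space (U+00A0) with an ordinary space.
function nbsp_to_space(string $text): string
{
    return str_replace(['&nbsp;', "\u{00A0}"], ' ', $text);
}

$pattern = '/(24H|24 H).*?(\d+?)(cm| cm)/si';

// Made-up fragment; strip_tags leaves the "&nbsp;" entity in place.
$live = strip_tags('<td>24 h</td> <td>5°C</td> <td>0&nbsp;cm</td>');

$before = preg_match($pattern, $live, $m1);                 // entity still there
$after  = preg_match($pattern, nbsp_to_space($live), $m2);  // entity replaced

var_dump($before, $m1);
var_dump($after, $m2[2] ?? null);
```

In the fetch code from earlier in the thread, this would sit between strip_tags and preg_match, so the same pattern can be reused for every site in the loop.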
