drexnefex Posted January 9, 2007

Hello all, I am looking to use cURL, preg_match, and a regex to pull specific data from unformatted text files. Given how these files are structured, I don't have the slightest clue how to create a regex pattern. Here is one of the files:

[code]
1-9-2007
Northwest Weather and Avalanche Center
Alpental Ski Area, Washington
Wind instruments unheated and may rime
Wind speed not accurate, Precip gage under-recording 5,6 Jan
MM/DD  Hour  Temp  Temp  Temp    RH    RH  Wind  Wind  Wind  Hour Total  24Hr Total
        PST     F     F     F     %     %   Avg   Max  Dir. Prec. Prec.  Snow  Snow
            5400' 4300' 3120' 3120' 5400' 5530' 5530' 5530' 3120' 3120' 3120' 3120'
------------------------------------------------------------------------------------
 1  8  1000    25    28    34    93    98     2    16   258     0     0    -0    96
 1  8  1100    25    29    36    87    98     1    15   258     0     0    -0    96
 1  8  1200    23    29    36    90    98     2    15   258     0     0    -0    96
 1  8  1300    23    28    34    93    98     1    13   258     0     0     0    96
 1  8  1400    25    28    34    96    98     1    16   258     0     0     0    97
 1  8  1500    26    29    33    99    99     0    13   258     0     0     0    97
 1  8  1600    28    29    32   100    99     1    16   259     0     0     0    97
 1  8  1700    29    28    32   100   100     2    14   258   .02   .02     0    97
 1  8  1800    30    29    32   100   100     0    16   258     0   .02     0    97
 1  8  1900    31    30    32   100   100     3    17   258   .02   .04     0    97
 1  8  2000    31    31    32   100   100    -1    17   258   .03   .07     0    97
 1  8  2100    31    32    32   100   100     1    15   258   .06   .13     0    97
 1  8  2200    31    33    32   100   100     3    17   258   .03   .16     0    96
 1  8  2300    31    33    33   100   100     2    13   258   .01   .17     0    96
 1  9     0    31    33    33   100   100     1    15   258     0   .17     0    96
 1  9   100    31    33    33   100   100     2    15   258     0   .17     0    96
 1  9   200    31    33    33   100   100    -1    12   258   .01   .18     0    96
 1  9   300    31    33    33   100   100     1    16   258     0   .18     0    96
 1  9   400    31    34    33   100   100     2    14   258   .01   .19     0    96
 1  9   500    30    34    34   100   100     1    13   258     0   .19     0    96
 1  9   600    30    33    34   100   100     1    13   258     0   .19     0    95
 1  9   700    30    33    34   100   100     2    13   258     0   .19     0    95
 1  9   800    30    33    34   100   100     0    13   258     0   .19     0    95
 1  9   900    30    34    34   100   100     1    13   259     0   .19     0    95
.19
Page 1
[/code]

Source of data: http://www.nwac.us/products/OSOALP

I want the regex to match the first 3 header rows of the 'table' and the last row of the 'table.' Essentially, what I want to do is extract data from this and a few other similarly structured files and dump it into an HTML table. I have a handle on the cURL part, but the regex for this completely boggles my mind. Any ideas?

effigy Posted January 10, 2007

[code]
<pre>
<?php

$data = <<<DATA
1-9-2007
Northwest Weather and Avalanche Center
Alpental Ski Area, Washington
Wind instruments unheated and may rime
Wind speed not accurate, Precip gage under-recording 5,6 Jan
MM/DD  Hour  Temp  Temp  Temp    RH    RH  Wind  Wind  Wind  Hour Total  24Hr Total
        PST     F     F     F     %     %   Avg   Max  Dir. Prec. Prec.  Snow  Snow
            5400' 4300' 3120' 3120' 5400' 5530' 5530' 5530' 3120' 3120' 3120' 3120'
------------------------------------------------------------------------------------
 1  8  1000    25    28    34    93    98     2    16   258     0     0    -0    96
 1  8  1100    25    29    36    87    98     1    15   258     0     0    -0    96
 1  8  1200    23    29    36    90    98     2    15   258     0     0    -0    96
 1  8  1300    23    28    34    93    98     1    13   258     0     0     0    96
 1  8  1400    25    28    34    96    98     1    16   258     0     0     0    97
 1  8  1500    26    29    33    99    99     0    13   258     0     0     0    97
 1  8  1600    28    29    32   100    99     1    16   259     0     0     0    97
 1  8  1700    29    28    32   100   100     2    14   258   .02   .02     0    97
 1  8  1800    30    29    32   100   100     0    16   258     0   .02     0    97
 1  8  1900    31    30    32   100   100     3    17   258   .02   .04     0    97
 1  8  2000    31    31    32   100   100    -1    17   258   .03   .07     0    97
 1  8  2100    31    32    32   100   100     1    15   258   .06   .13     0    97
 1  8  2200    31    33    32   100   100     3    17   258   .03   .16     0    96
 1  8  2300    31    33    33   100   100     2    13   258   .01   .17     0    96
 1  9     0    31    33    33   100   100     1    15   258     0   .17     0    96
 1  9   100    31    33    33   100   100     2    15   258     0   .17     0    96
 1  9   200    31    33    33   100   100    -1    12   258   .01   .18     0    96
 1  9   300    31    33    33   100   100     1    16   258     0   .18     0    96
 1  9   400    31    34    33   100   100     2    14   258   .01   .19     0    96
 1  9   500    30    34    34   100   100     1    13   258     0   .19     0    96
 1  9   600    30    33    34   100   100     1    13   258     0   .19     0    95
 1  9   700    30    33    34   100   100     2    13   258     0   .19     0    95
 1  9   800    30    33    34   100   100     0    13   258     0   .19     0    95
 1  9   900    30    34    34   100   100     1    13   259     0   .19     0    95
.19
Page 1
DATA;

### Get the header.
preg_match('%MM/DD.+?(?=-{2,})%s', $data, $matches);
print_r($matches);

### Separate the metadata from the data; if you don't,
### the date "1-9-2007" will be picked up as data.
### The separator is the line of hyphens.
$data_pieces = preg_split('/^-{2,}\r?$/m', $data);
$data_area = array_pop($data_pieces);

### Get the rows.
preg_match_all('%^[-.\d ]+\r?$%m', $data_area, $matches);
print_r($matches);

### Last row.
print_r(array_pop($matches[0]));

?>
</pre>
[/code]

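Since the stated goal is to dump the rows into an HTML table, here is a minimal sketch of that step. It assumes the row lines look like the entries of `$matches[0]` produced by the preg_match_all call above; the helper name `rows_to_html_table` is hypothetical and not part of the posted code, and the two sample rows are copied from the posted file.

```php
<?php
// Hypothetical helper (not from the thread): turn matched row lines
// into a plain HTML table, one <tr> per line and one <td> per field.
function rows_to_html_table(array $lines): string
{
    $html = "<table>\n";
    foreach ($lines as $line) {
        // Split on runs of whitespace; trim() drops padding and any trailing \r.
        $fields = preg_split('/\s+/', trim($line));
        $html .= '  <tr>';
        foreach ($fields as $field) {
            // Escaping is only a precaution; the fields should be numeric.
            $html .= '<td>' . htmlspecialchars($field) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// Two rows copied from the sample file, used here as stand-in input.
$rows = [
    ' 1  9  800 30 33 34 100 100 0 13 258 0 .19 0 95',
    ' 1  9  900 30 34 34 100 100 1 13 259 0 .19 0 95',
];
echo rows_to_html_table($rows);
```

In the script above, calling `rows_to_html_table($matches[0])` after the preg_match_all step would presumably render every matched line, including any short trailing line that happens to match the row pattern.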
drexnefex Posted January 12, 2007

effigy, thanks for the quick reply and many thanks for the code. On my end your code works great. It's splitting the header rows from the body rows as it should, but it's returning the entire set of rows rather than just the last complete row. I've added a cURL handle and an array that covers a series of files. Any idea what is preventing this from returning just the last complete row in each table? I'm assuming it's the match pattern.

[code]
<pre>
<?php

// Station file names. They must be quoted strings; bare words are
// treated as undefined constants.
$nwac[0] = 'OSOALP';
$nwac[1] = 'OSOSNO';
$nwac[2] = 'OSOMTB';
$nwac[3] = 'OSOSK9';
$nwac[4] = 'OSOCMG';
$nwac[5] = 'OSOMHM';
$nwac[6] = 'OSOPVC';
$nwac[7] = 'OSOWPS';

// count($nwac) rather than a hard-coded 7, so the last entry is not skipped.
for ($counter = 0; $counter < count($nwac); $counter += 1) {
    $data = curl_init();

    // set URL and other appropriate options
    curl_setopt($data, CURLOPT_URL, "http://www.nwac.us/products/$nwac[$counter]");
    curl_setopt($data, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($data, CURLOPT_TIMEOUT, 30);

    // grab URL
    $output = curl_exec($data);
    #curl_exec($data);
    curl_close($data);

    ### Get the header.
    preg_match('%MM/DD.+?(?=-{2,})%s', $output, $matches);
    print_r($matches);

    ### Separate the metadata from the data; if you don't,
    ### the date "1-9-2007" will be picked up as data.
    ### The separator is the line of hyphens.
    $data_pieces = preg_split('/^-{2,}\r?$/m', $output);
    $data_area = array_pop($data_pieces);

    ### Get the rows.
    preg_match_all('%^[-.\d ]+\r?$%m', $data_area, $matches);
    print_r($matches);

    ### Last row.
    print_r(array_pop($matches[0]));
} // End of the loop over the array above.

?>
</pre>
[/code]

effigy Posted January 12, 2007

In a way, you have to get all the rows before you can determine the last one. You can't just take the last line, because that's the page number. Simply comment out the print_r statement I have under "Get the rows," and you'll be able to see the following output better, which is a printout of the last row.
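
As a rough illustration of "get all the rows, then pick the last one," here is a minimal sketch that returns only the last complete row and prints nothing else. It assumes a complete row has 15 whitespace-separated fields, which is the count in the sample data rows. If the file ends with a short totals line (the lone .19 before "Page 1" looks like one), that line also matches the row pattern and would be what array_pop returns, so the sketch checks the field count. The function name `last_complete_row` and the shortened sample are hypothetical, not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): return the fields of the last
// row that has the expected number of columns, or null if there is none.
function last_complete_row(string $text, int $expectedFields = 15): ?array
{
    // Keep only what follows the last line of hyphens, as in the code above.
    $pieces = preg_split('/^-{2,}\r?$/m', $text);
    $dataArea = array_pop($pieces);

    // Collect every line made up solely of digits, dots, hyphens and spaces.
    preg_match_all('/^[-.\d ]+\r?$/m', $dataArea, $matches);

    // Walk backwards so the first hit is the last complete row.
    foreach (array_reverse($matches[0]) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) === $expectedFields) {
            return $fields;
        }
    }
    return null;
}

// Shortened sample in the same shape as the posted file (assumed layout).
$sample = <<<DATA
1-9-2007
MM/DD Hour Temp
--------------------
 1  9  800 30 33 34 100 100 0 13 258 0 .19 0 95
 1  9  900 30 34 34 100 100 1 13 259 0 .19 0 95
.19
Page 1
DATA;

print_r(last_complete_row($sample));
```

Inside the loop over the station files, the same call could be made on `$output` in place of the three print_r statements. Whether 15 is the right field count for the other files is an assumption worth checking against each one.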