Wuhtzu Posted March 1, 2008

Hey

I'm trying to grab several pieces of information from my router's system status page, and I can't figure out the smartest way to match and store the information. The page's markup is shown below.

<html><head>
<meta http-equiv='content-type' content='text/html;charset=iso-8859-1'>
<title>Web Configurator</title>
<SCRIPT src="General.js"></SCRIPT>
</head>
<body marginwidth="0" marginheight="0">
<table border="0">
<tr>
<td width="500" colspan="3"> </td></tr><tr>
<td colspan="3" class="header2"> System up Time:<b> 0:01:19</b> </td></tr><tr>
<td colspan="3">CPU Load:<b> 0.95%</b></td></tr><tr>
<td colspan="3"> </td></tr><tr>
<td colspan="3" class="header2"> WAN Port Statistics:</td></tr><tr>
<td colspan="3"> Link Status:<b> Up </b> </td></tr><tr>
<td colspan="3">Upstream Speed:<b> 764 kbps</b></td></tr><tr>
<td colspan="3">Downstream Speed:<b> 8059 kbps</b> </td></tr><tr>
<td colspan="3">
<table border="1" cellspacing="0" cellpadding="1" align=left>
<tr>
<td class="TableTilte">
<div align=center> Node-Link</div></td><td class="TableTilte">
<div align=center> Status</div></td><td class="TableTilte">
<div align=center> TxPkts</div></td><td class="TableTilte">
<div align=center> RxPkts</div></td><td class="TableTilte">
<div align=center> Errors</div></td><td class="TableTilte">
<div align=center> Tx B/s</div></td><td class="TableTilte">
<div align=center> Rx B/s</div></td><td class="TableTilte">
<div align=center> Up Time</div></td></tr><tr>
<td class="TableItem">
<div align=center> 1-PPPoA</div></td><td><div align=center> Up </div></td><td><div align=center> 148</div></td><td><div align=center> 156</div></td><td><div align=center> 0</div></td><td><div align=center> 342</div></td><td><div align=center> 122854</div></td><td><div align=center> 0:00:19</div></td></tr></table></td></tr><tr>
<td colspan="3"> </td></tr>
<tr>
<td colspan="3">LAN Port Statistics:</td></tr><tr>
<td colspan="3">
<table border="1" cellspacing="0" cellpadding="1" align=left>
<tr>
<td class="TableTilte">
<div align=center> Interface:</div></td><td class="TableTilte">
<div align=center> Status</div></td><td class="TableTilte">
<div align=center> TxPkts</div></td><td class="TableTilte">
<div align=center> RxPkts</div></td><td class="TableTilte">
<div align=center> Collisions</div></td></tr><tr>
<td class="TableItem">
<div align=center> Ethernet</div></td><td><div align=center>100M/Full Duplex</div></td><td><div align=center>429</div></td><td><div align=center> 451</div></td><td><div align=center> 0</div></td></tr>
</table></td></tr><tr>
<td colspan="3"> </td></tr></table>
</body></html>

I need to get the "value" of the following entries:

System up Time
Link Status
Upstream Speed
Downstream Speed
Status
Up Time

I have all the above markup stored in a variable (obtained through curl), and now I need to extract the information from it and in the end store it in an array like this:

Array
(
    [system_up_time] => 24:02:30
    [link_status] => up
    [upstream speed] => 764
    [etc] => etc
)

I have no problem writing a regular expression which matches each piece of information separately, but that way I end up with a huge number of preg_match() calls. Is that the way to do it: match each "piece of information" with its own regex and a "dedicated" preg_match() call? Or is there a smarter way?

I'm not looking for anyone to write the script, just ideas on how to structure the script and the information extraction.

Any input will be much appreciated.

Wuhtzu
dsaba Posted March 1, 2008

If your markup remains static in its structure, certain things will always follow others, so an easy way would be to use

.*?first item.*?second item

To make sure the . does not eat up past the item, you will have to make the item match very specific and capture only a substring of it.

For example, with the haystack

hellolobye

say you want to match the last 'lo', the one before 'bye'. Instead of

.*?lo

you could say:

.*?(lo)bye //because you know 'bye' will always come after the last 'lo' in the static structure
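The difference between the two patterns is easy to check with a short script. This is only a sketch of the idea described above: the haystack is the one from the post, while the variable names `$first` and `$last` are made up for illustration. `PREG_OFFSET_CAPTURE` is used so that the position of each captured 'lo' is visible.

```php
<?php
// Haystack from the example above.
$haystack = 'hellolobye';

// Lazy .*? followed by (lo): the group lands on the first 'lo' in the string.
preg_match('/.*?(lo)/', $haystack, $first, PREG_OFFSET_CAPTURE);

// Requiring 'bye' right after the group pushes the match on to the last 'lo'.
preg_match('/.*?(lo)bye/', $haystack, $last, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each entry is array(matched text, byte offset).
echo 'without anchor: offset ' . $first[1][1] . "\n";
echo 'with anchor:    offset ' . $last[1][1] . "\n";
```

The same principle applies to the router page: the more of the fixed surrounding markup a sub pattern includes, the less likely the lazy `.*?` is to stop at the wrong number.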
Wuhtzu Posted March 2, 2008

Thank you for your excellent solution, dsaba. Your method works like a charm and is neater than 10 calls to preg_match().

<?php
// Array containing the sub patterns, in the order the values appear in the markup
$regex_array = array(
    'system_up_time'   => '([0-9]{1,}:[0-9]{2}:[0-9]{2})',
    'cpu_usage'        => '([0-9]{1,3}\.[0-9]{1,})',
    'link_status'      => '(Down|Initializing|Up)',
    'upstream_speed'   => '([0-9]{1,}) kbps',
    'downstream_speed' => '([0-9]{1,}) kbps',
    'status'           => '<td><div align=center> (N\/A|Idle|LCP Up|Up)',
    'txpkts'           => '([0-9]{1,})',
    'rxpkts'           => '([0-9]{1,})',
    'errors'           => '([0-9]{1,})',
    'txbs'             => '([0-9]{1,})',
    'rxbs'             => '([0-9]{1,})',
    'up_time'          => '([0-9]{1,}:[0-9]{2,}:[0-9]{2,})',
    'lan_txpkts'       => '<div align=center>([0-9]{1,})<',
    'lan_rxpkts'       => '([0-9]{1,})',
    'lan_collisions'   => '([0-9]{1,})'
);

// Array containing all the fields (the names of the pieces of data)
$fields_array = array_keys($regex_array);

// Construct the regular expression without delimiters
$regex = ''; // initialise first so that .= does not raise an undefined-variable notice
foreach ($regex_array as $key => $subpattern) {
    $regex .= '.*?' . $subpattern;
}

// Extract information from $sysstatistics_adsl (the markup obtained through curl)
preg_match('/' . $regex . '/s', $sysstatistics_adsl, $tmp_matches);
?>
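After the preg_match() call, $tmp_matches still has numeric keys, so one more step is needed to get the keyed array asked for in the first post. One possible way, sketched here under assumptions, is to pair the field names with the captured groups using array_combine(). The `$html` string is a shortened, made-up stand-in for the real status page, and `$stats` is a hypothetical name; the approach only works if every sub pattern contains exactly one capturing group.

```php
<?php
// Shortened, hypothetical stand-in for the markup fetched with curl.
$html = 'System up Time:<b> 0:01:19</b> ... Link Status:<b> Up </b> ... '
      . 'Upstream Speed:<b> 764 kbps</b>';

// Same idea as above, cut down to three fields. One capturing group per entry.
$regex_array = array(
    'system_up_time' => '([0-9]{1,}:[0-9]{2}:[0-9]{2})',
    'link_status'    => '(Down|Initializing|Up)',
    'upstream_speed' => '([0-9]{1,}) kbps',
);

// Build the combined pattern without delimiters.
$regex = '';
foreach ($regex_array as $subpattern) {
    $regex .= '.*?' . $subpattern;
}

$stats = array();
if (preg_match('/' . $regex . '/s', $html, $tmp_matches)) {
    // $tmp_matches[0] is the whole match; groups 1..n follow the key order.
    $stats = array_combine(array_keys($regex_array), array_slice($tmp_matches, 1));
}

print_r($stats);
```

Checking the return value of preg_match() matters here: if the router ever changes its markup, the combined pattern fails as a whole and `$stats` simply stays empty instead of holding misaligned values.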
dsaba Posted March 2, 2008

Instead of getting numeric keys back from preg_match_all(), you can assign a custom key to each matched subgroup, and you can incorporate this into your function, i.e.:

(?P<customKey>subgroup regex)

See this post: http://www.phpfreaks.com/forums/index.php/topic,185238.msg829648.html#msg829648

This way you can get your

Array
(
    [system_up_time] => 24:02:30
    [link_status] => up
    [upstream speed] => 764
    [etc] => etc
)

directly from the matches array in preg_match_all().
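A minimal sketch of the named-subgroup idea, using preg_match() since each value occurs once on the page. The `$html` string is a shortened, made-up stand-in for the status page and the variable names are hypothetical. Two things to be aware of: group names may only contain letters, digits and underscores (so a key like "upstream speed" would have to become upstream_speed), and the matches array contains each named group under its numeric index as well, so the numeric keys are dropped in a small loop.

```php
<?php
// Shortened, hypothetical stand-in for the markup fetched with curl.
$html = 'Link Status:<b> Up </b> ... Upstream Speed:<b> 764 kbps</b>';

// Each subgroup carries its own key via (?P<name>...).
$pattern = '/.*?(?P<link_status>Down|Initializing|Up)'
         . '.*?(?P<upstream_speed>[0-9]+) kbps/s';

$stats = array();
if (preg_match($pattern, $html, $matches)) {
    // $matches holds every named group twice: once by name, once by number.
    // Keep only the named (string) keys.
    foreach ($matches as $key => $value) {
        if (is_string($key)) {
            $stats[$key] = $value;
        }
    }
}

print_r($stats);
```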