ravi181229 Posted December 5, 2008

Hi, I would like to parse the HTML page http://rivals.yahoo.com/ncaa/baseball/collegebroadcast and get all the events' info for a particular date. For example:

Sat, Dec 6 (get this date and all the events' info for this date)
San Francisco vs. Long Beach St. - Men's Basketball 10:00 pm EST http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697

Mon, Dec 8 (get this date and all the events' info for this date)

and so on. I need help. Thanks
premiso Posted December 5, 2008

Post an example of the HTML you are parsing (a short snippet) and I will attempt to help you. (I don't like going to external URLs and hunting for the right section myself; I'm just plain too lazy to do that.) Also post how you want it displayed.
ravi181229 Posted December 5, 2008

From the following HTML:

```html
<table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • San Francisco <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_video_box.gif" width="16" height="12" alt="Free Video"></a> </span> vs. Long Beach St. <span class='yspscores'></span> - Men's Basketball </td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 10:00 pm EST </td> </tr> </table></td></tr> </table><table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Mon, Dec 8</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Lehigh <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_video_box.gif" width="16" height="12" alt="Subscription Video"></a> </span> vs. Albany <span class='yspscores'></span> - Men's Basketball </td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 6:30 pm EST </td> </tr> </table></td></tr> </table> </table>
```

I would like to display:

Sat, Dec 6
• San Francisco Free Video vs. Long Beach St. - Men's Basketball 10:00 pm EST

Mon, Dec 8
• Lehigh Subscription Video vs. Albany - Men's Basketball 6:30 pm EST
ravi181229 Posted December 5, 2008

I would also like to have the link (sorry, I missed it in my previous post):

Sat, Dec 6
• San Francisco Free Video vs. Long Beach St. - Men's Basketball 10:00 pm EST http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697

Mon, Dec 8
• Lehigh Subscription Video vs. Albany - Men's Basketball 6:30 pm EST http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183
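To get the link as well, the player URL has to be pulled out of the `javascript:void(window.open(...))` href. Below is a minimal sketch, assuming PHP 7 or later, that the URL is always the first single-quoted argument to `window.open`, and that the icon's `<img ... alt="...">` follows it inside the same link, as in the HTML above. `extract_player_links` is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: list every player URL together with its icon label.
// Assumes href="javascript:void(window.open('URL', ...))" is followed by
// an <img ... alt="LABEL"> inside the same link, as in the posted HTML.
function extract_player_links(string $html): array
{
    preg_match_all(
        "~window\.open\('([^']+)'.*?alt=\"([^\"]*)\"~s", // URL, then the next alt text
        $html,
        $matches,
        PREG_SET_ORDER                                   // one array per link
    );

    $links = [];
    foreach ($matches as $m) {
        $links[] = ['url' => $m[1], 'label' => $m[2]];
    }
    return $links;
}

$sample = '<a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697\',\'playerWindow\',\'width=793,height=608,scrollbars=no\'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_video_box.gif" width="16" height="12" alt="Free Video"></a>';

foreach (extract_player_links($sample) as $link) {
    echo $link['label'] . ': ' . $link['url'] . "\n"; // Free Video: http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697
}
```

A link without an `alt` attribute would let the lazy `.*?` run on to the next icon's label, so the result is only as reliable as that assumption.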
premiso Posted December 5, 2008

```php
<?php
$string = '<table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> San Francisco <span class=\'yspscores\'><a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697\',\'playerWindow\',\'width=793,height=608,scrollbars=no\'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_video_box.gif" width="16" height="12" alt="Free Video"></a> </span> vs. Long Beach St. <span class=\'yspscores\'></span> - Men\'s Basketball </td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 10:00 pm EST </td> </tr> </table></td></tr> </table><table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Mon, Dec 8</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> Lehigh <span class=\'yspscores\'><a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183\',\'playerWindow\',\'width=793,height=608,scrollbars=no\'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_video_box.gif" width="16" height="12" alt="Subscription Video"></a> </span> vs. Albany <span class=\'yspscores\'></span> - Men\'s Basketball </td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 6:30 pm EST </td> </tr> </table></td></tr> </table> </table>';

// Lazy (.*?) stops at the next </td>; a greedy (.*) would run to the last </td> on the line.
$dates = array();
preg_match_all('~<td class="yspdetailttl" valign="bottom" height="12">(.*?)</td>~', $string, $matches);
foreach ($matches[1] as $match) {
    $dates[] = $match;
}
//print_r($dates);

// [^\s<] requires real content, so the empty <td nowrap> </td> spacer cells are skipped.
$vs = array();
preg_match_all('~<td nowrap>\s*([^\s<].*?)</td>~', $string, $matches);
foreach ($matches[1] as $match) {
    $vs[] = " " . $match;
}
//print_r($vs);

$time = array();
preg_match_all('~<td width="100%" nowrap class="yspscores">(.*?)</td>~', $string, $matches);
foreach ($matches[1] as $match) {
    $time[] = $match;
}
//print_r($time);

$count = count($dates);
for ($i = 0; $i < $count; $i++) {
    echo $dates[$i] . "<br />" . $vs[$i] . "\t\t" . $time[$i] . "<br /><br />";
}
?>
```

I will let you figure out how to display them. I'm not sure that's the most efficient way, but it works.
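Why a greedy `(.*)` is risky on markup like this: once the HTML sits on a single line, `(.*)` runs to the last `</td>` in the string rather than the next one, while the lazy `(.*?)` stops at the first. A small comparison on two heading cells trimmed from the snippet above (the `valign`/`height` attributes are dropped for brevity):

```php
<?php
// Two date headings, trimmed from the snippet above, on one line.
$html = '<td class="yspdetailttl"> Sat, Dec 6</td><td class="yspdetailttl"> Mon, Dec 8</td>';

// Greedy: (.*) backtracks only as far as the LAST </td>, so both dates land in one match.
preg_match_all('~<td class="yspdetailttl">(.*)</td>~', $html, $greedy);

// Lazy: (.*?) stops at the FIRST </td> after each heading.
preg_match_all('~<td class="yspdetailttl">(.*?)</td>~', $html, $lazy);

print_r($greedy[1]); // one element: " Sat, Dec 6</td><td class="yspdetailttl"> Mon, Dec 8"
print_r($lazy[1]);   // two elements: " Sat, Dec 6" and " Mon, Dec 8"
```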
ravi181229 Posted December 5, 2008

This code works, but it does not display all the events for a particular date. For example, there can be many events under one date:

Fri, Dec 5
• Stony Brook vs. Lehigh Subscription Audio 6:30 pm EST
• Hope Free Audio vs. Carthage 6:40 pm EST
• Pennsylvania Subscription Audio vs. Navy Subscription Audio 7:00 pm EST
• Iowa Subscription Audio vs. Bryant 7:30 pm EST
• Texas A&M vs. Arizona Free Audio 8:30 pm EST

Sat, Dec 6
• Davidson vs. N.C. State Subscription Audio 12:00 pm EST
• Holy Cross vs. W. Michigan Subscription Audio 12:30 pm EST
• Indiana Subscription Audio vs. Gonzaga 12:30 pm EST
• Iowa St. Subscription Audio vs. Oregon St. Subscription Audio 1:30 pm EST
• Kansas vs. Jackson St. Free Audio 2:00 pm EST

HTML code:

```html
<table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Fri, Dec 5</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Hope <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10092771','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span> vs. Carthage <span class='yspscores'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 5:40 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Stony Brook <span class='yspscores'></span> vs.
Lehigh <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603182','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 6:30 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Pennsylvania <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10838785','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Navy <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10689020','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 7:00 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Iowa <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10219999','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs.
Bryant <span class='yspscores'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 7:30 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Texas A&M <span class='yspscores'></span> vs. Arizona <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10590979','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 8:30 pm EST </td> </tr> </table></td></tr> </table><table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Davidson <span class='yspscores'></span> vs. N.C. State <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10663073','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 12:00 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Holy Cross <span class='yspscores'></span> vs. W.
Michigan <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10592143','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 12:30 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Indiana <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10331117','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Gonzaga <span class='yspscores'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 12:30 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Iowa St. <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10345634','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Oregon St.
<span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10580288','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 1:30 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Providence <span class='yspscores'></span> vs. Rhode Island <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487157','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 2:00 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Kansas <span class='yspscores'></span> vs. Jackson St. <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10530559','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 2:00 pm EST </td> </tr> </table></td></tr>
```
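Matching dates, teams and times as three separate lists and pairing them by index, as premiso's code does, only lines up when every date has exactly one event. One way around it is to split the page at the date headings first, so every event stays attached to its date. This is a sketch under the assumption that each heading is a `<td class="yspdetailttl" ...>` cell as in the HTML above; `split_by_date` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: map each date heading to the markup that follows it.
function split_by_date($html)
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured date text in the result:
    // [text before first heading, date1, markup1, date2, markup2, ...]
    $parts = preg_split(
        '~<td class="yspdetailttl"[^>]*>\s*(.*?)\s*</td>~s',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $byDate = array();
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $byDate[$parts[$i]] = $parts[$i + 1];
    }
    return $byDate;
}

// Usage, with $html holding the page source: count the time cells per date.
// foreach (split_by_date($html) as $date => $markup) {
//     $n = preg_match_all('~<td width="100%" nowrap class="yspscores">~', $markup);
//     echo $date . ': ' . $n . " event(s)\n";
// }
```

Each date's markup can then be matched row by row without the index pairing ever crossing into the next date.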
curtis Posted December 6, 2008

Quoting ravi181229: "Hi, I would like to parse http://rivals.yahoo.com/ncaa/baseball/collegebroadcast html page and get all events' info for particular date: <snip>"

If you want to parse HTML, use the DOM, not regex. See the PHP DOM manual and the DOMXPath docs.
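For comparison with the regex approach, here is a sketch of the DOM route using `DOMDocument::loadHTML()` and `DOMXPath`. It is untested against the live Yahoo page; the assumption is that libxml's HTML parser can recover a usable tree from the markup, which it usually does for unclosed or misnested tags while raising warnings (collected here with `libxml_use_internal_errors()` and then discarded). `cell_text` and `parse_schedule_dom` are hypothetical helper names, and the sample is a trimmed copy of the markup posted in this thread.

```php
<?php
// Text of a node, with each <img> replaced by its alt text ("Free Audio", ...).
function cell_text(DOMNode $node)
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $out .= $child->tagName === 'img'
                ? ' ' . $child->getAttribute('alt') . ' '
                : cell_text($child);
        }
    }
    return $out;
}

function parse_schedule_dom($html)
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of printing them
    $doc = new DOMDocument();
    // The meta tag tells libxml the input is UTF-8 (for the bullets).
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $schedule = array();
    $date = null;

    // Date headings and time cells, in document order.
    foreach ($xpath->query('//td[@class="yspdetailttl" or @class="yspscores"]') as $td) {
        if ($td->getAttribute('class') === 'yspdetailttl') {
            $date = trim($td->textContent);
            $schedule[$date] = array();
            continue;
        }
        if ($date === null) {
            continue; // a time cell before any heading
        }
        // The teams cell is the first <td> in the same row as the time cell.
        $teamsCell = $xpath->query('td[1]', $td->parentNode)->item(0);
        $links = array();
        foreach ($xpath->query('.//a/@href', $teamsCell) as $href) {
            if (preg_match("~window\.open\('([^']+)'~", $href->value, $m)) {
                $links[] = $m[1];
            }
        }
        $schedule[$date][] = array(
            'teams' => trim(preg_replace('~\s+~u', ' ', cell_text($teamsCell))),
            'time'  => trim($td->textContent),
            'links' => $links,
        );
    }
    return $schedule;
}

// A trimmed copy of one date table from the HTML posted in this thread.
$html = <<<'HTML'
<table><tr><td class="yspdetailttl" valign="bottom" height="12"> Fri, Dec 5</td></tr>
<tr><td><table><tr><td nowrap> • Hope <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10092771','playerWindow'));"><img border="0" alt="Free Audio"></a> </span> vs. Carthage <span class='yspscores'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 5:40 pm EST </td></tr></table></td></tr></table>
HTML;

print_r(parse_schedule_dom($html));
```

Selecting cells by `class` and walking up to the row avoids depending on exact whitespace or attribute order, which is what makes the regex versions brittle; whether the tree libxml builds from the full page has the same row structure is the assumption to check first.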
premiso Posted December 6, 2008

The DOM works great as long as the page follows the standards. I tried to do this with the DOM, but since Yahoo's page is not properly formatted, PHP does not populate the DOM. I could be doing it wrong, though; if you have a working example using that page, I would love to see it, Curtis. Thanks!
curtis Posted December 7, 2008

Quoting premiso: "The DOM works great as long as the page obeys the standards. I tried to do this with the DOM but since yahoo does not have the page properly formatted it does not populate the DOM in PHP. I could be doing it wrong, but if you have a working example using that page I would love to see Curtis, thanks!"

No, it was my mistake; you're absolutely right. The only times I've tried this, I happened to be working with standards-conforming documents. Obviously, that's a rare luxury with (X)HTML. Sorry about that.

There are a couple of possibilities here that avoid reinventing the wheel. One is the PEAR package HTML_Common2, which uses PHP 5 OOP. The PHP 4 version is available, but not recommended. I briefly looked at some of the class members, and they seem to use regex as well. At the very least, when writing complicated regexes, I prefer the /x modifier, which allows whitespace and comments and makes them much easier to maintain. There is also a PECL extension called html_parse, which seems a better solution for handling HTML, at the cost of some portability.
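As an illustration of the `/x` suggestion, here is a pattern for one event row of the HTML posted in this thread, written with the `x` (extended) modifier: unescaped whitespace in the pattern is ignored and `#` starts a comment, so literal spaces in the markup are matched with `\s+` or `\s*` instead. The pattern, the `$eventRow` name and the trimmed sample row are assumptions based on that HTML, not part of any package.

```php
<?php
// One event row: teams cell, empty spacer cell, time cell.
// With the x modifier, whitespace below is layout only and "#" starts a comment.
$eventRow = '~
    <td\s+nowrap>\s*                  # teams cell
    (?<teams>.*?)                     # names, icons and links (lazy)
    </td>\s*
    <td\s+nowrap>\s*</td>\s*          # empty spacer cell
    <td\s+width="100%"\s+nowrap\s+class="yspscores">\s*
    (?<time>.*?)                      # start time, e.g. "6:30 pm EST"
    \s*</td>
~sx';

// A row trimmed from the posted HTML (the icon link is left out).
$row = '<td nowrap> • Stony Brook <span class=\'yspscores\'></span> vs. Lehigh <span class=\'yspscores\'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 6:30 pm EST </td>';

if (preg_match($eventRow, $row, $m)) {
    echo $m['time'], "\n"; // 6:30 pm EST
}
```

Named groups such as `(?<time>...)` pair well with `/x`: the code reads `$m['time']` instead of a bare index that silently shifts when a group is added.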
ravi181229 Posted December 10, 2008

OK... can we write a regex to fetch the data shown in red below?

```html
<table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Wed, Dec 10</td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Indiana <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10331118','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. TCU <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10644764','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 6:00 pm EST </td> </tr> </table></td></tr> <tr><td> <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%"> <tr> <td nowrap> • Long Island <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10730677','playerWindow','width=793,height=608,scrollbars=no'));" ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span> vs.
Iona <span class='yspscores'></span></td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> 7:00 pm EST </td> </tr> </table></td></tr> </table>
```

I was trying with:

```php
//$html contains above code
preg_match_all("~<tr><td class=\"yspdetailttl\" valign=\"bottom\" height=\"12\"> Wed, Dec 10</td></tr><tr><td>(.*)</td></tr>~", $html, $matches);
```
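The attempt above has a few likely problems: the posted HTML has whitespace between `</td></tr>` and `<tr><td>`, which the pattern does not allow; if the real page source contains newlines, `.` will not cross them without the `s` modifier; and a greedy `(.*)` runs to the last `</td></tr>` it can reach rather than stopping at the end of that date. A sketch that addresses those points, assuming a date's events end where the next `yspdetailttl` heading (or the input) ends; `events_block_for_date` is a hypothetical name.

```php
<?php
// Hypothetical helper: the markup listed under one date heading, or null.
// \s* allows whitespace between tags, preg_quote() escapes the date text,
// the s modifier lets . cross newlines, and the lookahead stops the lazy
// match at the next heading (or the end of the input).
function events_block_for_date($html, $date)
{
    $pattern = '~<td class="yspdetailttl"[^>]*>\s*' . preg_quote($date, '~') . '\s*</td>'
             . '(.*?)'                                 // this date's rows (lazy)
             . '(?=<td class="yspdetailttl"|\z)~s';    // stop before the next heading
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Usage, with $html holding the page source:
// $block = events_block_for_date($html, 'Wed, Dec 10');
```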
ravi181229 Posted December 10, 2008

I could fetch the data as follows:

```php
preg_match('~<tr><td class="yspdetailttl" valign="bottom" height="12"> Wed, Dec 10</td></tr>(.*)</table>~s',$html,$matches);
```

but I still have a problem getting all the events' information under a particular date.
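With one date's markup captured (for example `$matches[1]` from the `preg_match` above), `preg_match_all` can walk every event row in it rather than stopping at the first. A sketch, assuming each event is the three-cell row shown in the HTML above (teams, an empty cell, the time); `events_in_block` is a hypothetical name, and the "teams, time, URLs" line format is just one possible choice modelled on the requested output.

```php
<?php
// Hypothetical helper: one text line per event found in a date's markup.
// Assumes the row layout from this thread:
//   <td nowrap> teams </td> <td nowrap> </td> <td width="100%" nowrap class="yspscores"> time </td>
function events_in_block($block)
{
    preg_match_all(
        '~<td nowrap>\s*(.*?)</td>\s*<td nowrap>\s*</td>\s*'
        . '<td width="100%" nowrap class="yspscores">\s*(.*?)\s*</td>~s',
        $block,
        $rows,
        PREG_SET_ORDER // $rows[n] = [whole row, teams cell, time]
    );

    $lines = array();
    foreach ($rows as $row) {
        // Swap each icon for its alt text ("Free Audio", ...), then drop the other tags.
        $teams = preg_replace('~<img[^>]*alt="([^"]*)"[^>]*>~', '$1', $row[1]);
        $teams = trim(preg_replace('~\s+~', ' ', strip_tags($teams)));

        // Player URLs from the javascript:void(window.open('...')) links.
        preg_match_all("~window\.open\('([^']+)'~", $row[1], $urls);

        $lines[] = trim($teams . ' ' . $row[2] . ' ' . implode(' ', $urls[1]));
    }
    return $lines;
}

// Usage: $block = $matches[1]; (from the preg_match above)
// foreach (events_in_block($block) as $line) { echo $line, "<br />"; }
```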