Hi,

I would like to parse the HTML page at http://rivals.yahoo.com/ncaa/baseball/collegebroadcast and get all the events' info for each date.

For example:

Sat, Dec 6 (get this date and all the events' info for this date)

San Francisco vs. Long Beach St. - Men's Basketball   10:00 pm EST

http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697

Mon, Dec 8 (get this date and all the events' info for this date), and so on.

Any help would be appreciated.

Thanks

Link to comment
https://forums.phpfreaks.com/topic/135709-parsing-html-page/
Share on other sites

From the following code:

 

<table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; San Francisco <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_video_box.gif" width="16" height="12" alt="Free Video"></a> </span> vs. Long Beach St. <span class='yspscores'></span> - Men's Basketball </td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  10:00 pm EST  </td>

 

</tr>

</table></td></tr>

</table><table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Mon, Dec 8</td></tr>

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Lehigh <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_video_box.gif" width="16" height="12" alt="Subscription Video"></a> </span> vs. Albany <span class='yspscores'></span> - Men's Basketball </td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  6:30 pm EST  </td>

 

</tr>

</table></td></tr>

</table>

</table>

 

I would like to display:

 

Sat, Dec 6

• San Francisco Free Video  vs. Long Beach St.  - Men's Basketball   10:00 pm EST

Mon, Dec 8

• Lehigh Subscription Video  vs. Albany  - Men's Basketball   6:30 pm EST

 

 

 


I would also like to have the link (sorry, I missed it in my previous post):

 

Sat, Dec 6

• San Francisco Free Video  vs. Long Beach St.  - Men's Basketball          10:00 pm EST

    http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697

Mon, Dec 8

• Lehigh Subscription Video  vs. Albany  - Men's Basketball          6:30 pm EST

    http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183
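
The stream URL sits inside a `javascript:void(window.open('...'))` href, so it can probably be pulled out of an event cell with one extra pattern. This is only a minimal sketch, assuming every link follows the href format shown in the markup above. `extractStreamUrls` is a hypothetical helper name, and the sample cell is abbreviated.

```php
<?php
// Hypothetical helper: returns every player URL found in one event cell.
// Assumes links look like href="javascript:void(window.open('URL', ...));"
function extractStreamUrls($cellHtml)
{
    // Capture whatever sits between the first pair of single quotes
    // that follows "window.open(".
    preg_match_all("~window\\.open\\('([^']+)'~", $cellHtml, $m);
    return $m[1];
}

// Abbreviated copy of one event cell from the markup above.
$cell = '<td nowrap> San Francisco <span class=\'yspscores\'>'
      . '<a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697\','
      . '\'playerWindow\',\'width=793,height=608,scrollbars=no\'));">'
      . '<img alt="Free Video"></a> </span> vs. Long Beach St. </td>';

foreach (extractStreamUrls($cell) as $url) {
    echo $url, "\n";
}
```

A cell where both teams have a link would return two URLs, so the caller has to decide which one to show.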


<?php
$string = '<table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>
<tr><td>
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
<tr>
<td nowrap>  San Francisco <span class=\'yspscores\'><a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487697\',\'playerWindow\',\'width=793,height=608,scrollbars=no\'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_video_box.gif" width="16" height="12" alt="Free Video"></a> </span> vs. Long Beach St. <span class=\'yspscores\'></span> - Men\'s Basketball </td>
<td nowrap>  </td>
<td width="100%" nowrap class="yspscores">  10:00 pm EST  </td>

</tr>
</table></td></tr>
</table><table border="0" cellspacing="0" cellpadding="2" class="ysptblclbg4" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Mon, Dec 8</td></tr>
<tr><td>
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
<tr>
<td nowrap>  Lehigh <span class=\'yspscores\'><a href="javascript:void(window.open(\'http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603183\',\'playerWindow\',\'width=793,height=608,scrollbars=no\'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_video_box.gif" width="16" height="12" alt="Subscription Video"></a> </span> vs. Albany <span class=\'yspscores\'></span> - Men\'s Basketball </td>
<td nowrap>  </td>
<td width="100%" nowrap class="yspscores">  6:30 pm EST  </td>

</tr>
</table></td></tr>
</table>
</table>';

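// Date headings: one per outer table.
$dates = array();
preg_match_all("~<td class=\"yspdetailttl\" valign=\"bottom\" height=\"12\">(.*?)</td>~", $string, $matches);
foreach ($matches[1] as $match) {
    $dates[] = trim($match);
}

//print_r($dates);

// Event cells. The empty spacer cells are also <td nowrap>, so keep only
// the cells that contain a " vs. " matchup.
$vs = array();
preg_match_all("~<td nowrap>(.*?)</td>~", $string, $matches);
foreach ($matches[1] as $match) {
    if (strpos($match, " vs. ") !== false) {
        $vs[] = trim($match);
    }
}

//print_r($vs);

// Start times.
$time = array();
preg_match_all("~<td width=\"100%\" nowrap class=\"yspscores\">(.*?)</td>~", $string, $matches);
foreach ($matches[1] as $match) {
    $time[] = trim($match);
}

//print_r($time);

// Pair the three lists up by position. This assumes exactly one event per date.
$count = count($dates);
for ($i = 0; $i < $count; $i++) {
    echo $dates[$i] . "<br />" . $vs[$i] . "\t\t" . $time[$i] . "<br /><br />";
}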

?>

 

I will let you figure out how to display them. I am not sure this is the most efficient way, but it works.


This code works perfectly, but it does not display all the events for a particular date.

For example (there can be many events under a particular date):

 

Fri, Dec 5

• Stony Brook  vs. Lehigh Subscription Audio    6:30 pm EST

• Hope Free Audio  vs. Carthage    6:40 pm EST

• Pennsylvania Subscription Audio  vs. Navy Subscription Audio    7:00 pm EST

• Iowa Subscription Audio  vs. Bryant    7:30 pm EST

• Texas A&M  vs. Arizona Free Audio    8:30 pm EST

Sat, Dec 6

• Davidson  vs. N.C. State Subscription Audio    12:00 pm EST

• Holy Cross  vs. W. Michigan Subscription Audio    12:30 pm EST

• Indiana Subscription Audio  vs. Gonzaga    12:30 pm EST

• Iowa St. Subscription Audio  vs. Oregon St. Subscription Audio    1:30 pm EST

• Kansas  vs. Jackson St. Free Audio    2:00 pm EST

 

 

HTML code:

 

<table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Fri, Dec 5</td></tr>

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Hope <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10092771','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span> vs. Carthage <span class='yspscores'></span></td>

 

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  5:40 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Stony Brook <span class='yspscores'></span> vs. Lehigh <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10603182','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  6:30 pm EST  </td>

 

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Pennsylvania <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10838785','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Navy <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10689020','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  7:00 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Iowa <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10219999','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Bryant <span class='yspscores'></span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  7:30 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

 

<td nowrap> &#149; Texas A&M <span class='yspscores'></span> vs. Arizona <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10590979','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  8:30 pm EST  </td>

</tr>

</table></td></tr>

</table><table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Davidson <span class='yspscores'></span> vs. N.C. State <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10663073','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td>

 

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  12:00 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Holy Cross <span class='yspscores'></span> vs. W. Michigan <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10592143','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  12:30 pm EST  </td>

 

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Indiana <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10331117','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Gonzaga <span class='yspscores'></span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  12:30 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Iowa St. <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10345634','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. Oregon St. <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10580288','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  1:30 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

 

<td nowrap> &#149; Providence <span class='yspscores'></span> vs. Rhode Island <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10487157','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  2:00 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Kansas <span class='yspscores'></span> vs. Jackson St. <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10530559','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td>

 

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  2:00 pm EST  </td>

</tr>

</table></td></tr>
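
One way to handle several events per date is to split the page on the date heading cells first and then collect the event rows inside each piece. The following is a rough sketch, not tested against the live page. `parseSchedule` is a hypothetical helper, the sample markup is a shortened copy of the HTML above (links replaced by `#`), and it assumes every event row is laid out as event cell, spacer cell, time cell.

```php
<?php
// Hypothetical helper: groups events under their date heading.
// Assumes a heading is a <td class="yspdetailttl" ...> cell and that each
// event row is: event cell, empty spacer cell, time cell.
function parseSchedule($html)
{
    // Split on heading cells; the captured date text is kept in the result.
    $parts = preg_split(
        '~<td class="yspdetailttl"[^>]*>(.*?)</td>~s',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $rowPattern = '~<td nowrap>(.*?)</td>\s*'                    // event cell
                . '<td nowrap>.*?</td>\s*'                        // spacer cell
                . '<td[^>]*class="yspscores"[^>]*>(.*?)</td>~s';  // time cell

    $schedule = array();
    // $parts[0] precedes the first heading; after that the entries
    // alternate between a date and the HTML that follows it.
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $date = trim($parts[$i]);
        preg_match_all($rowPattern, $parts[$i + 1], $rows, PREG_SET_ORDER);
        foreach ($rows as $row) {
            // Keep each icon's alt text, then drop the remaining markup.
            $text = preg_replace('~<img[^>]*alt="([^"]*)"[^>]*>~', '$1', $row[1]);
            $text = strip_tags(str_replace('&#149;', '', $text));
            $text = trim(preg_replace('~\s+~', ' ', $text));
            $schedule[$date][] = array('event' => $text, 'time' => trim($row[2]));
        }
    }
    return $schedule;
}

// Shortened sample input based on the markup above.
$html = <<<'HTML'
<table><tr><td class="yspdetailttl" valign="bottom" height="12"> Fri, Dec 5</td></tr>
<tr><td><table><tr>
<td nowrap> &#149; Hope <span class='yspscores'><a href="#"><img src="free_audio_box.gif" alt="Free Audio"></a> </span> vs. Carthage <span class='yspscores'></span></td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores"> 5:40 pm EST </td>
</tr></table></td></tr>
<tr><td><table><tr>
<td nowrap> &#149; Stony Brook <span class='yspscores'></span> vs. Lehigh <span class='yspscores'><a href="#"><img src="pay_audio_box.gif" alt="Subscription Audio"></a> </span></td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores"> 6:30 pm EST </td>
</tr></table></td></tr>
</table><table><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>
<tr><td><table><tr>
<td nowrap> &#149; Davidson <span class='yspscores'></span> vs. N.C. State <span class='yspscores'><a href="#"><img src="pay_audio_box.gif" alt="Subscription Audio"></a> </span></td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores"> 12:00 pm EST </td>
</tr></table></td></tr>
</table>
HTML;

foreach (parseSchedule($html) as $date => $events) {
    echo $date, "\n";
    foreach ($events as $e) {
        echo '  ', $e['event'], '   ', $e['time'], "\n";
    }
}
```

Because the result is keyed by date, each date can carry any number of events, which is what the position-based pairing in the earlier script cannot do.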

 

 


The DOM works great as long as the page obeys the standards. I tried to do this with the DOM, but since Yahoo does not have the page properly formatted, it does not populate the DOM in PHP.

 

I could be doing it wrong, but if you have a working example using that page, I would love to see it, Curtis. Thanks!

No, it was my mistake; you're absolutely right. The only times I've tried this, I happened to be working with standards-conforming documents. Obviously, that's a rare luxury with (X)HTML. Sorry about that.

 

There are a couple of possibilities here that avoid reinventing the wheel. One is the PEAR package HTML_Common2, which uses PHP 5 OOP. A PHP 4 version is available, but not recommended. I briefly looked at some of the class members, and they seem to be using regex as well.

 

At the very least, when writing complicated regexes, I prefer to use the /x modifier to allow whitespace and comments, because they are much easier to maintain that way.
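
As an illustration of the x modifier, here is a small sketch of a date-heading pattern laid out over several lines with comments. The pattern and the one-line sample input are assumptions modeled on the markup earlier in the thread, not a tested solution for the whole page.

```php
<?php
// With the x modifier, whitespace in the pattern is ignored and "#" starts
// a comment, so the pattern can be documented piece by piece.
$pattern = '~
    <td \s+ class="yspdetailttl" [^>]* >   # opening tag of a date heading cell
    \s* (.*?) \s*                          # capture the date text, minus padding
    </td>                                  # closing tag
~xs';

$html = '<tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>';
preg_match_all($pattern, $html, $matches);
print_r($matches[1]);
```

One thing to watch: the delimiter character (`~` here) must not appear inside the comments, and a literal space in the pattern has to be written as `\s` or `\ `.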

 

Also, there's a PECL extension called html_parse, which seems a better solution for handling HTML, but at the cost of some portability.
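
Another route that may be worth a try before reaching for an extension: `DOMDocument::loadHTML()` (as opposed to `load()` or `loadXML()`) is meant to accept non-conforming HTML, and `libxml_use_internal_errors(true)` keeps the parse warnings quiet. This is only a sketch, assuming the DOM extension is installed. The sample markup is a shortened stand-in for the real page, and it has not been tried against the live Yahoo page.

```php
<?php
// Sample markup standing in for the real page. It is deliberately sloppy:
// valueless "nowrap" attributes and a stray closing </table>.
$html = <<<'HTML'
<table><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>
<tr><td><table><tr>
<td nowrap> San Francisco vs. Long Beach St. - Men's Basketball </td>
<td nowrap> </td>
<td width="100%" nowrap class="yspscores"> 10:00 pm EST </td>
</tr></table></td></tr>
</table>
</table>
HTML;

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);              // the HTML parser recovers from malformed markup
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Date headings.
$dates = array();
foreach ($xpath->query('//td[@class="yspdetailttl"]') as $td) {
    $dates[] = trim($td->textContent);
}

// Start times (only <td> cells, not the <span class='yspscores'> wrappers).
$times = array();
foreach ($xpath->query('//td[@class="yspscores"]') as $td) {
    $times[] = trim($td->textContent);
}

print_r($dates);
print_r($times);
```

If the real page still fails to load this way, the regex approach remains the fallback.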


OK... can we write a regex to fetch the data shown in red below?

 

<table border="0" cellspacing="0" cellpadding="2" class="ysprow1" width="100%" style="border-collapse: collapse" bordercolor="#111111"><tr><td class="yspdetailttl" valign="bottom" height="12"> Wed, Dec 10</td></tr>

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Indiana <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10331118','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/pay_audio_box.gif" width="16" height="12" alt="Subscription Audio"></a> </span> vs. TCU <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10644764','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span></td>

 

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  6:00 pm EST  </td>

</tr>

</table></td></tr>

 

<tr><td>

<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">

<tr>

<td nowrap> &#149; Long Island <span class='yspscores'><a href="javascript:void(window.open('http://cosmos.bcst.yahoo.com/up/collegetest/?cl=10730677','playerWindow','width=793,height=608,scrollbars=no'));"  ><img border="0" src="http://l.yimg.com/a/i/us/sp/ed/ic/free_audio_box.gif" width="16" height="12" alt="Free Audio"></a> </span> vs. Iona <span class='yspscores'></span></td>

<td nowrap>  </td>

<td width="100%" nowrap class="yspscores">  7:00 pm EST  </td>

 

</tr>

</table></td></tr>

</table>

 

I was trying this:

// $html contains the code above

preg_match_all("~<tr><td class=\"yspdetailttl\" valign=\"bottom\" height=\"12\"> Wed, Dec 10</td></tr><tr><td>(.*)</td></tr>~", $html, $matches);
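
A likely reason the attempt above finds nothing: the markup has line breaks between the tags, `.` does not match a newline unless the `s` modifier is set, and the pattern leaves no room for whitespace between `</tr>` and `<tr>`. The following is a sketch of one alternative, which captures everything from the requested heading up to the next heading (or the end of the input) and then pulls the times out of that piece. `sectionForDate` is a hypothetical helper and the sample input is a trimmed-down version of the markup in this thread.

```php
<?php
// Hypothetical helper: returns the raw HTML between one date heading and
// the next heading (or the end of the input), or null if the date is absent.
function sectionForDate($html, $date)
{
    $pattern = '~<td class="yspdetailttl"[^>]*>\s*' . preg_quote($date, '~') . '\s*</td>'
             . '(.*?)(?=<td class="yspdetailttl"|\z)~s';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Trimmed-down sample input with two date headings.
$html = <<<'HTML'
<table><tr><td class="yspdetailttl" valign="bottom" height="12"> Sat, Dec 6</td></tr>
<tr><td><table><tr>
<td nowrap> Davidson vs. N.C. State </td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores">  12:00 pm EST  </td>
</tr></table></td></tr>
</table><table><tr><td class="yspdetailttl" valign="bottom" height="12"> Wed, Dec 10</td></tr>
<tr><td><table><tr>
<td nowrap> Indiana vs. TCU </td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores">  6:00 pm EST  </td>
</tr></table></td></tr>
<tr><td><table><tr>
<td nowrap> Long Island vs. Iona </td>
<td nowrap>&nbsp;</td>
<td width="100%" nowrap class="yspscores">  7:00 pm EST  </td>
</tr></table></td></tr>
</table>
HTML;

$section = sectionForDate($html, 'Wed, Dec 10');

// Inside the section, the s modifier lets the lazy match cross line breaks.
preg_match_all('~<td[^>]*class="yspscores"[^>]*>\s*(.*?)\s*</td>~s', $section, $m);
print_r($m[1]);
```

The same section string can then be fed to whatever pattern extracts the team names and links.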

 

