
web scraping


runnerjp

Hello,

 

Currently my web scraper signs in to the site and pulls all of the HTML, which works perfectly.

What I need to do is loop over only specific information (the horses that ran).

Here is my current PHP code:

<?php
$url = 'site';
$postdata = array(
    'username' => "username",
    'password' => "password",
);

$ch = curl_init();
if($ch){
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_POST, 1);
   curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
   curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file
   curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar

   $content = curl_exec($ch);
   $headers = curl_getinfo($ch);				

   curl_close($ch);

   // Debug option
   // print_r($headers);

   if($headers['http_code'] == 200){ 
      echo $content;
   }
}
?>

Here is the HTML I'm pulling:

<table width=100% border=1><tr><td class=instruction6 colspan=4><b>My Race Notes</b></td></tr>
<tr><td width=90%><form action='races.php?id=7456132' method=post>
<textarea name='comments' rows=2 cols=38>Type notes & press Add</textarea></td>
<td width=5%><input type=submit class='weestatbutton' value='Add'></form></td></tr></table></td></tr></table><table width=100%><tr class=databreakdown2253><th><a href='races.php?id=7456132&sortby=1'>Place</a></th><th>Dist Bt</th><th>Stall</th>
<th>Horse</th><th>Age</th><th><a href='races.php?id=7456132&sortby=3'>Weight</a></th><th>Headgear</th><th>OR</th><th>Trainer</th>
<th><a href='races.php?id=7456132&sortby=2'>Odds</a></th><th>Jockey (Claim)</th></tr><tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td>
<td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-3</td><td class=databreakdown2253></td>
<td class=databreakdown2253>57</td>
<td class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td>
<td class=databreakdown2253>28/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=694'>Egan, John</a> </td></tr><tr class=databreakdown18><td colspan=12>soon led, brought field stands side from 3f out, headed 2f out, rallied inside final furlong, bumped and led again towards finish</td></tr><tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td>
<td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-12</td><td class=databreakdown2253></td>
<td class=databreakdown2253>52</td>
<td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3414'>Cosgrave, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, challenged 2f out, led 2f out, edged right inside final furlong, rider lost whip and headed towards finish</td></tr><tr><td class=databreakdown2253>3rd</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>1</td>
<td class=databreakdown2253><a href='horses.php?id=300316'>Bookmaker</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=933'>Bridger, J J</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3848'>Carson, William</a> </td></tr><tr class=databreakdown18><td colspan=12>prominent, took keen hold, led 2f out, headed over 1f out, not much room inside final furlong, stayed on same pace</td></tr><tr><td class=databreakdown2253>4th</td><td class=databreakdown2253>1</td><td class=databreakdown2253>2</td>
<td class=databreakdown2253><a href='horses.php?id=261986'>Night Trade (IRE)</a></td>
<td class=databreakdown2253>7</td><td class=databreakdown2253>8-8</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7348'>Hardie, Cameron</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>prominent, ridden over 2f out, switched left inside final furlong, no extra close home</td></tr><tr><td class=databreakdown2253>5th</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>6</td>
<td class=databreakdown2253><a href='horses.php?id=299296'>Trigger Park (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-10</td><td class=databreakdown2253></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>20/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3422'>Dobbs, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, ridden over 2f out, one pace over 1f out, no impression</td></tr><tr><td class=databreakdown2253>6th</td><td class=databreakdown2253>2.25</td><td class=databreakdown2253>7</td>
<td class=databreakdown2253><a href='horses.php?id=300337'>Port Lairge</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-11</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=914'>Gallagher, J</a></td>
<td class=databreakdown2253>33/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=193'>Catlin, Chris</a> </td></tr><tr class=databreakdown18><td colspan=12>slowly into stride, in rear, stayed on inside final furlong, never dangerous</td></tr><tr><td class=databreakdown2253>7th</td><td class=databreakdown2253>NK</td><td class=databreakdown2253>11</td>
<td class=databreakdown2253><a href='horses.php?id=289934'>Lionheart</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-13</td><td class=databreakdown2253></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=4910'>Crate, Peter</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7375'>Crouch, Hector</a> (7)</td></tr><tr class=databreakdown18><td colspan=12>reared start and slowly away, held up in rear, headway over 1f out, weakened inside final furlong</td></tr><tr><td class=databreakdown2253>8th</td><td class=databreakdown2253>2.75</td><td class=databreakdown2253>14</td>
<td class=databreakdown2253><a href='horses.php?id=289421'>Koharu</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-4</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>60</td>
<td class=databreakdown2253><a href='trainers.php?id=2495'>Makin, P J</a></td>
<td class=databreakdown2253>9/4 (Fav) </td>
<td class=databreakdown2253><a href='jockeys.php?id=5952'>Bates, Mr D J</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>in rear, ridden over 3f out, no impression</td></tr><tr><td class=databreakdown2253>9th</td><td class=databreakdown2253>3</td><td class=databreakdown2253>5</td>
<td class=databreakdown2253><a href='horses.php?id=269827'>Saskias Dream</a></td>
<td class=databreakdown2253>6</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=2002'>Chapple-Hyam, Jane</a></td>
<td class=databreakdown2253>4/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3544'>Hughes, Richard</a> </td></tr><tr class=databreakdown18><td colspan=12>mid-division, headway and switched left over 1f out, edged left entering final furlong, soon eased</td></tr><tr><td class=databreakdown2253>10th</td><td class=databreakdown2253>1.75</td><td class=databreakdown2253>12</td>
<td class=databreakdown2253><a href='horses.php?id=304248'>Crafty Business (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-2</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=695'>Moore, G L</a></td>
<td class=databreakdown2253>14/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=6669'>Bishop, Mr C</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>towards rear, pushed along over 3f out, well beaten 2f out</td></tr></table><br><hr></td></tr></table>

*Note: I'm using this for personal reasons.
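
One way to loop over only the horses that ran is to select the table rows that contain a link to horses.php, since the header row and the race-comment rows have no such link. The following is a minimal sketch, not a definitive solution: the $content string is a hypothetical, trimmed stand-in for the HTML returned by curl_exec(), and the array keys (place, dist, stall, horse) are names chosen here for illustration. It also assumes the column order stays as in the sample above.

```php
<?php
// Hypothetical stand-in for the $content returned by curl_exec(),
// trimmed to two runners, one header row and one comment row.
$content = <<<'HTML'
<table width=100%><tr class=databreakdown2253><th>Place</th><th>Dist Bt</th><th>Stall</th><th>Horse</th></tr>
<tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td>
<td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td></tr>
<tr class=databreakdown18><td colspan=12>soon led</td></tr>
<tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td>
<td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td></tr>
</table>
HTML;

// The markup is not valid HTML, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// A runner row is any <tr> that has a cell linking to horses.php.
$rows = $xpath->query('//tr[td/a[starts-with(@href, "horses.php")]]');

$runners = [];
foreach ($rows as $row) {
    // Query the cells relative to the current row.
    $cells = $xpath->query('td', $row);
    $runners[] = [
        'place' => trim($cells->item(0)->nodeValue),
        'dist'  => trim($cells->item(1)->nodeValue),
        'stall' => trim($cells->item(2)->nodeValue),
        'horse' => trim($cells->item(3)->nodeValue),
    ];
}

foreach ($runners as $runner) {
    echo $runner['place'] . ' ' . $runner['horse'] . "\n";
}
```

Matching on the horses.php link rather than on the class name avoids depending on the numeric suffix in databreakdown2253, which may differ from page to page.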


OK, so I have done the following:

$dom = new DOMDocument();
$html = $content;
// Load the HTML; @ suppresses the warnings raised by the malformed markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// This gives you every td whose class contains "databreakdown"
$my_xpath_query = '//table/tr/td[contains(@class, "databreakdown")]';
$result_rows = $xpath->query($my_xpath_query);

// Iterate over all matching td elements
foreach ($result_rows as $result_object) {
    echo $result_object->nodeValue . "<br />";
}

What I would like to do is: if a table contains td[contains(@class, "databreakdown")], then grab the whole table!

At the moment it just returns the td elements.
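
A possible approach is to move the condition into a predicate on the table, so that the query returns the table element rather than its cells, and then serialise each match with DOMDocument::saveHTML(). This is a sketch under assumptions: $content below is a hypothetical, shortened stand-in for the scraped page, and $tableHtml is a variable name introduced here for illustration.

```php
<?php
// Hypothetical stand-in for the scraped page: a notes table followed by the results table.
$content = <<<'HTML'
<table width=100% border=1><tr><td class=instruction6>My Race Notes</td></tr></table>
<table width=100%><tr class=databreakdown2253><th>Place</th><th>Horse</th></tr>
<tr><td class=databreakdown2253>1st</td><td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td></tr>
</table>
HTML;

// Collect parser warnings for the invalid markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the <table> itself; the predicate keeps only tables whose own rows
// have a td with "databreakdown" in its class.
$tables = $xpath->query('//table[tr/td[contains(@class, "databreakdown")]]');

$tableHtml = '';
foreach ($tables as $table) {
    // saveHTML($node) serialises the given node together with everything inside it.
    $tableHtml .= $dom->saveHTML($table);
}

echo $tableHtml;
```

The predicate uses tr/td rather than .//td on purpose: with nested tables, .//td would also match every outer table that wraps the results table. This relies on the parsed document having tr elements directly under table, as the original query '//table/tr/td' already does.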
