web scraping


runnerjp

Hello,

 

Currently my web scraper signs in to the site and pulls all the HTML, which works perfectly.

What I need to do now is loop over only specific information (the horses that ran).

Here is my current PHP code:

<?php
$url = 'site';
$postdata = array('username' => "username",
                  'password' => "password");

$ch = curl_init();
if($ch){
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_POST, 1);
   curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
   curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file
   curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar

   $content = curl_exec($ch);
   $headers = curl_getinfo($ch);				

   curl_close($ch);

   // Debug option
   // print_r($headers);

   if($headers['http_code'] == 200){ 
      echo $content;
   }
}
?>

Here is the HTML I'm pulling:

<table width=100% border=1><tr><td class=instruction6 colspan=4><b>My Race Notes</b></td></tr>
<tr><td width=90%><form action='races.php?id=7456132' method=post>
<textarea name='comments' rows=2 cols=38>Type notes & press Add</textarea></td>
<td width=5%><input type=submit class='weestatbutton' value='Add'></form></td></tr></table></td></tr></table><table width=100%><tr class=databreakdown2253><th><a href='races.php?id=7456132&sortby=1'>Place</a></th><th>Dist Bt</th><th>Stall</th>
<th>Horse</th><th>Age</th><th><a href='races.php?id=7456132&sortby=3'>Weight</a></th><th>Headgear</th><th>OR</th><th>Trainer</th>
<th><a href='races.php?id=7456132&sortby=2'>Odds</a></th><th>Jockey (Claim)</th></tr><tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td>
<td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-3</td><td class=databreakdown2253></td>
<td class=databreakdown2253>57</td>
<td class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td>
<td class=databreakdown2253>28/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=694'>Egan, John</a> </td></tr><tr class=databreakdown18><td colspan=12>soon led, brought field stands side from 3f out, headed 2f out, rallied inside final furlong, bumped and led again towards finish</td></tr><tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td>
<td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-12</td><td class=databreakdown2253></td>
<td class=databreakdown2253>52</td>
<td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3414'>Cosgrave, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, challenged 2f out, led 2f out, edged right inside final furlong, rider lost whip and headed towards finish</td></tr><tr><td class=databreakdown2253>3rd</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>1</td>
<td class=databreakdown2253><a href='horses.php?id=300316'>Bookmaker</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=933'>Bridger, J J</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3848'>Carson, William</a> </td></tr><tr class=databreakdown18><td colspan=12>prominent, took keen hold, led 2f out, headed over 1f out, not much room inside final furlong, stayed on same pace</td></tr><tr><td class=databreakdown2253>4th</td><td class=databreakdown2253>1</td><td class=databreakdown2253>2</td>
<td class=databreakdown2253><a href='horses.php?id=261986'>Night Trade (IRE)</a></td>
<td class=databreakdown2253>7</td><td class=databreakdown2253>8-8</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7348'>Hardie, Cameron</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>prominent, ridden over 2f out, switched left inside final furlong, no extra close home</td></tr><tr><td class=databreakdown2253>5th</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>6</td>
<td class=databreakdown2253><a href='horses.php?id=299296'>Trigger Park (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-10</td><td class=databreakdown2253></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>20/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3422'>Dobbs, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, ridden over 2f out, one pace over 1f out, no impression</td></tr><tr><td class=databreakdown2253>6th</td><td class=databreakdown2253>2.25</td><td class=databreakdown2253>7</td>
<td class=databreakdown2253><a href='horses.php?id=300337'>Port Lairge</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-11</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=914'>Gallagher, J</a></td>
<td class=databreakdown2253>33/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=193'>Catlin, Chris</a> </td></tr><tr class=databreakdown18><td colspan=12>slowly into stride, in rear, stayed on inside final furlong, never dangerous</td></tr><tr><td class=databreakdown2253>7th</td><td class=databreakdown2253>NK</td><td class=databreakdown2253>11</td>
<td class=databreakdown2253><a href='horses.php?id=289934'>Lionheart</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-13</td><td class=databreakdown2253></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=4910'>Crate, Peter</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7375'>Crouch, Hector</a> (7)</td></tr><tr class=databreakdown18><td colspan=12>reared start and slowly away, held up in rear, headway over 1f out, weakened inside final furlong</td></tr><tr><td class=databreakdown2253>8th</td><td class=databreakdown2253>2.75</td><td class=databreakdown2253>14</td>
<td class=databreakdown2253><a href='horses.php?id=289421'>Koharu</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-4</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>60</td>
<td class=databreakdown2253><a href='trainers.php?id=2495'>Makin, P J</a></td>
<td class=databreakdown2253>9/4 (Fav) </td>
<td class=databreakdown2253><a href='jockeys.php?id=5952'>Bates, Mr D J</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>in rear, ridden over 3f out, no impression</td></tr><tr><td class=databreakdown2253>9th</td><td class=databreakdown2253>3</td><td class=databreakdown2253>5</td>
<td class=databreakdown2253><a href='horses.php?id=269827'>Saskias Dream</a></td>
<td class=databreakdown2253>6</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=2002'>Chapple-Hyam, Jane</a></td>
<td class=databreakdown2253>4/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3544'>Hughes, Richard</a> </td></tr><tr class=databreakdown18><td colspan=12>mid-division, headway and switched left over 1f out, edged left entering final furlong, soon eased</td></tr><tr><td class=databreakdown2253>10th</td><td class=databreakdown2253>1.75</td><td class=databreakdown2253>12</td>
<td class=databreakdown2253><a href='horses.php?id=304248'>Crafty Business (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-2</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=695'>Moore, G L</a></td>
<td class=databreakdown2253>14/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=6669'>Bishop, Mr C</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>towards rear, pushed along over 3f out, well beaten 2f out</td></tr></table><br><hr></td></tr></table>

*Note: I'm using this for personal reasons only.


OK, so I have done the following:

$dom = new DOMDocument();
$html = $content;
// Load the HTML; @ suppresses the warnings raised by the malformed markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// This matches every td whose class contains "databreakdown"
$my_xpath_query = '//table/tr/td[contains(@class, "databreakdown")]';
$result_rows = $xpath->query($my_xpath_query);

// Iterate over the matched td elements and print each cell's text
foreach ($result_rows as $result_object) {
    echo $result_object->nodeValue . "<br />";
}

What I would like to do is: if a table contains td[contains(@class, "databreakdown")], grab the whole table!

At the moment it just returns the td elements.
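
One possible way to get the table rather than the cells is to step from each matching td up to its nearest enclosing table with the ancestor axis, then pass that node to saveHTML(). The sketch below is only an illustration under some assumptions: the $content string is a cut-down, hypothetical stand-in for the real page (an outer layout table wrapping a notes table and the results table), and the variable names $tables, $tableHtml and $horses are made up for the example. It also assumes that horse names are always links whose href starts with "horses.php", as in the HTML posted above.

```php
<?php
// Hypothetical, shortened stand-in for the $content returned by curl_exec().
$content = <<<'HTML'
<table><tr><td>
<table border=1><tr><td class=instruction6><b>My Race Notes</b></td></tr></table>
<table width=100%><tr class=databreakdown2253><th>Place</th><th>Horse</th></tr>
<tr><td class=databreakdown2253>1st</td><td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td></tr>
<tr class=databreakdown18><td colspan=12>soon led</td></tr>
<tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td></tr>
</table>
</td></tr></table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// For every td whose class contains "databreakdown", take its nearest
// table ancestor. Duplicates collapse, so each table appears once.
$tables = $xpath->query('//td[contains(@class, "databreakdown")]/ancestor::table[1]');

$tableHtml = '';
$horses = array();

foreach ($tables as $table) {
    // saveHTML() with a node argument serialises just that node and its children.
    $tableHtml .= $dom->saveHTML($table);

    // Query relative to this table: links that point at horses.php.
    foreach ($xpath->query('.//a[starts-with(@href, "horses.php")]', $table) as $link) {
        $horses[] = trim($link->nodeValue);
    }
}

echo $tableHtml, PHP_EOL;
foreach ($horses as $horse) {
    echo $horse, PHP_EOL;
}
```

Using ancestor::table[1] instead of a predicate such as //table[.//td[...]] matters here because the results table sits inside an outer layout table, and the looser predicate would match that outer table as well.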
