runnerjp Posted October 8, 2014 Share Posted October 8, 2014 Hello, Currently my webscraper signs into the site and pulls all the html -> perfect. What I need to do is to loop only specific information (horses that ran) here is my current php code <? $url = 'site'; $postdata = array('username' => "username", 'password' => "password"); $ch = curl_init(); if($ch){ curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata); curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar $content = curl_exec($ch); $headers = curl_getinfo($ch); curl_close($ch); // Debug option // print_r($headers); if($headers['http_code'] == 200){ echo $content; } } ?> here is the html im pulling <table width=100% border=1><tr><td class=instruction6 colspan=4><b>My Race Notes</b></td></tr> <tr><td width=90%><form action='races.php?id=7456132' method=post> <textarea name='comments' rows=2 cols=38>Type notes & press Add</textarea></td> <td width=5%><input type=submit class='weestatbutton' value='Add'></form></td></tr></table></td></tr></table><table width=100%><tr class=databreakdown2253><th><a href='races.php?id=7456132&sortby=1'>Place</a></th><th>Dist Bt</th><th>Stall</th> <th>Horse</th><th>Age</th><th><a href='races.php?id=7456132&sortby=3'>Weight</a></th><th>Headgear</th><th>OR</th><th>Trainer</th> <th><a href='races.php?id=7456132&sortby=2'>Odds</a></th><th>Jockey (Claim)</th></tr><tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td> <td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>9-3</td><td class=databreakdown2253></td> <td class=databreakdown2253>57</td> <td class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td> <td class=databreakdown2253>28/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=694'>Egan, John</a> </td></tr><tr class=databreakdown18><td colspan=12>soon led, brought field stands side from 3f out, headed 2f out, rallied inside final furlong, bumped and led again towards finish</td></tr><tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td> <td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>8-12td><td class=databreakdown2253></td> <td class=databreakdown2253>52</td> <td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td> <td class=databreakdown2253>10/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3414'>Cosgrave, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, challenged 2f out, led 2f out, edged right inside final furlong, rider lost whip and headed towards finish</td></tr><tr><td class=databreakdown2253>3rd</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>1</td> <td class=databreakdown2253><a href='horses.php?id=300316'>Bookmaker</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=933'>Bridger, J J</a></td> <td class=databreakdown2253>6/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3848'>Carson, William</a> </td></tr><tr class=databreakdown18><td colspan=12>prominent, took keen hold, led 2f out, headed over 1f out, not much room inside final furlong, stayed on same pace</td></tr><tr><td class=databreakdown2253>4th</td><td class=databreakdown2253>1</td><td class=databreakdown2253>2</td> <td class=databreakdown2253><a href='horses.php?id=261986'>Night Trade (IRE)</a></td> <td class=databreakdown2253>7</td><td class=databreakdown2253>8-8</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td> <td class=databreakdown2253>6/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=7348'>Hardie, Cameron</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>prominent, ridden over 2f out, switched left inside final furlong, no extra close home</td></tr><tr><td class=databreakdown2253>5th</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>6</td> <td class=databreakdown2253><a href='horses.php?id=299296'>Trigger Park (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>8-10</td><td class=databreakdown2253></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td> <td class=databreakdown2253>20/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3422'>Dobbs, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, ridden over 2f out, one pace over 1f out, no impression</td></tr><tr><td class=databreakdown2253>6th</td><td class=databreakdown2253>2.25</td><td class=databreakdown2253>7</td> <td class=databreakdown2253><a href='horses.php?id=300337'>Port Lairge</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>8-11</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=914'>Gallagher, J</a></td> <td class=databreakdown2253>33/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=193'>Catlin, Chris</a> </td></tr><tr class=databreakdown18><td colspan=12>slowly into stride, in rear, stayed on inside final furlong, never dangerous</td></tr><tr><td class=databreakdown2253>7th</td><td class=databreakdown2253>NK</td><td class=databreakdown2253>11</td> <td class=databreakdown2253><a href='horses.php?id=289934'>Lionheart</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>8-13</td><td class=databreakdown2253></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=4910'>Crate, Peter</a></td> <td class=databreakdown2253>10/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=7375'>Crouch, Hector</a> (7)</td></tr><tr class=databreakdown18><td colspan=12>reared start and slowly away, held up in rear, headway over 1f out, weakened inside final furlong</td></tr><tr><td class=databreakdown2253>8th</td><td class=databreakdown2253>2.75</td><td class=databreakdown2253>14</td> <td class=databreakdown2253><a href='horses.php?id=289421'>Koharu</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>9-4</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td> <td class=databreakdown2253>60</td> <td class=databreakdown2253><a href='trainers.php?id=2495'>Makin, P J</a></td> <td class=databreakdown2253>9/4 (Fav) </td> <td class=databreakdown2253><a href='jockeys.php?id=5952'>Bates, Mr D J</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>in rear, ridden over 3f out, no impression</td></tr><tr><td class=databreakdown2253>9th</td><td class=databreakdown2253>3</td><td class=databreakdown2253>5</td> <td class=databreakdown2253><a href='horses.php?id=269827'>Saskias Dream</a></td> <td class=databreakdown2253>6</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=2002'>Chapple-Hyam, Jane</a></td> <td class=databreakdown2253>4/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3544'>Hughes, Richard</a> </td></tr><tr class=databreakdown18><td colspan=12>mid-division, headway and switched left over 1f out, edged left entering final furlong, soon eased</td></tr><tr><td class=databreakdown2253>10th</td><td class=databreakdown2253>1.75</td><td class=databreakdown2253>12</td> <td class=databreakdown2253><a href='horses.php?id=304248'>Crafty Business (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>9-2</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=695'>Moore, G L</a></td> <td class=databreakdown2253>14/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=6669'>Bishop, Mr C</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>towards rear, pushed along over 3f out, well beaten 2f out</td></tr></table><br><hr></td></tr></table> *note I'm using this for personal reasons Link to comment https://forums.phpfreaks.com/topic/291505-web-scraping/ Share on other sites More sharing options...
CroNiX Posted October 8, 2014 Share Posted October 8, 2014 Once you grab the page $content, use simple_html_dom to parse it and grab what you need, kinda like how you would with jQuery. http://simplehtmldom.sourceforge.net/ Link to comment https://forums.phpfreaks.com/topic/291505-web-scraping/#findComment-1493029 Share on other sites More sharing options...
runnerjp Posted October 8, 2014 Author Share Posted October 8, 2014 Ok so i have done the following $dom = new DOMDocument(); $html = $content; // load html @$dom->loadHTML($html); $xpath = new DOMXPath($dom); //this will gives you all td with class name is jobs. $my_xpath_query = '//table/tr/td[contains(@class, "databreakdown")]'; $result_rows = $xpath->query($my_xpath_query); //iterate all td foreach ($result_rows as $result_object){ echo $result_object->nodeValue. "<br />"; } what i would like to do is if the table contains td[contains(@class, "databreakdown")] then grab the whole table!at the moment it just returns the td Link to comment https://forums.phpfreaks.com/topic/291505-web-scraping/#findComment-1493032 Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.