runnerjp Posted October 8, 2014

Hello,

My web scraper currently signs in to the site and pulls all of the HTML, which works perfectly. What I need now is to loop over only specific information (the horses that ran). Here is my current PHP code:

<?php
$url = 'site';
$postdata = array('username' => "username", 'password' => "password");

$ch = curl_init();
if ($ch) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // read cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');  // write cookies back to the same file
    $content = curl_exec($ch);
    $headers = curl_getinfo($ch);
    curl_close($ch);

    // Debug option
    // print_r($headers);

    if ($headers['http_code'] == 200) {
        echo $content;
    }
}
?>

Here is the HTML I'm pulling:

<table width=100% border=1><tr><td class=instruction6 colspan=4><b>My Race Notes</b></td></tr> <tr><td width=90%><form action='races.php?id=7456132' method=post> <textarea name='comments' rows=2 cols=38>Type notes & press Add</textarea></td> <td width=5%><input type=submit class='weestatbutton' value='Add'></form></td></tr></table></td></tr></table><table width=100%><tr class=databreakdown2253><th><a href='races.php?id=7456132&sortby=1'>Place</a></th><th>Dist Bt</th><th>Stall</th> <th>Horse</th><th>Age</th><th><a href='races.php?id=7456132&sortby=3'>Weight</a></th><th>Headgear</th><th>OR</th><th>Trainer</th> <th><a href='races.php?id=7456132&sortby=2'>Odds</a></th><th>Jockey (Claim)</th></tr><tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td> <td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>9-3</td><td class=databreakdown2253></td> <td class=databreakdown2253>57</td> <td
class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td> <td class=databreakdown2253>28/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=694'>Egan, John</a> </td></tr><tr class=databreakdown18><td colspan=12>soon led, brought field stands side from 3f out, headed 2f out, rallied inside final furlong, bumped and led again towards finish</td></tr><tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td> <td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>8-12</td><td class=databreakdown2253></td> <td class=databreakdown2253>52</td> <td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td> <td class=databreakdown2253>10/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3414'>Cosgrave, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, challenged 2f out, led 2f out, edged right inside final furlong, rider lost whip and headed towards finish</td></tr><tr><td class=databreakdown2253>3rd</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>1</td> <td class=databreakdown2253><a href='horses.php?id=300316'>Bookmaker</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=933'>Bridger, J J</a></td> <td class=databreakdown2253>6/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3848'>Carson, William</a> </td></tr><tr class=databreakdown18><td colspan=12>prominent, took keen hold, led 2f out, headed over 1f out, not much room inside final furlong, stayed on same pace</td></tr><tr><td class=databreakdown2253>4th</td><td class=databreakdown2253>1</td><td class=databreakdown2253>2</td> <td class=databreakdown2253><a href='horses.php?id=261986'>Night
Trade (IRE)</a></td> <td class=databreakdown2253>7</td><td class=databreakdown2253>8-8</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td> <td class=databreakdown2253>6/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=7348'>Hardie, Cameron</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>prominent, ridden over 2f out, switched left inside final furlong, no extra close home</td></tr><tr><td class=databreakdown2253>5th</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>6</td> <td class=databreakdown2253><a href='horses.php?id=299296'>Trigger Park (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>8-10</td><td class=databreakdown2253></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td> <td class=databreakdown2253>20/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3422'>Dobbs, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, ridden over 2f out, one pace over 1f out, no impression</td></tr><tr><td class=databreakdown2253>6th</td><td class=databreakdown2253>2.25</td><td class=databreakdown2253>7</td> <td class=databreakdown2253><a href='horses.php?id=300337'>Port Lairge</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>8-11</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td> <td class=databreakdown2253>50</td> <td class=databreakdown2253><a href='trainers.php?id=914'>Gallagher, J</a></td> <td class=databreakdown2253>33/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=193'>Catlin, Chris</a> </td></tr><tr class=databreakdown18><td colspan=12>slowly into stride, in rear, stayed on inside final furlong, never dangerous</td></tr><tr><td class=databreakdown2253>7th</td><td class=databreakdown2253>NK</td><td 
class=databreakdown2253>11</td> <td class=databreakdown2253><a href='horses.php?id=289934'>Lionheart</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>8-13</td><td class=databreakdown2253></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=4910'>Crate, Peter</a></td> <td class=databreakdown2253>10/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=7375'>Crouch, Hector</a> (7)</td></tr><tr class=databreakdown18><td colspan=12>reared start and slowly away, held up in rear, headway over 1f out, weakened inside final furlong</td></tr><tr><td class=databreakdown2253>8th</td><td class=databreakdown2253>2.75</td><td class=databreakdown2253>14</td> <td class=databreakdown2253><a href='horses.php?id=289421'>Koharu</a></td> <td class=databreakdown2253>4</td><td class=databreakdown2253>9-4</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td> <td class=databreakdown2253>60</td> <td class=databreakdown2253><a href='trainers.php?id=2495'>Makin, P J</a></td> <td class=databreakdown2253>9/4 (Fav) </td> <td class=databreakdown2253><a href='jockeys.php?id=5952'>Bates, Mr D J</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>in rear, ridden over 3f out, no impression</td></tr><tr><td class=databreakdown2253>9th</td><td class=databreakdown2253>3</td><td class=databreakdown2253>5</td> <td class=databreakdown2253><a href='horses.php?id=269827'>Saskias Dream</a></td> <td class=databreakdown2253>6</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=2002'>Chapple-Hyam, Jane</a></td> <td class=databreakdown2253>4/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=3544'>Hughes, Richard</a> </td></tr><tr class=databreakdown18><td colspan=12>mid-division, headway and switched left over 1f out, edged left entering final furlong, soon 
eased</td></tr><tr><td class=databreakdown2253>10th</td><td class=databreakdown2253>1.75</td><td class=databreakdown2253>12</td> <td class=databreakdown2253><a href='horses.php?id=304248'>Crafty Business (IRE)</a></td> <td class=databreakdown2253>3</td><td class=databreakdown2253>9-2</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td> <td class=databreakdown2253>59</td> <td class=databreakdown2253><a href='trainers.php?id=695'>Moore, G L</a></td> <td class=databreakdown2253>14/1 </td> <td class=databreakdown2253><a href='jockeys.php?id=6669'>Bishop, Mr C</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>towards rear, pushed along over 3f out, well beaten 2f out</td></tr></table><br><hr></td></tr></table>

Note: I'm using this for personal reasons.

Link to comment: https://forums.phpfreaks.com/topic/291505-web-scraping/
CroNiX Posted October 8, 2014

Once you have grabbed the page into $content, use simple_html_dom to parse it and pull out what you need, much as you would with jQuery: http://simplehtmldom.sourceforge.net/
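As a rough sketch of the parse-then-select idea, here is one way the horse names might be listed. It swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) in place of simple_html_dom so that it runs without any extra files, and the $content string is a hand-trimmed stand-in for the scraped page rather than real output. Matching on links that start with "horses.php" is an assumption based on the markup posted above.

```php
<?php
// Hypothetical stand-in for the scraped page: two result rows trimmed
// from the markup in the first post.
$content = <<<HTML
<table width=100%>
<tr><td class=databreakdown2253>1st</td>
<td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td>
<td class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td></tr>
<tr class=databreakdown18><td colspan=12>soon led</td></tr>
<tr><td class=databreakdown2253>2nd</td>
<td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td>
<td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td></tr>
</table>
HTML;

// Collect libxml warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Every horse is a link to horses.php, so select those links directly.
$horses = [];
foreach ($xpath->query('//a[starts-with(@href, "horses.php")]') as $link) {
    $horses[] = trim($link->nodeValue);
}

print_r($horses);
```

Selecting by link target rather than by cell position keeps the loop independent of column order, at the cost of breaking if the site changes its URL scheme.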
runnerjp Posted October 8, 2014 (Author)

OK, so I have done the following:

$dom = new DOMDocument();
$html = $content;

// Load the HTML; @ suppresses the warnings libxml raises on malformed markup.
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every td whose class attribute contains "databreakdown".
$my_xpath_query = '//table/tr/td[contains(@class, "databreakdown")]';
$result_rows = $xpath->query($my_xpath_query);

// Iterate over all matching td elements.
foreach ($result_rows as $result_object) {
    echo $result_object->nodeValue . "<br />";
}

What I would like to do is: if a table contains td[contains(@class, "databreakdown")], grab the whole table. At the moment the query just returns the individual td cells.
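One possible way to get the whole table is to find a matching cell and then step up to its nearest enclosing table with the XPath ancestor axis. The sketch below is only an illustration under assumptions: the $content string is a trimmed, hypothetical stand-in for the real page, and it takes the first matching table only. The ancestor::table[1] step matters because the posted markup nests the results table inside outer tables, so a predicate such as //table[.//td[...]] would also match those outer tables.

```php
<?php
// Hypothetical stand-in for the scraped page: a notes table followed by
// a cut-down results table.
$content = <<<HTML
<table width=100% border=1><tr><td class=instruction6>My Race Notes</td></tr></table>
<table width=100%>
<tr class=databreakdown2253><th>Place</th><th>Horse</th></tr>
<tr><td class=databreakdown2253>1st</td><td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td></tr>
<tr class=databreakdown18><td colspan=12>soon led</td></tr>
<tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td></tr>
</table>
HTML;

// Collect libxml warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Take the first matching cell, then climb to its nearest enclosing table.
$query = '(//td[contains(@class, "databreakdown")])[1]/ancestor::table[1]';
$table = $xpath->query($query)->item(0);

$rows = [];
if ($table !== null) {
    // Print the whole table as markup.
    echo $dom->saveHTML($table), "\n";

    // Or walk it row by row, skipping the header and comment rows,
    // whose cells do not carry the databreakdown class.
    foreach ($xpath->query('.//tr[td[contains(@class, "databreakdown")]]', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->nodeValue);
        }
        $rows[] = $cells;
    }
}

print_r($rows);
```

Passing $table as the second argument to query() keeps the row search scoped to that one table, so cells from other tables on the page cannot leak into $rows.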