n1concepts
Posted July 13, 2014

Hi, I was reviewing a PHP web scraping write-up at http://imbuzu.wordpress.com/tag/web-scraping and discovered a syntax error in the author's code. The error is on this line (the full code is on the author's site, linked above, and I'm also posting it below):

for ($i = 0; $i getElementsByTagName('td');

Note: I can't follow the logic of the 'for' loop and the getElementsByTagName call well enough to fix the problem myself, so I'm asking for help to make this work as the author intended.

<?php
error_reporting(E_ERROR);

$url = "http://www.imdb.com/chart/";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$document = curl_exec($curl);
//echo $document;

$dom_rep = new DOMDocument;
$dom_rep->loadHTML($document);
$all_trs = $dom_rep->getElementsByTagName('tr');

$trs_we_want = array();
foreach ($all_trs as $tr) {
    $class_name = $tr->getAttribute('class');
    if (preg_match("/chart_(even|odd)_row/", $class_name)) {
        $trs_we_want[] = $tr;
    }
}

for ($i = 0; $i getElementsByTagName('td');
    $the_tds_arr = array();
    foreach ($the_tds as $td) {
        $the_tds_arr[] = $td;
    }

    $movie_title = $the_tds_arr[2]->nodeValue;
    $rank = $the_tds_arr[0]->nodeValue;
    $weekend = $the_tds_arr[3]->nodeValue;
    $gross = $the_tds_arr[4]->nodeValue;
    $weeks = $the_tds_arr[5]->nodeValue;

    echo "<div>";
    echo "<h2>$movie_title</h2>";
    echo "Rank: $rank<br />";
    echo "Weekend: $weekend<br />";
    echo "Gross: $gross<br />";
    echo "Weeks: $weeks<br />";
    echo "</div>";
}
?>
Solution

requinix
Posted July 13, 2014

Yeah. See how "$movie_title" is messed up? I'd make an educated guess that the next character after the "$i" was a less-than sign. You know, the symbol that marks the beginning of an HTML tag? The blog and/or author and/or code plugin, whatever, is stupid with HTML to the point of leaving some HTML markup unescaped and "sanitizing" the rest. Looks like it should read

for ($i = 0; $i < count($trs_we_want); $i++) { // everything from the <
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td'); // to the next > was removed
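To see why everything between the `<` and the next `>` disappeared, here is a minimal sketch that reproduces the damage. It assumes the blog's sanitizer behaved like a naive "remove anything that looks like a tag" regex; that is a guess, since the real plugin is unknown. The variable names `$original` and `$mangled` are made up for illustration.

```php
<?php
// The two lines as they were presumably written (nowdoc, so nothing is interpolated).
$original = <<<'PHP'
for ($i = 0; $i < count($trs_we_want); $i++) {
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
PHP;

// Assumption: the sanitizer strips "<", everything up to the next ">", and the ">" itself.
// [^>] also matches the newline, so the match runs across both lines
// and ends at the ">" of the "->" operator.
$mangled = preg_replace('/<[^>]*>/', '', $original);

echo $mangled, "\n";
// prints: for ($i = 0; $i getElementsByTagName('td');
```

The output is the same broken line as in the blog post, which fits the guess above. Escaping the snippet before publishing it (for example with `htmlspecialchars()`, which turns `<` into `&lt;`) would avoid this.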