
PHP Syntax Error - Need help to understand


n1concepts


Hi,

 

I was reviewing a PHP web scraping write-up at http://imbuzu.wordpress.com/tag/web-scraping and found a syntax error in the author's code:

 

THE ERROR IS ON THIS LINE (FULL SET OF CODE CAN BE FOUND AT AUTHOR'S SITE - link above)

for ($i = 0; $i getElementsByTagName('td');

 

The code is posted below. Note: I can't work out the logic of the 'for' loop and the getElementsByTagName call well enough to fix the problem myself, so I'm asking for help to make this work as the author intended.

<?php    
     
error_reporting(E_ERROR);

$url = "http://www.imdb.com/chart/";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$document = curl_exec($curl);

//echo $document;

$dom_rep = new DOMDocument;
$dom_rep->loadHTML($document);

$all_trs = $dom_rep->getElementsByTagName('tr');
$trs_we_want = array();
foreach ($all_trs as $tr) {
  $class_name = $tr->getAttribute('class');
  if (preg_match("/chart_(even|odd)_row/", $class_name)) {
    $trs_we_want[] = $tr;
  }
}

for ($i = 0; $i getElementsByTagName('td');
  $the_tds_arr = array();

  foreach ($the_tds as $td) {
    $the_tds_arr[] = $td;
  }

  $movie_title = $the_tds_arr[2]->nodeValue;
  $rank = $the_tds_arr[0]->nodeValue;
  $weekend = $the_tds_arr[3]->nodeValue;
  $gross = $the_tds_arr[4]->nodeValue;
  $weeks = $the_tds_arr[5]->nodeValue;
  echo "<div>";
  echo "<h2>$movie_title</h2>";
  echo "Rank: $rank<br />";
  echo "Weekend: $weekend<br />";
  echo "Gross: $gross<br />";
  echo "Weeks: $weeks<br />";
  echo "</div>";
}    
    
    
?>

Yeah. See how "$movie_title" is messed up? I'd make an educated guess that the next character after the "$i" was a less-than sign, you know, the symbol that marks the beginning of an HTML tag. The blog and/or author and/or code plugin handles HTML badly, to the point of leaving some HTML markup unescaped and "sanitizing" the rest away.
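To illustrate that guess, here is a minimal sketch of how a naive tag stripper could produce exactly the broken line. The regex is an assumption about what the blog software does, not its actual code; it simply deletes everything from a "<" up to the next ">".

```php
<?php
// Presumed original source of the loop header and its first statement.
$original = <<<'CODE'
for ($i = 0; $i < count($trs_we_want); $i++) {
  $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
CODE;

// Hypothetical "sanitizer": remove anything that looks like <...>.
// The match runs from the "<" of the comparison to the ">" of "->".
$stripped = preg_replace('/<[^>]*>/', '', $original);

echo $stripped, "\n";
// prints: for ($i = 0; $i getElementsByTagName('td');
```

The output matches the damaged line from the blog, which is consistent with the "<" and everything up to the ">" in "->" having been swallowed.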

 

Looks like it should read

for ($i = 0; $i < count($trs_we_want); $i++) {               // everything from the <
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td'); // to the next > was removed
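For anyone who wants to try the repaired loop without hitting the live site, here is a small self-contained sketch. The HTML string is made-up sample data (the real page layout may differ), and it reads cells with item() instead of copying them into an array first, which is just a shortcut and not what the author's code does.

```php
<?php
// Made-up sample markup standing in for the fetched page (assumption).
$html = '<table>
<tr class="chart_even_row"><td>1</td><td></td><td>Movie A</td><td>$10M</td><td>$10M</td><td>1</td></tr>
<tr class="header"><td>skip me</td></tr>
<tr class="chart_odd_row"><td>2</td><td></td><td>Movie B</td><td>$5M</td><td>$20M</td><td>3</td></tr>
</table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom_rep = new DOMDocument;
$dom_rep->loadHTML($html);

// Keep only the rows whose class marks them as chart rows.
$trs_we_want = array();
foreach ($dom_rep->getElementsByTagName('tr') as $tr) {
    if (preg_match('/chart_(even|odd)_row/', $tr->getAttribute('class'))) {
        $trs_we_want[] = $tr;
    }
}

// The repaired loop: walk the kept rows by index and read their cells.
$rows = array();
for ($i = 0; $i < count($trs_we_want); $i++) {
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
    $rows[] = array(
        'rank'  => $the_tds->item(0)->nodeValue,
        'title' => $the_tds->item(2)->nodeValue,
    );
}

print_r($rows);
```

With the sample data, only the two chart rows are collected and the header row is skipped.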
