
PHP Syntax Error - Need help understanding


n1concepts
Solved by requinix


Hi,

I was reviewing a PHP web-scraping write-up at http://imbuzu.wordpress.com/tag/web-scraping and found a syntax error in the author's code.

The error is on this line (the full code is on the author's site, linked above):

for ($i = 0; $i getElementsByTagName('td');

I'm posting the full script below. I don't understand the logic of the 'for' loop and the getElementsByTagName() call well enough to fix it myself, so I'm asking for help getting it to work the way the author intended.

<?php

error_reporting(E_ERROR);

$url = "http://www.imdb.com/chart/";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$document = curl_exec($curl);

//echo $document;

$dom_rep = new DOMDocument;
$dom_rep->loadHTML($document);

$all_trs = $dom_rep->getElementsByTagName('tr');
$trs_we_want = array();
foreach ($all_trs as $tr) {
  $class_name = $tr->getAttribute('class');
  if (preg_match("/chart_(even|odd)_row/", $class_name)) {
    $trs_we_want[] = $tr;
  }
}

for ($i = 0; $i getElementsByTagName('td');
  $the_tds_arr = array();

  foreach ($the_tds as $td) {
    $the_tds_arr[] = $td;
  }

  $movie_title = $the_tds_arr[2]->nodeValue;
  $rank = $the_tds_arr[0]->nodeValue;
  $weekend = $the_tds_arr[3]->nodeValue;
  $gross = $the_tds_arr[4]->nodeValue;
  $weeks = $the_tds_arr[5]->nodeValue;
  echo "<div>";
  echo "<h2>$movie_title</h2>";
  echo "Rank: $rank<br />";
  echo "Weekend: $weekend<br />";
  echo "Gross: $gross<br />";
  echo "Weeks: $weeks<br />";
  echo "</div>";
}

?>


Solution (reply by requinix):

Yeah. See how the "$movie_title" line is messed up? My educated guess is that the character after "$i" was a less-than sign (<), the symbol that marks the beginning of an HTML tag. Somewhere along the way (the blog, the author, or a code-highlighting plugin), the HTML handling is broken: some markup is left unescaped while other parts are "sanitized" away, taking real code with them.
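
To illustrate the guess, here is a minimal sketch. It assumes the blog applied something like a naive "delete everything from < to the next >" filter; the naive_strip_tags() helper below is hypothetical and is not the blog's actual code. Applied to the likely original loop header, that one rule produces exactly the broken line from the question, because the first > after "$i <" is the one in the "->" method-call arrow. The same rule also removes genuine tags such as <h2>, which would be consistent with a mangled $movie_title line.

```php
<?php
// Hypothetical, deliberately naive "sanitizer": delete everything from a '<'
// up to the next '>'. This is an assumption about what went wrong on the
// blog, not its actual code.
function naive_strip_tags(string $text): string
{
    // [^>]* also matches newlines, so one "tag" can swallow several lines.
    return preg_replace('/<[^>]*>/', '', $text);
}

// Likely original loop header (nowdoc: no variable interpolation).
$original = <<<'CODE'
for ($i = 0; $i < count($trs_we_want); $i++) {
  $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
CODE;

// The first '>' after "$i <" is the one in "->", so everything between
// them disappears, including the line break.
$mangled = naive_strip_tags($original);
echo $mangled, "\n";    // for ($i = 0; $i getElementsByTagName('td');

// Real tags are removed the same way, leaving only the variable.
$title_line = naive_strip_tags('echo "<h2>$movie_title</h2>";');
echo $title_line, "\n"; // echo "$movie_title";
```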

It looks like it should read:

for ($i = 0; $i < count($trs_we_want); $i++) {               // everything from the <
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td'); // to the next > was removed
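
On the loop logic itself: with the header fixed, $i counts from 0 to count($trs_we_want) - 1, and each $trs_we_want[$i] is a single <tr> element, so calling getElementsByTagName('td') on it returns only that row's cells. The sketch below runs the corrected loop offline against a small made-up table. The HTML, titles, and figures are hypothetical sample data, not IMDb's markup (which has very likely changed since the tutorial was written), so it needs only PHP's DOM extension and no cURL or network access.

```php
<?php
// Offline sketch of the corrected loop. The HTML is made-up sample data shaped
// like the rows the tutorial filters for; it is NOT real IMDb markup.
$document = <<<'HTML'
<table>
  <tr class="chart_even_row"><td>1</td><td>-</td><td>Movie A</td><td>$10M</td><td>$50M</td><td>2</td></tr>
  <tr class="header"><td>not a chart row</td></tr>
  <tr class="chart_odd_row"><td>2</td><td>-</td><td>Movie B</td><td>$8M</td><td>$20M</td><td>1</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$dom_rep = new DOMDocument();
$dom_rep->loadHTML($document);

// Same filter as the tutorial: keep rows whose class is chart_even_row or chart_odd_row.
$trs_we_want = array();
foreach ($dom_rep->getElementsByTagName('tr') as $tr) {
    if (preg_match('/chart_(even|odd)_row/', $tr->getAttribute('class'))) {
        $trs_we_want[] = $tr;
    }
}

$titles = array();
// $i runs over the array indexes 0 .. count($trs_we_want) - 1.
for ($i = 0; $i < count($trs_we_want); $i++) {
    // Called on one <tr>, getElementsByTagName() returns only that row's <td> cells.
    $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
    $rank = $the_tds->item(0)->nodeValue;        // column 0: rank
    $movie_title = $the_tds->item(2)->nodeValue; // column 2: title
    $titles[] = $movie_title;
    echo "Rank $rank: $movie_title\n";
}
```

This should print "Rank 1: Movie A" and "Rank 2: Movie B"; the "header" row is skipped by the class filter. $the_tds->item(2) is used here instead of copying the cells into $the_tds_arr as the original does; both give the third cell. Since $trs_we_want is a plain array, foreach ($trs_we_want as $tr) with $tr->getElementsByTagName('td') would do the same job without the index variable.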