DOM Traversing


gingerninja

Dear All,

 

I have been trying to speed up a script I wrote that takes the content of a web page (it's a SAP BI Web Query, if anyone's wondering), parses it, reads the contents of one particular table, and inserts them into a MySQL database table.

 

Now, before anyone asks, the problem has nothing to do with MySQL.

 

What I have found to be the performance problem is this bit of code:

 

   // assumed: the timer starts here ($start_time was not defined in the original snippet)
   $start_time = microtime(true);

   $doc = new DOMDocument();
   @$doc->loadHTMLFile("/tmp/test.html"); // @ suppresses warnings about malformed HTML

   // find the table named "GR1Table"
   $table = null;
   $tables = $doc->getElementsByTagName('table');
   for ($i = 0; $i < $tables->length; ++$i) {
      if ($tables->item($i)->getAttribute('name') == "GR1Table") {
         $table = $tables->item($i);
         $find_all_tables_time = microtime(true);
         break;
      }
   }
   if ($table === null) {
      die("Table GR1Table not found");
   }

   // get all rows from that table
   $rows = $table->getElementsByTagName('tr');
   set_time_limit(1200);

   $rowcount = $rows->length;
   $insert_limit = 1000;

   // iterate over all but the first row, printing the elapsed time every 500 rows
   for ($i = 1; $i <= $insert_limit; ++$i) {
      $node = $rows->item($i);
      $row = $node ? $node->textContent : null; // item() returns null past the last row
      if ($i == $insert_limit) {
         $loop_1 = microtime(true);
         echo "Rows = " . $insert_limit . "\tTime:\t " . round($loop_1 - $start_time, 4) . "<br/>";
         if ($insert_limit < $rowcount) {
            $insert_limit += 500;
         }
      }
   }

 

I'll try to explain as best I can. The HTML document contains 8163 rows in the table I am parsing, and simply looping over the rows in PHP and assigning each row's textContent to a variable gets slower the more rows you process. Here is the output of the above code.

 

Rows = 1000 Time: 2.1187

Rows = 1500 Time: 3.5387

Rows = 2000 Time: 5.7027

Rows = 2500 Time: 8.5453

Rows = 3000 Time: 12.0353

Rows = 3500 Time: 16.1604

Rows = 4000 Time: 20.917

Rows = 4500 Time: 26.3189

Rows = 5000 Time: 32.3477

Rows = 5500 Time: 39.0064

Rows = 6000 Time: 46.2941

Rows = 6500 Time: 54.2183

Rows = 7000 Time: 62.8111

Rows = 7500 Time: 72.1436

Rows = 8000 Time: 81.9869

Rows = 8500 Time: 92.3004

 

Now that's slow; the time is in seconds!
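
A quick way to read these numbers is to look at the time taken by each successive batch of 500 rows rather than the running total. The sketch below copies the figures from the PHP output above; the names phpTimes and batchDeltas are made up for illustration. Each batch takes longer than the one before it, which suggests that the cost of fetching a row depends on how far down the list it is. That is an inference from the timings only, not a confirmed diagnosis.

```javascript
// Cumulative times in seconds, copied from the PHP output above
// (first entry is 1000 rows, then one entry per additional 500 rows).
// Note: the last entry (8500) runs past the table's 8163 rows.
var phpTimes = [2.1187, 3.5387, 5.7027, 8.5453, 12.0353, 16.1604, 20.917,
                26.3189, 32.3477, 39.0064, 46.2941, 54.2183, 62.8111,
                72.1436, 81.9869, 92.3004];

// Time spent on each batch of 500 rows = difference between neighbours.
function batchDeltas(times) {
  var deltas = [];
  for (var i = 1; i < times.length; ++i) {
    deltas.push(times[i] - times[i - 1]);
  }
  return deltas;
}

var deltas = batchDeltas(phpTimes);
for (var i = 0; i < deltas.length; ++i) {
  var from = 1000 + i * 500;
  console.log("rows " + from + "-" + (from + 500) + ": " + deltas[i].toFixed(4) + "s");
}
```

If every row cost the same to read, the batches would all take roughly the same time; here the per-batch time keeps climbing.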

 

Here's the same code written in JavaScript, together with its results:

function rowloop()
{
  var i, mytab = null;

  // find the table named "GR1Table"
  var mytabs = document.getElementsByTagName("TABLE");
  for (i = 0; i < mytabs.length; ++i)
  {
    if (mytabs[i].getAttribute("name") == "GR1Table")
    {
      mytab = mytabs[i];
      break;
    }
  }
  if (mytab === null)
  {
    return; // table not found
  }

  var myrows = mytab.getElementsByTagName("TR");
  var rowcount = myrows.length;
  var insert_limit = 1000;
  var startTime = new Date();

  // iterate over all but the first row, stopping at the last row
  for (i = 1; i < rowcount && i <= insert_limit; ++i)
  {
    var myrow = myrows[i].textContent;
    if (i == insert_limit)
    {
      var endTime = new Date();
      var totalTime = endTime - startTime;
      document.write("Rows = " + insert_limit + " Time: " + totalTime + "ms<br/>");
      if (insert_limit < rowcount)
      {
        insert_limit += 500;
      }
    }
  }
}

And here are the results for the JavaScript code:

 

Rows = 1000 Time: 16ms

Rows = 1500 Time: 643ms

Rows = 2000 Time: 650ms

Rows = 2500 Time: 658ms

Rows = 3000 Time: 666ms

Rows = 3500 Time: 674ms

Rows = 4000 Time: 682ms

Rows = 4500 Time: 690ms

Rows = 5000 Time: 698ms

Rows = 5500 Time: 706ms

Rows = 6000 Time: 789ms

Rows = 6500 Time: 797ms

Rows = 7000 Time: 805ms

Rows = 7500 Time: 813ms

Rows = 8000 Time: 821ms

 

Can anyone explain this for me as it's driving me up the wall?
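
One possible explanation, which the numbers in this post cannot confirm, is that each index lookup on the PHP node list walks the list from the start, so row i costs about i steps to reach. The sketch below is a toy model of that idea in plain JavaScript. It is not PHP's DOM implementation, and all the names in it (makeList, item, readByIndex, readInOnePass) are hypothetical. It counts how many nodes are visited when 8163 rows are read by index and when they are read in a single pass.

```javascript
// Toy singly linked list: a model of a list whose item(i) has to
// walk from the head on every call. Not PHP's actual DOM code.
function makeList(n) {
  var head = null;
  for (var i = n - 1; i >= 0; --i) {
    head = { text: "row " + i, next: head };
  }
  return head;
}

var steps = 0; // total number of links followed / nodes visited

// Index-based lookup: follows `index` links from the head each time.
function item(head, index) {
  var node = head;
  for (var i = 0; i < index && node !== null; ++i) {
    node = node.next;
    ++steps;
  }
  return node;
}

// Reading every row by index costs 0 + 1 + ... + (n - 1) steps in total.
function readByIndex(head, n) {
  steps = 0;
  for (var i = 0; i < n; ++i) {
    var row = item(head, i).text;
  }
  return steps;
}

// Reading every row in a single pass costs one step per row.
function readInOnePass(head) {
  steps = 0;
  for (var node = head; node !== null; node = node.next) {
    var row = node.text;
    ++steps;
  }
  return steps;
}

var list = makeList(8163); // same row count as the table in this post
console.log("by index: " + readByIndex(list, 8163) + " steps");
console.log("one pass: " + readInOnePass(list) + " steps");
```

If PHP's node list behaves like this model, it may be worth trying a single pass over the rows (for example foreach ($rows as $row), since a DOMNodeList can be iterated) instead of calling item($i) for every index. That suggestion is untested against the page described here.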

 

I am running PHP 5.2.5, but I have also tried 5.2.4 with no difference in the results.

 

Thanks in advance.

Craig

 

 
