
gingerninja

Dear all,

I have been trying to speed up a script I wrote that takes the content of a web page (it's a SAP BI Web Query, if anyone's wondering), reads the contents of one particular table, and inserts them into a MySQL database table. Before anyone asks, the problem has nothing to do with MySQL. What I have found to be the performance problem is this bit of code:

    $doc = new DOMDocument();
    // "@" suppresses the warnings libxml raises for malformed HTML
    @$doc->loadHTMLFile("/tmp/test.html");

    // find the table named "GR1Table"
    $tables = $doc->getElementsByTagName('table');
    for ($i = 0; $i < $tables->length; ++$i) {
        if ($tables->item($i)->getAttribute('name') == "GR1Table") {
            $table = $tables->item($i);
            $find_all_tables_time = microtime(true);
            break;
        }
    }

    // get all rows from that table
    $rows = $table->getElementsByTagName('tr');
    set_time_limit(1200);
    $rowcount = $rows->length;
    $insert_limit = 1000;

    // iterate over all but the first row
    // ($start_time is set earlier in the script, not shown here)
    for ($i = 1; $i <= $insert_limit; ++$i) {
        $row = $rows->item($i)->textContent;
        if ($i == $insert_limit) {
            // checkpoint: report the elapsed time, then extend the limit by 500
            $loop_1 = microtime(true);
            echo "Rows = " . $insert_limit . "\tTime:\t " . round($loop_1 - $start_time, 4) . "<br/>";
            if ($insert_limit < $rowcount) {
                $insert_limit += 500;
            }
        }
    }

I'll try to explain as best I can. The HTML document contains 8163 rows in the table I am parsing. Using PHP just to loop over the rows and assign each row's textContent to a variable becomes slower the more rows you process. Here is the output of the above code:

    Rows = 1000    Time:    2.1187
    Rows = 1500    Time:    3.5387
    Rows = 2000    Time:    5.7027
    Rows = 2500    Time:    8.5453
    Rows = 3000    Time:    12.0353
    Rows = 3500    Time:    16.1604
    Rows = 4000    Time:    20.917
    Rows = 4500    Time:    26.3189
    Rows = 5000    Time:    32.3477
    Rows = 5500    Time:    39.0064
    Rows = 6000    Time:    46.2941
    Rows = 6500    Time:    54.2183
    Rows = 7000    Time:    62.8111
    Rows = 7500    Time:    72.1436
    Rows = 8000    Time:    81.9869
    Rows = 8500    Time:    92.3004

Now that's slow; the time is in seconds!
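To make the slowdown easier to see, the cumulative PHP timings above can be turned into per-batch figures. This is only a rough sketch: `$timings` and `batch_increments` are names made up for this illustration and are not part of the original script. The numbers are copied from the output above.

```php
<?php
// Cumulative timings in seconds, copied from the PHP output above,
// keyed by the row count at which each one was printed.
$timings = array(
    1000 => 2.1187,  1500 => 3.5387,  2000 => 5.7027,  2500 => 8.5453,
    3000 => 12.0353, 3500 => 16.1604, 4000 => 20.917,  4500 => 26.3189,
    5000 => 32.3477, 5500 => 39.0064, 6000 => 46.2941, 6500 => 54.2183,
    7000 => 62.8111, 7500 => 72.1436, 8000 => 81.9869, 8500 => 92.3004,
);

// Hypothetical helper (illustration only): the time spent on each batch is
// the difference between two consecutive cumulative timings.
function batch_increments(array $timings)
{
    $increments = array();
    $previous = null;
    foreach ($timings as $rows => $seconds) {
        if ($previous !== null) {
            $increments[$rows] = round($seconds - $previous, 4);
        }
        $previous = $seconds;
    }
    return $increments;
}

$increments = batch_increments($timings);
foreach ($increments as $rows => $seconds) {
    echo "Batch ending at row " . $rows . ": " . $seconds . " s\n";
}
```

If the cost per row were constant, every batch would take about the same time. In these figures each batch instead takes longer than the one before it, usually by around half a second or more, so the total time grows much faster than the row count.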
Here's the same code written in JavaScript, together with its results:

    function rowloop() {
        // find the table named "GR1Table"
        var mytabs = document.getElementsByTagName("TABLE");
        var mytab = null;
        var i;
        for (i = 0; i < mytabs.length; ++i) {
            if (mytabs[i].getAttribute("name") == "GR1Table") {
                mytab = mytabs[i];
                break;
            }
        }

        var myrows = mytab.getElementsByTagName("TR");
        var rowcount = myrows.length;
        var insert_limit = 1000;
        var startTime = new Date();

        // iterate over all but the first row
        for (i = 1; i <= insert_limit; ++i) {
            var myrow = myrows[i].textContent;
            if (i == insert_limit) {
                // checkpoint: report the elapsed time, then extend the limit by 500
                var endTime = new Date();
                var totalTime = endTime - startTime;
                document.write("Rows = " + insert_limit + " Time: " + totalTime + "ms<br/>");
                if (insert_limit < rowcount) {
                    insert_limit += 500;
                }
            }
        }
    }

And here are the results for the JavaScript code:

    Rows = 1000    Time: 16ms
    Rows = 1500    Time: 643ms
    Rows = 2000    Time: 650ms
    Rows = 2500    Time: 658ms
    Rows = 3000    Time: 666ms
    Rows = 3500    Time: 674ms
    Rows = 4000    Time: 682ms
    Rows = 4500    Time: 690ms
    Rows = 5000    Time: 698ms
    Rows = 5500    Time: 706ms
    Rows = 6000    Time: 789ms
    Rows = 6500    Time: 797ms
    Rows = 7000    Time: 805ms
    Rows = 7500    Time: 813ms
    Rows = 8000    Time: 821ms

Can anyone explain this for me? It's driving me up the wall. I am running PHP 5.2.5 but have also tried 5.2.4, with no difference in the result.

Thanks in advance.

Craig