gingerninja
Posted May 5, 2008

Dear All,

I have been trying to speed up a script I wrote that takes the content of a webpage (it's a SAP BI Web Query, if anyone's wondering), parses it, reads the content of a particular table, and inserts it into a MySQL database table. Before anyone asks, the problem has nothing to do with MySQL. What I have found to be a performance problem is this bit of code:

    $doc = new DOMDocument();
    @$doc->loadHTMLFile("/tmp/test.html");
    $tables = $doc->getElementsByTagName('table');
    for ($i=0; $i < $tables->length; ++$i) {
        if ($tables->item($i)->getAttribute('name') == "GR1Table") {
            $table = $tables->item($i);
            $find_all_tables_time = time() + microtime();
            break;
        }
    }

    // get all rows from the first table
    $rows = $table->getElementsByTagName('tr');
    set_time_limit(1200);
    $rowcount = $rows->length;
    $insert_limit = 1000;

    // iterate over all but the first row
    for ($i = 1; $i <= $insert_limit; ++$i) {
        $row = $rows->item($i)->textContent;
        if ($i == $insert_limit) {
            $loop_1 = time() + microtime();
            echo "Rows = " . $insert_limit . "\tTime:\t " . round($loop_1 - $start_time, 4) . "<br/>";
            if ($insert_limit < $rowcount) {
                $insert_limit += 500;
            }
        }
    }

I'll try to explain as best I can. The HTML document contains 8163 rows in the table I am parsing, and simply looping over the rows in PHP and assigning each row's textContent to a variable becomes slower the more rows you process. Here is the resulting output of the above code:

    Rows = 1000    Time: 2.1187
    Rows = 1500    Time: 3.5387
    Rows = 2000    Time: 5.7027
    Rows = 2500    Time: 8.5453
    Rows = 3000    Time: 12.0353
    Rows = 3500    Time: 16.1604
    Rows = 4000    Time: 20.917
    Rows = 4500    Time: 26.3189
    Rows = 5000    Time: 32.3477
    Rows = 5500    Time: 39.0064
    Rows = 6000    Time: 46.2941
    Rows = 6500    Time: 54.2183
    Rows = 7000    Time: 62.8111
    Rows = 7500    Time: 72.1436
    Rows = 8000    Time: 81.9869
    Rows = 8500    Time: 92.3004

Now that's slow; the time is in seconds!
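As a rough check on how the PHP timings grow, the cumulative times can be compared each time the row count doubles. The sketch below uses only figures copied from the output above; the names `phpTimings` and `doublingRatios` are made up for illustration. If the work per row were constant, doubling the rows would roughly double the time. A ratio approaching 4 would instead suggest that the total cost grows roughly with the square of the row count.

```javascript
// PHP timings copied from the post: rows processed -> cumulative seconds.
const phpTimings = new Map([
  [1000, 2.1187],
  [2000, 5.7027],
  [4000, 20.917],
  [8000, 81.9869],
]);

// For every entry whose doubled row count is also present,
// compute how much the cumulative time grew.
function doublingRatios(timings) {
  const ratios = [];
  for (const [rows, seconds] of timings) {
    const doubled = timings.get(rows * 2);
    if (doubled !== undefined) {
      ratios.push({ from: rows, to: rows * 2, ratio: doubled / seconds });
    }
  }
  return ratios;
}

const ratios = doublingRatios(phpTimings);
for (const r of ratios) {
  console.log(`${r.from} -> ${r.to} rows: time x${r.ratio.toFixed(2)}`);
}
```

With these figures the ratios climb from under 3 towards 4 as the row count increases, which is consistent with (though not proof of) each row access getting more expensive the further into the list it is.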
Here's the same code written in JavaScript, together with its results:

    function rowloop() {
        var mytabs = document.getElementsByTagName("TABLE");
        for (i=0;i<mytabs.length;++i) {
            if (mytabs[i].getAttribute("name") == "GR1Table") {
                mytab = mytabs[i];
                break;
            }
        }
        var myrows = mytab.getElementsByTagName("TR");
        var rowcount = myrows.length;
        var insert_limit = 1000;
        var startTime = new Date();
        for (i=1;i <= insert_limit;++i) {
            var myrow = myrows[i].textContent;
            if (i == insert_limit) {
                var endTime = new Date();
                var totalTime = endTime-startTime;
                document.write("Rows = " + insert_limit + " Time: " + totalTime + "ms<br/>");
                if (insert_limit < rowcount) {
                    insert_limit += 500;
                }
            }
        }
    }

And here are the results for the JavaScript code:

    Rows = 1000    Time: 16ms
    Rows = 1500    Time: 643ms
    Rows = 2000    Time: 650ms
    Rows = 2500    Time: 658ms
    Rows = 3000    Time: 666ms
    Rows = 3500    Time: 674ms
    Rows = 4000    Time: 682ms
    Rows = 4500    Time: 690ms
    Rows = 5000    Time: 698ms
    Rows = 5500    Time: 706ms
    Rows = 6000    Time: 789ms
    Rows = 6500    Time: 797ms
    Rows = 7000    Time: 805ms
    Rows = 7500    Time: 813ms
    Rows = 8000    Time: 821ms

Can anyone explain this for me, as it's driving me up the wall? I am running PHP 5.2.5 but have also tried 5.2.4, with no difference in the result.

Thanks in advance.

Craig
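For contrast, here is a small sketch of the per-batch cost in the JavaScript run, using only the figures posted above. The names `jsTimings` and `batchIncrements` are hypothetical. The first reading (16ms at 1000 rows) and the readings from 6000 rows onward are left out, because the jumps before 1500 and before 6000 look like one-off pauses rather than per-row cost; that reading of the data is an assumption.

```javascript
// JavaScript timings copied from the post: [rows, cumulative milliseconds].
const jsTimings = [
  [1500, 643], [2000, 650], [2500, 658], [3000, 666], [3500, 674],
  [4000, 682], [4500, 690], [5000, 698], [5500, 706],
];

// Extra time spent on each additional batch of 500 rows.
function batchIncrements(timings) {
  const increments = [];
  for (let i = 1; i < timings.length; i++) {
    increments.push(timings[i][1] - timings[i - 1][1]);
  }
  return increments;
}

const jsIncrements = batchIncrements(jsTimings);
console.log(jsIncrements.join(", ")); // prints "7, 8, 8, 8, 8, 8, 8, 8"
```

Each extra 500 rows costs about 8ms regardless of position in the list, so the JavaScript loop behaves linearly over this range, whereas the PHP figures show each batch taking longer than the one before.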