
Speed an update loop up? (thousands of records)


Solved by mac_gyver


I was wondering if there is a way to speed up the loop below.

 

 

Input array format; it can be as large as 3K items:

    [410] => Array
        (
            [0] => FLTVY3226Y
            [1] => 47.19
            [2] => 46.583
            [3] => 45.583
            [4] => 47.583
        )

    [411] => Array
        (
            [0] => FLTVY3227X
            [1] => 65.927
            [2] => 65.110
            [3] => 64.11
            [4] => 66.11
        )

    [412] => Array
        (
            [0] => FLTVY3228X
            [1] => 63.553
            [2] => 63.233
            [3] => 62.233
            [4] => 64.233
        )

The loathsome loop:

    // =============================================================================
    // array..
    // 0 name
    // 1 actual
    // 2 nominal
    // 3 ltl
    // 4 utl
    // =============================================================================
        $adify = 0;
        $mdify = 0;
        foreach($blobarray as $ft){
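          // one SELECT per array item: check whether this part/feature row already exists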
          $sql = "SELECT ID, dev_pushpop FROM dev_check WHERE dev_partno = '".$pn."' and dev_feature = '".$ft[0]."' LIMIT 1";
          $result = $db->query($sql);
          $fcnt = $result->num_rows;
          if($fcnt < 1){
            $sql = "INSERT INTO `dev_check`(`dev_partno`, `dev_feature`, `dev_ltol`, `dev_utol`, `dev_pushpop`) VALUES ('".$pn."','".$ft[0]."', ".$ft[3].", ".$ft[4].", '".$ft[1]."')";
            $insert_row = $db->query($sql);
            $adify = $adify + 1;
          }else{
            $row = $result->fetch_assoc();
            $pushpop = explode(";", $row['dev_pushpop']);
            $pushpop[] = $ft[1];
            $dsb = array_slice($pushpop, ($deviation_stack_size * -1));
            $sqlb = "UPDATE dev_check SET dev_pushpop = '".implode(";", $dsb)."' WHERE dev_partno = '".$pn."' and dev_feature = '".$ft[0]."'";
            $modify_row = $db->query($sqlb);
            $mdify = $mdify + 1;
          }
        }
    // =============================================================================

The data being updated is in a delimited format similar to the sample below; we're basically just adding a value at the end and deleting one from the beginning.

0.638; 0.656; 0.771; 0.563; 0.748; 0.953; 0.980; 1.108; 0.826; 1.058; 0.986; 1.147; 1.075; 1.047; 0.759; 0.840; 0.909; 1.037; 0.893; 1.146; 1.018; 0.972; 0.911; 0.919; 0.834; 1.055; 0.847; 0.748; 0.914; 0.724

 

 

Previous versions did use prepared statements; this iteration is slightly faster. I'm open to changing anything to speed it up, as long as the data being recorded stays similar.

First, learn the concept of normalization and repair your database. Multiple values are not stored as comma-separated lists.

 

This will immediately make the weird select-then-insert-or-update logic unnecessary, because all you need is one INSERT query per record. If that's still too slow, try inserting the records in chunks (i.e. a big INSERT query with multiple values).
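For illustration only (the table and column names below are hypothetical, not from this thread), a normalized layout stores one row per sample, so each new value is a plain INSERT:

    // hypothetical normalized table: one row per sample instead of a
    // semicolon-delimited list
    $db->query("CREATE TABLE IF NOT EXISTS dev_sample (
        id INT AUTO_INCREMENT PRIMARY KEY,
        dev_partno VARCHAR(50) NOT NULL,
        dev_feature VARCHAR(50) NOT NULL,
        dev_value DECIMAL(10,3) NOT NULL,
        INDEX idx_part_feature (dev_partno, dev_feature)
    )");

    // one INSERT per record -- no SELECT-then-INSERT-or-UPDATE needed
    $stmt = $db->prepare("INSERT INTO dev_sample (dev_partno, dev_feature, dev_value) VALUES (?,?,?)");
    foreach ($blobarray as $ft) {
        $stmt->bind_param('ssd', $pn, $ft[0], $ft[1]);
        $stmt->execute();
    }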

 

And do use prepared statements. Those unescaped values everywhere are disturbing.

I tried that in one of the earlier versions, but the table we're sticking data in blew up rapidly. With this configuration we get about 83K records, and the record count hangs out there.

 

So, if we add the records as I did in the past (and as you mention in this post), and past 30 samples for each unique dev_partno/dev_feature combination we start deleting the older values, we'd end up with a table of approx. 2.5 million records. (I know this isn't horrible.)

 

The thought was that it might be slow at the beginning of this process (step #2 below), where we compare what is in that array to the datasets in the table, so we would have to do a SELECT for each item in the array. Currently we're grabbing one record; a rewrite would grab 30 records, in the midst of 2.5 million, 3000 times in the worst case.

 

The process now;

  1. We parse a text file and generate an array from it (a pretty quick 0.5 sec for the larger files).
  2. That array is then compared to data in the table with a SELECT in a loop, and a very simple analysis is done. This analysis brings a subset of data from that array to the user so he/she can view it (about 15 seconds for larger files, which is manageable). This step HAS to be quick, as it is not external to their process. The data viewed also includes a hidden form with values for ALL the features (that array).
  3. On clicking the form submit button, the aforementioned loop is run to update values for all features (120 sec worst case).

Full disclosure: we have not tried your suggestion past a couple hundred thousand records. We just assumed it would be too slow at the analysis step (#2).

 

What we’ll try

I'll try it this way on a separate table/database and pound the schnockers out of it till we reach that size, then do some comparisons, though I'm not too optimistic.


You should definitely do an extensive performance analysis (benchmarks, checking the query plans, maybe profiling) before you jump to the conclusion that a hack is needed. There are many ways to improve query performance, and there are just as many ways to screw it up (bad indexes, poorly written queries etc.). This has to be objectively checked.
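To make the "bad indexes" point concrete: if dev_check has nothing covering the lookup columns, every SELECT in that loop scans the whole table. A sketch (the index name is illustrative; unique fits because the table holds one row per part/feature pair):

    // turn each per-part/per-feature SELECT into a point lookup
    $db->query("ALTER TABLE dev_check
                ADD UNIQUE INDEX uq_part_feature (dev_partno, dev_feature)");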

 

MySQL also isn't the only database system. If the data specifically falls into the big-but-dumb category, chances are there's a better solution. That would be the second step.

 

Only if you've maxed out absolutely everything and still can't get decent performance should you consider breaking the rules. But right now, I'm not convinced at all.

  • Solution

your overall processing is - insert new data, delete old data from groups having a size greater than the queue size (30)

 

from fastest to slowest, the following is a list of ways to insert data -

 

1) LOAD DATA INFILE query

 

2) LOAD DATA LOCAL INFILE query

 

3) a multi-value prepared INSERT query. you can dynamically prepare a single multi-value INSERT query with a block size to handle most of the data, then dynamically prepare a unique multi-value INSERT query to handle any final partial blocks of data.

 

4) a multi-value non-prepared INSERT query

 

5) a single value prepared INSERT query

 

6) a single value non-prepared INSERT query

 

if you can, write the new data to a file and use option #1 or #2. this will be about 20 times faster than using a single-row INSERT query inside a loop.
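a rough sketch of option #2, assuming local_infile is enabled on the server and MYSQLI_OPT_LOCAL_INFILE is set on the connection (the file path is illustrative):

    // write the new rows out as CSV, then bulk-load them in one query
    $path = '/tmp/dev_load.csv';
    $fh = fopen($path, 'w');
    foreach ($blobarray as $ft) {
        fputcsv($fh, [$pn, $ft[0], $ft[3], $ft[4], $ft[1]]);
    }
    fclose($fh);
    $db->query("LOAD DATA LOCAL INFILE '$path' INTO TABLE dev_check
                FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
                (dev_partno, dev_feature, dev_ltol, dev_utol, dev_pushpop)");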

 

reusing a prepared query will save about 5% over a non-prepared query. using a new prepared query each time will actually take longer; if you had a problem with using prepared queries, it was likely due to this. if you want help with your prepared query code, you would need to post it.

 

a multi-value INSERT query can easily insert 5K-10K rows in a single query (the limit is the max allowed packet size between the client/php and the db server, default 1MB) and will be between 5 and 10 times faster than using a single-row INSERT query inside a loop.
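a sketch of option #3 against the existing dev_check columns. for brevity this re-prepares per block; preparing the full-size statement once and reusing it, with one extra prepare for the final partial block, is slightly faster:

    // multi-value prepared INSERTs in blocks; a block size of 500 is an
    // illustrative choice, bounded in practice by max_allowed_packet
    $block = 500;
    foreach (array_chunk($blobarray, $block) as $chunk) {
        $placeholders = implode(',', array_fill(0, count($chunk), '(?,?,?,?,?)'));
        $stmt = $db->prepare("INSERT INTO dev_check
            (dev_partno, dev_feature, dev_ltol, dev_utol, dev_pushpop)
            VALUES $placeholders");
        $params = [];
        foreach ($chunk as $ft) {
            array_push($params, $pn, $ft[0], $ft[3], $ft[4], $ft[1]);
        }
        // argument unpacking into bind_param works on PHP 5.6+
        $stmt->bind_param(str_repeat('ssdds', count($chunk)), ...$params);
        $stmt->execute();
    }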

 

-------------------

 

to do this, i would insert all the new data using the fastest method available, then, as a second process, delete the old data item from each group that exceeds the queue size. for your current concatenated data, you can use a multi-value INSERT ... ON DUPLICATE KEY UPDATE ... query (rather than three queries) to concatenate new data items onto the end of the list. then, after all the new data has been inserted/updated, you can use SUBSTRING_INDEX() in a single UPDATE query to remove any leading data items from the start of the lists.
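a sketch of that two-step approach, assuming a UNIQUE index on (dev_partno, dev_feature) so the duplicate-key path fires, and a queue size of 30 (shown single-row for clarity; the multi-value form batches the same statement):

    // step 1: append each new sample to its feature's list, creating the
    // row if it doesn't exist yet
    $stmt = $db->prepare("INSERT INTO dev_check
        (dev_partno, dev_feature, dev_ltol, dev_utol, dev_pushpop)
        VALUES (?,?,?,?,?)
        ON DUPLICATE KEY UPDATE dev_pushpop = CONCAT(dev_pushpop, ';', VALUES(dev_pushpop))");
    foreach ($blobarray as $ft) {
        $stmt->bind_param('ssdds', $pn, $ft[0], $ft[3], $ft[4], $ft[1]);
        $stmt->execute();
    }
    // step 2: one UPDATE trims every list back to its newest 30 items;
    // SUBSTRING_INDEX with a negative count keeps items from the right and
    // returns the whole string when there are fewer than 30
    $db->query("UPDATE dev_check SET dev_pushpop = SUBSTRING_INDEX(dev_pushpop, ';', -30)");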

Thanks, just deleting and replacing reduced the time a lot. It didn't even occur to me. I get to keep the stacks, and it's quick. I realise there were better options in your list; I tried the approach below first and was satisfied.

All I had to do was bring that stack into the array. Easy peasy.

 

Thanks.

    // =============================================================================
    // array..
    // 0 name
    // 1 actual
    // 2 nominal
    // 3 ltl
    // 4 utl
    // 5 stack
    // =============================================================================
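        // wipe this part's existing rows, then re-insert every feature with
        // its already-updated stack (array element 5)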
        $stmt = $db->prepare("DELETE FROM dev_check WHERE dev_partno = ?");
        $stmt->bind_param('s', $pn);
        $stmt->execute();
        $mdify = 0;
        $stmt = $db->prepare("INSERT INTO dev_check (dev_partno, dev_feature, dev_utol, dev_ltol, dev_pushpop) VALUES (?,?,?,?,?)");
        foreach($blobarray as $ft){
          $stmt->bind_param("ssdds", $pn, $ft[0], $ft[4], $ft[3], $ft[5]);
          $stmt->execute();
          $mdify = $mdify + 1;
        }
    // =============================================================================