NotionCommotion Posted December 13, 2016 (edited)

I will be performing multiple updates. All updates will include data for the same columns (i.e. pk, c1 and c2). The number of records to be updated is variable and will span from 1 to thousands (the example below shows three records). The data to be updated is suspect and needs to be escaped. PDO is available. Running multiple queries is simple but not as efficient, so I am thinking of doing the multiple updates in a single query.

max_allowed_packet is set to 16,777,216 on my server. Does this just mean my SQL needs to be under 16,777,216 bytes? Does this take into account the prepared statement placeholder values?

Seems like question mark placeholders would be easiest. Agree?

How can multiple updates be performed? Maybe with an ON DUPLICATE? Any other way? Any other advice? Thanks

GIVEN: [[2,4,12],[4,6,6],[15,8,2]] (each row is [pk,c1,c2])

UPDATE t SET c1=4, c2=12 WHERE pk=2;
UPDATE t SET c1=6, c2=6 WHERE pk=4;
UPDATE t SET c1=8, c2=2 WHERE pk=15;

Edited December 13, 2016 by Psycho
kicken Posted December 13, 2016 (edited)

Using INSERT ... ON DUPLICATE would work, but may have undesirable side effects such as incrementing the pk if it's an AUTO_INCREMENT value. Another option is to use CASE WHEN conditions to select the appropriate value, and a simple IN condition in the WHERE clause.

UPDATE t
SET c1 = CASE
        WHEN pk = 2 THEN 4
        WHEN pk = 4 THEN 6
        WHEN pk = 15 THEN 8
        ELSE c1
    END
  , c2 = CASE
        WHEN pk = 2 THEN 12
        WHEN pk = 4 THEN 6
        WHEN pk = 15 THEN 2
        ELSE c2
    END
WHERE pk IN (2,4,15)

If you're concerned about possibly hitting the maximum packet size, break your updates into chunks, say 1,000 rows at a time. You can prepare the query once and then just re-execute it with different parameter values for each chunk.

Edited December 13, 2016 by kicken
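A minimal PHP sketch of building that CASE WHEN query dynamically with placeholders; the $pdo connection, the $rows array of [pk, c1, c2] rows, and the table/column names are assumed from the example above, not from the original post:

// Build one WHEN ... THEN ... pair per row for each column,
// plus one ? per row for the IN list, keeping parameters in SQL order.
$whenC1 = $whenC2 = $inList = [];
$p1 = $p2 = $p3 = [];
foreach ($rows as $r) {                          // $r = [pk, c1, c2]
    $whenC1[] = 'WHEN pk = ? THEN ?';  $p1[] = $r[0]; $p1[] = $r[1];
    $whenC2[] = 'WHEN pk = ? THEN ?';  $p2[] = $r[0]; $p2[] = $r[2];
    $inList[] = '?';                   $p3[] = $r[0];
}
$sql = 'UPDATE t SET c1 = CASE ' . implode(' ', $whenC1) . ' ELSE c1 END'
     . ', c2 = CASE ' . implode(' ', $whenC2) . ' ELSE c2 END'
     . ' WHERE pk IN (' . implode(',', $inList) . ')';
$pdo->prepare($sql)->execute(array_merge($p1, $p2, $p3));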
Psycho Posted December 13, 2016

Not sure what ON DUPLICATE would help with, since you are updating records. Your original title was about INSERTs, but you asked to have the title changed to be about UPDATEs (which I did). So, the "ON DUPLICATE" wouldn't apply. Creating a massive UPDATE query with CASE statements would make problems very hard to debug. I would stick with a single prepared statement (make sure the emulate prepares parameter is set to false when creating the connection to the DB). That way, if any individual update fails, you can log it and take whatever action is needed. Breaking it up into chunks is a good idea, though.

However, if all of that doesn't work, I can think of another option, depending on the population of the values to be updated. If the set of possible values is small, you could create a process that updates all the records with a c1 value of "1", then all the records with a c1 value of "2", etc., and then do the same for the c2 values. This assumes you have the data in a form that can easily be grouped to determine those sets of records. Run something like this in a loop, changing the c1 value and the list of primary keys:

UPDATE t SET c1 = 1 WHERE pk IN (2, 4, 15, 17, 22, 34, 66)
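A rough PHP sketch of that grouped-update idea (hypothetical code, not from the original post; $pdo and the [pk, c1, c2] row format are assumed as before):

// Group the pk's by their new c1 value, then run one
// UPDATE ... WHERE pk IN (...) per distinct value.
$groups = [];
foreach ($rows as $r) {
    $groups[$r[1]][] = $r[0];
}
foreach ($groups as $value => $pks) {
    $in = implode(',', array_fill(0, count($pks), '?'));
    $stmt = $pdo->prepare("UPDATE t SET c1 = ? WHERE pk IN ($in)");
    $stmt->execute(array_merge([$value], $pks));
}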
NotionCommotion Posted December 13, 2016 Author (edited)

Thanks kicken. Why is AUTO_INCREMENT relevant? The only problem I see with the ON DUPLICATE solution is that if I receive ['pk'=>99,'c1'=>4,'c2'=>12] and a record with primary key 99 doesn't exist, a new record will be inserted. Anything else? I expect the query would look something like the following. Is there a slick way to create [2,4,12,4,6,6,15,8,2] out of [[2,4,12],[4,6,6],[15,8,2]] other than looping over them?

INSERT INTO t(pk,c1,c2) VALUES (2,4,12),(4,6,6),(15,8,2)
ON DUPLICATE KEY UPDATE c1=VALUES(c1), c2=VALUES(c2);

Edited December 13, 2016 by NotionCommotion
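As an aside, one way to do that flattening without an explicit loop (hypothetical helper code; the splat form needs PHP 5.6+):

$flat = array_merge(...$rows);                        // [[2,4,12],[4,6,6],[15,8,2]] -> [2,4,12,4,6,6,15,8,2]
$flat = call_user_func_array('array_merge', $rows);   // equivalent on older PHP versions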
NotionCommotion Posted December 13, 2016 Author

Thanks Psycho, and thank you for changing the title to "UPDATE". The ON DUPLICATE is just a hack to allow multiple updates, and there should be no INSERTs since all the records already exist. See my response to kicken's post. I agree the CASE statements seem a little scary from a troubleshooting standpoint. Not sure I understand your proposed second solution.
Jacques1 Posted December 13, 2016

What makes you think that creating a prepared statement and reusing it for multiple updates is inefficient? What time did you measure? What time do you target?
NotionCommotion Posted December 13, 2016 Author

What makes you think that creating a prepared statement and reusing it for multiple updates is inefficient? What time did you measure? What time do you target?

I haven't measured it, but I have read in multiple places that it is about three times slower. I haven't set a target and acknowledge the question is therefore academic, but I am still curious.
kicken Posted December 13, 2016

Why is AUTO_INCREMENT relevant? The only problem I see with the ON DUPLICATE solution is that if I receive ['pk'=>99,'c1'=>4,'c2'=>12] and a record with primary key 99 doesn't exist, a new record will be inserted. Anything else?

From the manual:

With an auto-increment column, an INSERT statement increases the auto-increment value but UPDATE does not.

Basically, attempting to do an INSERT will increment your ID even if the insert fails. As such, if you tried to "insert" 1000 rows just to update them via ON DUPLICATE, you would add 1000 to your next actual inserted row's ID value.

I'd agree with Jacques1 though: test first to see whether it's even necessary to batch the updates. I've only encountered a couple of instances in the past where using the CASE WHEN trick provided any significant benefit over just doing multiple single updates.
NotionCommotion Posted December 13, 2016 Author

Thanks kicken, but I am not witnessing your prediction. What am I missing? Thanks

mysql> CREATE TABLE IF NOT EXISTS t (
    ->   pk INT NOT NULL AUTO_INCREMENT,
    ->   c1 INT NOT NULL,
    ->   c2 INT NOT NULL,
    ->   PRIMARY KEY (pk))
    -> ENGINE = InnoDB;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,11,11);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,22,22);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,33,33);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t;
+----+----+----+
| pk | c1 | c2 |
+----+----+----+
|  1 | 11 | 11 |
|  2 | 22 | 22 |
|  3 | 33 | 33 |
+----+----+----+
3 rows in set (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES (1,111,111),(2,222,222),(3,333,333)
    -> ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2);
Query OK, 6 rows affected (0.00 sec)
Records: 3  Duplicates: 3  Warnings: 0

mysql> SELECT * FROM t;
+----+-----+-----+
| pk | c1  | c2  |
+----+-----+-----+
|  1 | 111 | 111 |
|  2 | 222 | 222 |
|  3 | 333 | 333 |
+----+-----+-----+
3 rows in set (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,44,4);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t;
+----+-----+-----+
| pk | c1  | c2  |
+----+-----+-----+
|  1 | 111 | 111 |
|  2 | 222 | 222 |
|  3 | 333 | 333 |
|  4 |  44 |   4 |
+----+-----+-----+
4 rows in set (0.00 sec)
NotionCommotion Posted December 13, 2016 Author

Sorry, my previous post didn't show all the SELECT statement results, and I didn't notice until the edit time expired...

mysql> DROP TABLE t;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS t (
    ->   pk INT NOT NULL AUTO_INCREMENT,
    ->   c1 INT NOT NULL,
    ->   c2 INT NOT NULL,
    ->   PRIMARY KEY (pk)
    -> )
    -> ENGINE = InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,11,11);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,22,22);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,33,33);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t;
+----+----+----+
| pk | c1 | c2 |
+----+----+----+
|  1 | 11 | 11 |
|  2 | 22 | 22 |
|  3 | 33 | 33 |
+----+----+----+
3 rows in set (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES (1,111,111),(2,222,222),(3,333,333)
    -> ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2);
Query OK, 6 rows affected (0.00 sec)
Records: 3  Duplicates: 3  Warnings: 0

mysql> SELECT * FROM t;
+----+-----+-----+
| pk | c1  | c2  |
+----+-----+-----+
|  1 | 111 | 111 |
|  2 | 222 | 222 |
|  3 | 333 | 333 |
+----+-----+-----+
3 rows in set (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES(NULL,44,44);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t;
+----+-----+-----+
| pk | c1  | c2  |
+----+-----+-----+
|  1 | 111 | 111 |
|  2 | 222 | 222 |
|  3 | 333 | 333 |
|  4 |  44 |  44 |
+----+-----+-----+
4 rows in set (0.00 sec)
kicken Posted December 13, 2016

Thanks kicken, but I am not witnessing your prediction. What am I missing? Thanks

My statement is just based on what the documentation says. From some brief testing, however, it seems it may only increment if at least one row is successfully inserted. If even a single row is added, then the ID will increment for every row, even those that were updated. If all rows end up being updates, it appears not to affect the increment value.
Barand Posted December 13, 2016

I have just done a benchmark. Each test inserts 1000 records, then updates every record. The first pass uses single queries with multiple data values, the updates using the ON DUPLICATE hack. The second pass uses multiple executes of prepared queries for both the inserts and the updates. The results were

+-----------------------------------------------+----------+
| Single query, multiple data                   |     Time |
+-----------------------------------------------+----------+
| Insert records                                |   0.1222 |
| Update records                                |   0.1441 |
+-----------------------------------------------+----------+

+-----------------------------------------------+----------+
| Single prepare, multiple execute              |     Time |
+-----------------------------------------------+----------+
| Insert records                                |  62.3395 |
| Update records                                |  68.0629 |
+-----------------------------------------------+----------+

The code

// assumes an existing PDO connection in $pdo

$N = 1000; // set # of records for the benchmark

function resetData($pdo) {
    $pdo->exec("DROP TABLE IF EXISTS person");
    $pdo->exec("CREATE TABLE person (
        id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        fname VARCHAR(40),
        lname VARCHAR(40)
    )");
}

function showTimings($desc, $t1, $t2, $t3) {
    echo '<pre>';
    echo "+-----------------------------------------------+----------+\n";
    printf("| %-45s |     Time |\n", $desc);
    echo "+-----------------------------------------------+----------+\n";
    printf("| %-45s | %8.4f |\n", 'Insert records', $t2-$t1);
    printf("| %-45s | %8.4f |\n", 'Update records', $t3-$t2);
    echo "+-----------------------------------------------+----------+\n";
    echo '</pre>';
}

//
// start the tests
//
resetData($pdo);
$t1 = microtime(1);

// pdo multiple insert
$data = [];
$params = [];
for ($i=1; $i<=$N; $i++) {
    $fname = 'aaaaa'.$i;
    $lname = 'bbbbb'.$i;
    $data[] = "(?,?)";
    array_push($params, $fname, $lname);
}
$sql = "INSERT INTO person (fname,lname) VALUES ".join(',', $data);
$stmt = $pdo->prepare($sql);
$stmt->execute($params);

$t2 = microtime(1);

// pdo multiple insert..on duplicate updates
$data = [];
$params = [];
for ($i=1; $i<=$N; $i++) {
    $fname = 'ccccc'.$i;
    $lname = 'ddddd'.$i;
    $data[] = "(?,?,?)";
    array_push($params, $i, $fname, $lname);
}
$sql = "INSERT INTO person (id,fname,lname) VALUES ".join(',', $data)
     . " ON DUPLICATE KEY UPDATE fname = VALUES(fname), lname = VALUES(lname)";
$stmt = $pdo->prepare($sql);
$stmt->execute($params);

$t3 = microtime(1);
showTimings('Single query, multiple data', $t1, $t2, $t3);

//
// Method 2 - prepare then execute many
//
resetData($pdo);
$t1 = microtime(1);

// PDO multiple insert executes
$sql = "INSERT INTO person (fname,lname) VALUES (?,?)";
$stmt = $pdo->prepare($sql);
for ($i=1; $i<=$N; $i++) {
    $fname = 'aaaaa'.$i;
    $lname = 'bbbbb'.$i;
    $stmt->execute([$fname,$lname]);
}

$t2 = microtime(1);

// PDO multiple update executes
$sql = "UPDATE person SET fname = ? , lname = ? WHERE id = ?";
$stmt = $pdo->prepare($sql);
for ($i=1; $i<=$N; $i++) {
    $fname = 'ccccc'.$i;
    $lname = 'ddddd'.$i;
    $stmt->execute([$fname,$lname,$i]);
}

$t3 = microtime(1);
showTimings('Single prepare, multiple execute', $t1, $t2, $t3);
NotionCommotion Posted December 13, 2016 Author

Pesky documentation! I think it will only be incremented if a row is actually inserted. See the following queries, which were performed after the ones in my previous post.

mysql> INSERT INTO t(pk,c1,c2) VALUES (1,1111,1111),(3,3333,3333),(4,4444,4444)
    -> ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2);
Query OK, 6 rows affected (0.00 sec)
Records: 3  Duplicates: 3  Warnings: 0

mysql> SELECT * FROM t;
+----+------+------+
| pk | c1   | c2   |
+----+------+------+
|  1 | 1111 | 1111 |
|  2 |  222 |  222 |
|  3 | 3333 | 3333 |
|  4 | 4444 | 4444 |
+----+------+------+
4 rows in set (0.00 sec)

mysql> INSERT INTO t(pk,c1,c2) VALUES (5,5555,5555),(3,3333,3333),(4,4444,4444)
    -> ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2);
Query OK, 1 row affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t;
+----+------+------+
| pk | c1   | c2   |
+----+------+------+
|  1 | 1111 | 1111 |
|  2 |  222 |  222 |
|  3 | 3333 | 3333 |
|  4 | 4444 | 4444 |
|  5 | 5555 | 5555 |
+----+------+------+
5 rows in set (0.00 sec)
NotionCommotion Posted December 13, 2016 Author

I have just done a benchmark. Each test inserts 1000 records, then updates every record. The first pass uses single queries with multiple data values, the updates using the ON DUPLICATE hack. The second pass uses multiple executes of prepared queries for both the inserts and the updates.

Wow, not 3 times faster but 600 times. As a matter of principle, seems like the way to go. Kinda forgot what I was originally asking...
mac_gyver Posted December 13, 2016

if c1, c2, and pk in your example are the only columns, you can use a multi-value REPLACE query; otherwise use a multi-value INSERT ... ON DUPLICATE KEY UPDATE ... query (which, since the data already exists, implements a multi-value UPDATE query).

the max packet size refers to the data sent in each communication back and forth between php and the database server. for a true prepared query, the sql statement (which is a string) would be sent in one communication, and the data for each execute statement would be another communication. named place-holders are implemented solely in the php PDO driver. the actual sql statement that is sent to the db server has had them replaced with ?, and the driver maps the names to the place-holder parameter numbers when it builds the communication command to send the data to the db server. here's an interesting read on what the binary transfer protocol is for the execute statement when using prepared queries - https://dev.mysql.com/doc/internals/en/com-stmt-execute.html#packet-COM_STMT_EXECUTE

note: the binary transfer protocol is only available when using prepared queries, and then it only really saves time if transferring binary data. for string data values, which is the default for PDO, unless you specifically bind the data using a non-string data type, the data is still sent as characters and there's no time savings in the communication.

and here's the reason for the performance difference - for simple queries, the communication time (handshaking + actual transfer) for both the prepare() and execute() statements is much greater than the time to actually prepare (parse/plan) or execute the query on the database server. so, from the php side, the only way to significantly reduce the amount of time taken is to reduce the number of separate communications. running a single-row prepared query inside of a loop only saves about 5% of the overall time (bench-marked using a mysqli prepared query a while ago) compared to running a non-prepared query inside of a loop, because all you have eliminated from the loop code is the communication of the sql statement string and the parse/plan step. you are still performing a separate communication for each pass through the loop and have to spend the time for the handshaking + actual transfer of each communication. the only significant time savings i have seen is when using a multi-value query, which Barand has also confirmed in his tests.

to fully implement a bulk method, figure out the maximum number of data values you want to send at once. if the actual amount of data is less than the maximum you have chosen, dynamically produce the multi-value sql statement of your choice for that amount of data, prepare it, then supply the data as an array to the execute statement. if the actual amount of data is greater than the maximum you have chosen, break the data into equal-size blocks that are less than the maximum. if there's an odd amount of data, you can include one dummy/repeat set of values to make all the blocks the same size. then, dynamically build the multi-value query statement with the correct number of sets of place-holders to match the block size, prepare the query, then loop over the actual data, breaking it into arrays of the correct size, fixing up the size of the last block if it is odd, and supply each array of data to the execute() statement.
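A rough PHP sketch of that blocking scheme (hypothetical code, not from the original post; $pdo, the table/columns from the earlier examples, and an array $rows of [pk, c1, c2] rows are assumed, with PHP 5.6+ for the splat operator):

// Fixed block size: one multi-value statement prepared once,
// re-executed for every block; the last block is padded with a
// repeat row so every execute matches the placeholder count.
$blockSize = 1000;
$sql = 'INSERT INTO t (pk, c1, c2) VALUES '
     . rtrim(str_repeat('(?,?,?),', $blockSize), ',')
     . ' ON DUPLICATE KEY UPDATE c1 = VALUES(c1), c2 = VALUES(c2)';
$stmt = $pdo->prepare($sql);

$pad = end($rows);                              // dummy/repeat row for the odd-sized last block
foreach (array_chunk($rows, $blockSize) as $block) {
    while (count($block) < $blockSize) {
        $block[] = $pad;                        // re-applying the same update is harmless
    }
    $stmt->execute(array_merge(...$block));     // flatten [[pk,c1,c2],...] into one parameter list
}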
NotionCommotion Posted December 14, 2016 Author

Thanks mac_gyver. Nice post. I've never used REPLACE before. Why do you recommend it only if c1 and c2 are the only columns?

REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
mac_gyver Posted December 14, 2016

if you are updating ALL the 'data' columns that exist in a row for any index value, you might as well just replace the row. i don't know how the performance of this multi-value method compares with the multi-value INSERT ... ON DUPLICATE KEY UPDATE ... method.
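For reference, a multi-value REPLACE for the example data would look like the following; note that it deletes and re-inserts each matched row rather than updating it in place:

REPLACE INTO t (pk, c1, c2) VALUES (2,4,12), (4,6,6), (15,8,2);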
NotionCommotion Posted December 14, 2016 Author

Can REPLACE result in a foreign key constraint violation, or worse, a CASCADE DELETE?
Barand Posted December 14, 2016

Benchmark rerun with REPLACE also:

+-----------------------------------------------+----------+
| Single query, multiple data                   |     Time |
+-----------------------------------------------+----------+
| Insert records                                |   0.1552 |
| Update records                                |   0.1669 |
| Replace records                               |   0.1449 |
+-----------------------------------------------+----------+

+-----------------------------------------------+----------+
| Single prepare, multiple execute              |     Time |
+-----------------------------------------------+----------+
| Insert records                                |  66.9068 |
| Update records                                |  69.7397 |
| Replace records                               |  67.3186 |
+-----------------------------------------------+----------+
kicken Posted December 14, 2016

Can REPLACE result in a foreign key constraint violation, or worse, a CASCADE DELETE?

Yes. If you are using foreign keys, REPLACE is not ideal.
NotionCommotion Posted December 14, 2016 Author

Thank you all for your help. Particularly mac_gyver for your great explanation, kicken for preventing me from accidentally cascade-deleting all my data, and Barand for your awesome benchmarking test.
NotionCommotion Posted December 14, 2016 Author

By the way, below is what I eventually used. Thanks again for the help.

private function savePoints($points)
{
    /* max_allowed_packet = 16,777,216 bytes.
       Assume ~16 characters per parameter at up to 11 bytes per character:
       16,777,216 / (16 * 11) ~ 95K parameters. To be safe, just use 50K
       parameters, i.e. 12,500 rows of 4 parameters each. */
    // $points is an array of [id, accounts_id, value, units] rows
    foreach (array_chunk($points, 12500) as $p) {
        $sql = 'INSERT INTO points (id, accounts_id, value, units) VALUES '
             . rtrim(str_repeat('(?,?,?,?),', count($p)), ',')
             . ' ON DUPLICATE KEY UPDATE value_old=value, value=VALUES(value), units=VALUES(units), timestamp=NOW()';
        $stmt = $this->db->prepare($sql);
        $stmt->execute(call_user_func_array('array_merge', $p)); // flatten the rows into one parameter list
    }
}
kicken Posted December 15, 2016 (edited)

Another alternative is to do a multi-table update where one of the tables is derived from your values. Unfortunately, the MySQL way of doing this is a bit verbose, as you have to do a bunch of UNIONs. For example:

UPDATE t
INNER JOIN (
    SELECT NULL AS pk, NULL AS c1, NULL AS c2
    UNION ALL SELECT 2, 4, 12
    UNION ALL SELECT 4, 6, 6
    UNION ALL SELECT 15, 8, 2
) v ON v.pk = t.pk
SET t.c1 = v.c1
  , t.c2 = v.c2

The first row in the derived table is just to set the column names. By adding a row of NULLs you avoid having to handle naming in your code when generating the UNION block, and the NULLs won't match anything on the join. Using an inner join means you'll only be updating matching rows, so there's no need for an explicit WHERE clause in the query.

Edited December 15, 2016 by kicken
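A quick PHP sketch of generating that derived-table update with placeholders (again hypothetical, assuming $pdo, [pk, c1, c2] rows, and PHP 5.6+):

// One "UNION ALL SELECT ?, ?, ?" per row; the leading NULL row
// names the derived table's columns and matches nothing on the join.
$union = str_repeat(' UNION ALL SELECT ?, ?, ?', count($rows));
$sql = 'UPDATE t INNER JOIN ('
     . 'SELECT NULL AS pk, NULL AS c1, NULL AS c2' . $union
     . ') v ON v.pk = t.pk SET t.c1 = v.c1, t.c2 = v.c2';
$pdo->prepare($sql)->execute(array_merge(...$rows));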