caleb87 Posted November 21, 2014

Hello,

Any ideas why server resources are not exhausted by the code below? my.cnf and php.ini are each set to use half the available resources, and PHP and MySQL will both utilize everything allotted to them on other scripts, so it's not a tuning problem. There is no script-side tuning. The bottleneck is the PDO select shown below:

foreach ($bigList as $listObject) {
    $sql = $dbl->prepare("SELECT * FROM fewMillionRows
                          WHERE indexedCol1=:indexedCol1
                            AND indexedCol2=:indexedCol2
                            AND indexedCol3=:indexedCol3
                            AND indexedCol4=:indexedCol4
                          LIMIT 1");
    $sql->execute($preparedValues);
    $return = $sql->fetchAll(PDO::FETCH_ASSOC);
}

On a dedicated server with 8GB RAM, the server uses less than 5% CPU/RAM but takes a long time to finish the script.

Second question

What are some alternative designs? Because the column values all happen to be alphanumeric, I could select the entire table and store it in an array, accessing the keys as indexedCol1indexedCol2indexedCol3indexedCol4 (sketched at the end of this post).

Results:

MyISAM - select whole table: 30 seconds
MyISAM - select individual rows (10k times): 68 seconds
InnoDB - select whole table: 30 seconds
InnoDB - select individual rows (10k times): 131 seconds

I am surprised it takes so long to select a whole table. The server uses 1% of resources for about 20 seconds, then CPU/RAM jump to 30%+ for about 10 seconds. Still, this is drastically faster than individual selects. In this instance, $bigList is over 500k items; at 68 seconds per 10k rows, that is absurdly long.

Building an array of key/value pairs is the only realistic alternative I currently know of, but I suspect there is a much better way of doing this. As far as I know, I cannot do a select like this:

SELECT * FROM t1 WHERE (column1, column2, column3) IN ((val1, val2, val3), (val4, val5, val6))

and there is no way to determine whether a row was found for each entry.
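A minimal sketch of that lookup-array approach, assuming the four indexed columns together identify a row and that each $bigList entry carries the same four values (the $listObject field names here are made up):

$lookup = [];
$stmt = $dbl->query("SELECT * FROM fewMillionRows");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // key is the concatenation indexedCol1indexedCol2indexedCol3indexedCol4
    $key = $row['indexedCol1'] . $row['indexedCol2']
         . $row['indexedCol3'] . $row['indexedCol4'];
    $lookup[$key] = $row;
}

foreach ($bigList as $listObject) {
    // field names are assumptions; adjust to the real csv columns
    $key = $listObject['col1'] . $listObject['col2']
         . $listObject['col3'] . $listObject['col4'];
    if (isset($lookup[$key])) {
        // a matching row exists in $lookup[$key]
    } else {
        // no matching row
    }
}

One caveat with this key scheme: concatenating without a delimiter lets two different tuples collide ("ab" + "c" equals "a" + "bc"), so joining the values with a separator character that cannot appear in the data is safer.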
Barand Posted November 21, 2014

Does the array $bigList come from a database table? And are your values indexedCol1, indexedCol2, etc. taken from the objects in the $bigList array?
caleb87 Posted November 21, 2014

In this instance, $bigList is not from a database; it's a CSV file. Yes, the column values come from $bigList (via $preparedValues); I just didn't show binding the parameters, to keep the example short.
Barand Posted November 21, 2014

Is it feasible for you to load the CSV data into a table and then join that table to your fewMillionRows table? That way you only need a single SELECT query.
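In SQL terms, something like the sketch below, assuming a work table csv_import with matching columns and that LOAD DATA LOCAL INFILE is enabled (the file path and names are placeholders):

-- bulk-load the csv into the work table
LOAD DATA LOCAL INFILE '/path/to/bigList.csv'
INTO TABLE csv_import
FIELDS TERMINATED BY ','
(col1, col2, col3, col4);

-- then one query replaces 500k+ individual selects
SELECT f.*
FROM csv_import c
JOIN fewMillionRows f
  ON f.indexedCol1 = c.col1
 AND f.indexedCol2 = c.col2
 AND f.indexedCol3 = c.col3
 AND f.indexedCol4 = c.col4;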
caleb87 Posted November 21, 2014

temporary table

Dumping 580k rows (about 251 bytes per row) to tempTable with multi-row inserts:

INSERT INTO tempTable (col1, col2, col3, col4) VALUES (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)

InnoDB: 200 seconds
MyISAM: 130 seconds

Then joining:

SELECT * FROM tempTable
LEFT JOIN fewMillionRows
  ON tempTable.col1 = fewMillionRows.col1
 AND tempTable.col2 = fewMillionRows.col2
 AND tempTable.col3 = fewMillionRows.col3
 AND tempTable.col4 = fewMillionRows.col4
LIMIT 1000

The select took 30 seconds with a LIMIT of 1000.

-----------------------------------------

asynchronous queries

It appears I could use mysqli asynchronous queries (PDO doesn't appear to have this, but I haven't looked much yet). At first glance, this or the PHP array dump seems to be the best solution.

-----------------------------------------

original question

Even if I find a superb alternative, I would really like to understand why it takes PHP so long to communicate with MySQL for each select in my original question. It's about 0.0068 seconds per query; it would be nice if it were 0.00068 instead. I did duplicate fewMillionRows to a MEMORY-engine table, which reduced the select on 10k rows to 43.42 seconds. I don't really like doing that, though, since the table has to be rebuilt any time the server is rebooted.
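For reference, a rough sketch of the mysqli asynchronous pattern mentioned above. It requires the mysqlnd driver, each in-flight query needs its own connection, and the connection details and query are placeholders:

// open one connection per concurrent query
$links = [];
for ($i = 0; $i < 4; $i++) {
    $link = new mysqli('localhost', 'user', 'pass', 'test');
    // fire the query and return immediately, without waiting for the result
    $link->query("SELECT * FROM fewMillionRows WHERE /* ... */ LIMIT 1", MYSQLI_ASYNC);
    $links[] = $link;
}

// poll until every connection has delivered its result
$pending = $links;
while ($pending) {
    $reads = $errors = $rejects = $pending;
    if (!mysqli_poll($reads, $errors, $rejects, 1)) {
        continue; // nothing ready yet
    }
    foreach ($reads as $link) {
        if ($result = $link->reap_async_query()) {
            $row = $result->fetch_assoc(); // use the row here
            $result->free();
        }
        // this link's single query is done; stop polling it
        $pending = array_filter($pending, function ($l) use ($link) {
            return $l !== $link;
        });
    }
}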
mac_gyver Posted November 21, 2014

It's a csv file.

This sounds like the "insert new data, update existing data" problem. You don't need to select the existing data first to do this: you can use either a REPLACE query or an INSERT ... ON DUPLICATE KEY UPDATE query.

Also, your prepare() and bind statements should not be inside the loop, and if you are binding input parameters, you would not supply the data values in the execute() call.

Edit: I see you posted a reply while I was typing this. For most simple queries, the time taken to communicate the query/data to the database server, which is all done as characters, is much longer than the time it takes to actually run the query. Even with a prepared query, assuming you correctly prepare and bind things once before the start of the loop, you must still communicate the values of the bound parameters for each execution. The only way to substantially reduce the time taken by a large number of queries is to reduce the total number of queries and communication round trips (there is handshaking for each round trip that also takes time).
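Applied to the original loop, that looks something like this (a sketch; the placeholder names :c1..:c4 and the assumption that each $listObject is a numerically indexed row are not from the original code):

// prepare and bind once, before the loop
$sql = $dbl->prepare("SELECT * FROM fewMillionRows
                      WHERE indexedCol1=:c1 AND indexedCol2=:c2
                        AND indexedCol3=:c3 AND indexedCol4=:c4
                      LIMIT 1");
// bindParam() binds by reference, so later changes to $c1..$c4 are picked up
$sql->bindParam(':c1', $c1);
$sql->bindParam(':c2', $c2);
$sql->bindParam(':c3', $c3);
$sql->bindParam(':c4', $c4);

foreach ($bigList as $listObject) {
    list($c1, $c2, $c3, $c4) = $listObject;
    $sql->execute(); // no value array passed here; the bound variables are used
    $row = $sql->fetch(PDO::FETCH_ASSOC);
}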
Barand Posted November 22, 2014

Why the LEFT join? An INNER join is much faster. And do you need to load the csv file every time you run the query?
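That is, something like the sketch below. Note that an inner join returns matched rows only, so it won't show which csv rows had no match, which is the one thing the LEFT join buys you:

SELECT f.*
FROM tempTable t
INNER JOIN fewMillionRows f
   ON t.col1 = f.col1
  AND t.col2 = f.col2
  AND t.col3 = f.col3
  AND t.col4 = f.col4;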