
performance on thousands of selects in while loop



Hello,

 

Any ideas why server resources are not exhausted by the code below, when my.cnf and php.ini are each configured to use half of the server's resources? PHP and MySQL will both use all of their allotted resources on other scripts, so it's not a tuning problem, and there is no script-side tuning.

 

The bottleneck: the PDO select shown below.

foreach ($bigList as $listObject) {
    // $preparedValues is built from $listObject here (binding omitted to keep the example short)
    $sql = $dbl->prepare("SELECT * FROM fewMillionRows WHERE indexedCol1=:indexedCol1 AND indexedCol2=:indexedCol2 AND indexedCol3=:indexedCol3 AND indexedCol4=:indexedCol4 LIMIT 1");
    $sql->execute($preparedValues);
    $return = $sql->fetchAll(PDO::FETCH_ASSOC);
}

On a dedicated server with 8GB RAM, the script uses less than 5% CPU/RAM but takes a long time to finish.

 

 

Second Question

What are some alternative designs?

 

Because the column values all happen to be alphanumeric, I could select the entire table and store it in an array keyed by the concatenated column values, accessing keys like so: indexedCol1 . indexedCol2 . indexedCol3 . indexedCol4. Roughly like the sketch below.
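
A rough sketch of that idea (the $listObject property names are placeholders, since my example code doesn't show them; note this holds the entire table in PHP memory):

// Build a lookup array keyed on the concatenated column values.
$lookup = [];
$stmt = $dbl->query("SELECT * FROM fewMillionRows");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $key = $row['indexedCol1'] . $row['indexedCol2'] . $row['indexedCol3'] . $row['indexedCol4'];
    $lookup[$key] = $row;
}

foreach ($bigList as $listObject) {
    // property names on $listObject are assumptions for illustration
    $key = $listObject->indexedCol1 . $listObject->indexedCol2 . $listObject->indexedCol3 . $listObject->indexedCol4;
    if (isset($lookup[$key])) {
        // row exists; $lookup[$key] holds it
    }
}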

 

Results:

MyISAM - select whole table: 30 seconds
MyISAM - select individual rows (10k times): 68 seconds
InnoDB - select whole table: 30 seconds
InnoDB - select individual rows (10k times): 131 seconds

 

 

I am surprised it takes so long to select a whole table. Server resource usage sits at about 1% for roughly 20 seconds, then CPU/RAM jumps to 30%+ for about 10 seconds. This is still drastically faster than individual selects.

 

In this instance, $bigList is over 500k items. At 68 seconds per 10k lookups, the full run takes absurdly long. Building a key/value array is the only realistic alternative I currently know of, but I suspect there is a much better way of doing this.

 

 

As far as I know, I cannot do a select like so: SELECT * FROM t1 WHERE (column1, column2, column3) IN ((val1, val2, val3), (val4, val5, val6))

There is no way to determine whether a row was found for each entry as far as I know. 

Does the array $bigList come from a database table?

Are your values indexedCol1, indexedCol2, etc. taken from the objects in the $bigList array?

 

In this instance, $bigList is not from a database. It's a CSV file.

 

Yes, the column values are from $bigList ($preparedValues); I just didn't show binding the parameters to keep it short. It's merely example code.

temporary table
Dumping 580k rows to tempTable ("INSERT INTO tempTable (col1, col2, col3, col4) VALUES (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)"):
InnoDB: 200 seconds

MyISAM: 130 seconds

(about 251 bytes per row)
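
For anyone curious, a rough sketch of how a batched dump like this could be done with PDO (the batch size, and the assumption that each CSV row comes in as a plain array of four values, are illustrative):

// Insert the CSV rows into tempTable in multi-row batches.
$batchSize = 1000;
foreach (array_chunk($bigList, $batchSize) as $chunk) {
    // one "(?, ?, ?, ?)" group per row in this batch
    $placeholders = rtrim(str_repeat('(?, ?, ?, ?),', count($chunk)), ',');
    $stmt = $dbl->prepare("INSERT INTO tempTable (col1, col2, col3, col4) VALUES $placeholders");
    $params = [];
    foreach ($chunk as $csvRow) {
        // assuming each CSV row is a plain array of the four column values
        $params[] = $csvRow[0];
        $params[] = $csvRow[1];
        $params[] = $csvRow[2];
        $params[] = $csvRow[3];
    }
    $stmt->execute($params);
}

Wrapping the batches in a transaction ($dbl->beginTransaction() / $dbl->commit()) usually speeds up InnoDB inserts considerably.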

 

"SELECT * FROM tempTable LEFT JOIN fewMillionRowsON tempTable.col1=fewMillionRows.col1 AND tempTable.col2=fewMillionRows.col2 AND tempTable.col3=fewMillionRows.col3 AND tempTable.col4=fewMillionRows.col4 LIMIT 1000"
The select took 30 seconds on 1000 limit. 
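
Side note on my earlier concern about knowing whether a row was found: with a LEFT JOIN like the one above, a non-match should come back as NULLs from fewMillionRows, so a query along these lines (same placeholder names) would flag each tempTable row:

SELECT tempTable.*,
       (fewMillionRows.col1 IS NOT NULL) AS matched
FROM tempTable
LEFT JOIN fewMillionRows
       ON tempTable.col1 = fewMillionRows.col1
      AND tempTable.col2 = fewMillionRows.col2
      AND tempTable.col3 = fewMillionRows.col3
      AND tempTable.col4 = fewMillionRows.col4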
-----------------------------------------
 
asynchronous queries
It appears I could use mysqli asynchronous queries (PDO doesn't appear to have this, but I haven't looked into it much yet). At first glance this seems to be the best solution, or else the PHP array dump.
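
A rough sketch of what the mysqli async approach might look like (credentials and lookup values are placeholders; each in-flight query needs its own connection, and MYSQLI_ASYNC doesn't work with prepared statements, so values have to be escaped into the SQL string):

// Fire a handful of lookups in parallel, each on its own connection.
$values = ['a1b2', 'c3d4', 'e5f6', 'g7h8'];   // placeholder lookup values
$links = [];
foreach ($values as $value) {
    $link = new mysqli('localhost', 'user', 'pass', 'db');   // placeholder credentials
    $val  = $link->real_escape_string($value);
    $link->query("SELECT * FROM fewMillionRows WHERE indexedCol1='$val' LIMIT 1", MYSQLI_ASYNC);
    $links[] = $link;
}

$pending = $links;
while ($pending) {
    $read = $error = $reject = $pending;
    if (!mysqli_poll($read, $error, $reject, 1)) {
        continue;
    }
    foreach ($read as $link) {
        if ($result = $link->reap_async_query()) {
            $row = $result->fetch_assoc();   // the matching row, or null if none
            $result->free();
        }
        foreach ($pending as $k => $l) {     // drop the reaped connection from the pending set
            if ($l === $link) {
                unset($pending[$k]);
            }
        }
    }
}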
 
-----------------------------------------
 
original question
Even if I find a superb alternative, I would really like to understand why it takes PHP so long to communicate with MySQL for each select in my original question. It's about 0.0068 seconds per query; it would be nice if it were 0.00068 instead.
 
I did duplicate fewMillionRows to a MEMORY engine table, which reduced the 10k individual selects to 43.42 seconds. I don't really like doing this, though, since the table has to be rebuilt any time the server is rebooted.


 

This sounds like the classic "insert new data, update existing data" problem.

 

You don't need to select the existing data first to do this; you can use either a REPLACE query or an INSERT ... ON DUPLICATE KEY UPDATE query.
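
E.g. something along these lines (an untested sketch using your placeholder names, assuming a UNIQUE index across the four indexed columns and a hypothetical dataCol to update):

// Upsert instead of select-then-insert/update.
$stmt = $dbl->prepare(
    "INSERT INTO fewMillionRows (indexedCol1, indexedCol2, indexedCol3, indexedCol4, dataCol)
     VALUES (:indexedCol1, :indexedCol2, :indexedCol3, :indexedCol4, :dataCol)
     ON DUPLICATE KEY UPDATE dataCol = VALUES(dataCol)"
);
foreach ($bigList as $listObject) {
    $stmt->execute([
        'indexedCol1' => $listObject->indexedCol1,   // property names are assumptions
        'indexedCol2' => $listObject->indexedCol2,
        'indexedCol3' => $listObject->indexedCol3,
        'indexedCol4' => $listObject->indexedCol4,
        'dataCol'     => $listObject->dataCol,       // hypothetical non-key column
    ]);
}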

 

Also, your prepare() and bind statements should not be inside the loop, and if you are binding input parameters you would not supply the data values in the execute() call.
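
I.e. something like this (untested sketch, same placeholder names as your original post; the $listObject property names are assumptions):

// Prepare once, bind once by reference, then only execute() inside the loop.
$stmt = $dbl->prepare(
    "SELECT * FROM fewMillionRows
     WHERE indexedCol1 = :indexedCol1 AND indexedCol2 = :indexedCol2
       AND indexedCol3 = :indexedCol3 AND indexedCol4 = :indexedCol4
     LIMIT 1"
);
$stmt->bindParam(':indexedCol1', $c1);
$stmt->bindParam(':indexedCol2', $c2);
$stmt->bindParam(':indexedCol3', $c3);
$stmt->bindParam(':indexedCol4', $c4);

foreach ($bigList as $listObject) {
    $c1 = $listObject->indexedCol1;
    $c2 = $listObject->indexedCol2;
    $c3 = $listObject->indexedCol3;
    $c4 = $listObject->indexedCol4;
    $stmt->execute();                     // no array argument when using bindParam()
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
}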

 

Edit: I see you posted a reply while I was typing this. For most simple queries, the time taken to communicate the query and data to the database server (which is all done as characters) is much longer than the time it takes to actually run the query. Even with a prepared query, assuming you correctly prepare and bind things once before the start of the loop, you still must communicate the values for the bound parameters on every execution. The only way to substantially reduce the time taken for a large number of queries is to reduce the total number of queries and communication round trips (there is handshaking for each round trip that also takes time).
