caleb87 Posted November 21, 2014

Hello,

Any ideas why server resources are not exhausted by the code below? my.cnf and php.ini are each set to use half the available resources, and PHP and MySQL will both utilize everything allotted to them on other scripts, so it's not a tuning problem. There is no script-side tuning. The bottleneck is the PDO select shown below:

foreach ($bigList as $listObject) {
    $sql = $dbl->prepare("SELECT * FROM fewMillionRows
                          WHERE indexedCol1=:indexedCol1
                            AND indexedCol2=:indexedCol2
                            AND indexedCol3=:indexedCol3
                            AND indexedCol4=:indexedCol4
                          LIMIT 1");
    $sql->execute($preparedValues);
    $return = $sql->fetchAll(PDO::FETCH_ASSOC);
}

On a dedicated server with 8GB RAM, the server uses less than 5% CPU/RAM but takes a long time to finish the script.

Second question

What are some alternative designs? Because the column values all happen to be alphanumeric, I could select the entire table and store it in an array, accessing the keys as indexedCol1indexedCol2indexedCol3indexedCol4 (sketched at the end of this post).

Results:

MyISAM - select whole table: 30 seconds
MyISAM - select individual rows (10k times): 68 seconds
InnoDB - select whole table: 30 seconds
InnoDB - select individual rows (10k times): 131 seconds

I am surprised it takes so long to select a whole table. The server uses 1% of resources for about 20 seconds, then CPU/RAM jump to 30%+ for about 10 seconds. Still, this is drastically faster than individual selects. In this instance, $bigList is over 500k items; at 68 seconds per 10k rows, that is absurdly long.

Building an array of key/value pairs is the only realistic alternative I currently know of, but I suspect there is a much better way of doing this. As far as I know, I cannot do a select like this:

SELECT * FROM t1 WHERE (column1, column2, column3) IN ((val1, val2, val3), (val4, val5, val6))

and there is no way to determine whether a row was found for each entry.
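A minimal sketch of that lookup-array approach, assuming the four indexed columns together identify a row and that each $bigList entry carries the same four values (the $listObject field names here are made up):

$lookup = [];
$stmt = $dbl->query("SELECT * FROM fewMillionRows");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // key is the concatenation indexedCol1indexedCol2indexedCol3indexedCol4
    $key = $row['indexedCol1'] . $row['indexedCol2']
         . $row['indexedCol3'] . $row['indexedCol4'];
    $lookup[$key] = $row;
}

foreach ($bigList as $listObject) {
    // field names are assumptions; adjust to the real csv columns
    $key = $listObject['col1'] . $listObject['col2']
         . $listObject['col3'] . $listObject['col4'];
    if (isset($lookup[$key])) {
        // a matching row exists in $lookup[$key]
    } else {
        // no matching row
    }
}

One caveat with this key scheme: concatenating without a delimiter lets two different tuples collide ("ab" + "c" equals "a" + "bc"), so joining the values with a separator character that cannot appear in the data is safer.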
Barand Posted November 21, 2014

Does the array $bigList come from a database table? And are your values indexedCol1, indexedCol2, etc. taken from the objects in the $bigList array?
caleb87 Posted November 21, 2014

In this instance, $bigList is not from a database; it's a CSV file. Yes, the column values come from $bigList (via $preparedValues); I just didn't show binding the parameters, to keep the example short.
Barand Posted November 21, 2014

Is it feasible for you to load the CSV data into a table and then join that table to your fewMillionRows table? That way you only need a single SELECT query.
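In SQL terms, something like the sketch below, assuming a work table csv_import with matching columns and that LOAD DATA LOCAL INFILE is enabled (the file path and names are placeholders):

-- bulk-load the csv into the work table
LOAD DATA LOCAL INFILE '/path/to/bigList.csv'
INTO TABLE csv_import
FIELDS TERMINATED BY ','
(col1, col2, col3, col4);

-- then one query replaces 500k+ individual selects
SELECT f.*
FROM csv_import c
JOIN fewMillionRows f
  ON f.indexedCol1 = c.col1
 AND f.indexedCol2 = c.col2
 AND f.indexedCol3 = c.col3
 AND f.indexedCol4 = c.col4;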
caleb87 Posted November 21, 2014

temporary table

Dumping 580k rows (about 251 bytes per row) to tempTable with multi-row inserts:

INSERT INTO tempTable (col1, col2, col3, col4) VALUES (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)

InnoDB: 200 seconds
MyISAM: 130 seconds

Then joining:

SELECT * FROM tempTable
LEFT JOIN fewMillionRows
  ON tempTable.col1 = fewMillionRows.col1
 AND tempTable.col2 = fewMillionRows.col2
 AND tempTable.col3 = fewMillionRows.col3
 AND tempTable.col4 = fewMillionRows.col4
LIMIT 1000

The select took 30 seconds with a LIMIT of 1000.

-----------------------------------------

asynchronous queries

It appears I could use mysqli asynchronous queries (PDO doesn't appear to have this, but I haven't looked much yet). At first glance, this or the PHP array dump seems to be the best solution.

-----------------------------------------

original question

Even if I find a superb alternative, I would really like to understand why it takes PHP so long to communicate with MySQL for each select in my original question. It's about 0.0068 seconds per query; it would be nice if it were 0.00068 instead. I did duplicate fewMillionRows to a MEMORY-engine table, which reduced the select on 10k rows to 43.42 seconds. I don't really like doing that, though, since the table has to be rebuilt any time the server is rebooted.
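For reference, a rough sketch of the mysqli asynchronous pattern mentioned above. It requires the mysqlnd driver, each in-flight query needs its own connection, and the connection details and query are placeholders:

// open one connection per concurrent query
$links = [];
for ($i = 0; $i < 4; $i++) {
    $link = new mysqli('localhost', 'user', 'pass', 'test');
    // fire the query and return immediately, without waiting for the result
    $link->query("SELECT * FROM fewMillionRows WHERE /* ... */ LIMIT 1", MYSQLI_ASYNC);
    $links[] = $link;
}

// poll until every connection has delivered its result
$pending = $links;
while ($pending) {
    $reads = $errors = $rejects = $pending;
    if (!mysqli_poll($reads, $errors, $rejects, 1)) {
        continue; // nothing ready yet
    }
    foreach ($reads as $link) {
        if ($result = $link->reap_async_query()) {
            $row = $result->fetch_assoc(); // use the row here
            $result->free();
        }
        // this link's single query is done; stop polling it
        $pending = array_filter($pending, function ($l) use ($link) {
            return $l !== $link;
        });
    }
}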
mac_gyver Posted November 21, 2014

It's a csv file.

This sounds like the "insert new data, update existing data" problem. You don't need to select the existing data first to do this: you can use either a REPLACE query or an INSERT ... ON DUPLICATE KEY UPDATE query.

Also, your prepare() and bind statements should not be inside the loop, and if you are binding input parameters, you would not supply the data values in the execute() call.

Edit: I see you posted a reply while I was typing this. For most simple queries, the time taken to communicate the query/data to the database server, which is all done as characters, is much longer than the time it takes to actually run the query. Even with a prepared query, assuming you correctly prepare and bind things once before the start of the loop, you must still communicate the values of the bound parameters for each execution. The only way to substantially reduce the time taken by a large number of queries is to reduce the total number of queries and communication round trips (there is handshaking for each round trip that also takes time).
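Applied to the original loop, that looks something like this (a sketch; the placeholder names :c1..:c4 and the assumption that each $listObject is a numerically indexed row are not from the original code):

// prepare and bind once, before the loop
$sql = $dbl->prepare("SELECT * FROM fewMillionRows
                      WHERE indexedCol1=:c1 AND indexedCol2=:c2
                        AND indexedCol3=:c3 AND indexedCol4=:c4
                      LIMIT 1");
// bindParam() binds by reference, so later changes to $c1..$c4 are picked up
$sql->bindParam(':c1', $c1);
$sql->bindParam(':c2', $c2);
$sql->bindParam(':c3', $c3);
$sql->bindParam(':c4', $c4);

foreach ($bigList as $listObject) {
    list($c1, $c2, $c3, $c4) = $listObject;
    $sql->execute(); // no value array passed here; the bound variables are used
    $row = $sql->fetch(PDO::FETCH_ASSOC);
}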
Barand Posted November 22, 2014

Why the LEFT join? An INNER join is much faster. And do you need to load the csv file every time you run the query?
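That is, something like the sketch below. Note that an inner join returns matched rows only, so it won't show which csv rows had no match, which is the one thing the LEFT join buys you:

SELECT f.*
FROM tempTable t
INNER JOIN fewMillionRows f
   ON t.col1 = f.col1
  AND t.col2 = f.col2
  AND t.col3 = f.col3
  AND t.col4 = f.col4;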