cytech Posted December 18, 2009

Good morning,

I wrote a small import function for a website of mine, and I know there has to be a better way to handle what I'm doing. I'm hoping someone can take my code and make it run a hair faster, as it seems to be pretty slow right now. Can anyone speed this up? What am I doing wrong, or am I doing it right? The function below takes a CSV file, reads its contents, and imports them into the database. How can I optimize it to run faster and cleaner?

function importFile($user_id, $file, $file_id)
{
    if (!file_exists($file)) {
        return false;
    }

    $handle = fopen($file, "rb");
    $contents = '';
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    fclose($handle);

    if (empty($contents)) {
        return false;
    }

    // grab the items
    $items = explode("\n", $contents);

    // if there are items, import them.
    if (is_array($items)) {
        foreach ($items as $key => $val) {
            $save_val = $val;
            $val = explode(",", trim($val));
            if (!is_array($val) || sizeof($val) <= 1) {
                $val = explode("\t", trim($save_val));
                if (!is_array($val)) {
                    return false;
                }
            }

            // part number
            // alt part number
            // condition
            // qty
            // description
            foreach ($val as $inner_key => $inner_val) {
                $inner_val = str_replace("\r", "", $inner_val);
                $val[$inner_key] = str_replace("\"", "", $val[$inner_key]);
                $val[$inner_key] = trim(mysql_real_escape_string($val[$inner_key]));
            }

            if (!empty($val[0]) && strtolower($val[0]) != "part number") {
                $val[0] = $this->fixPartNumber($val[0]);
                $val[1] = $this->fixPartNumber($val[1]);

                // check to see if we need to insert or update
                $select_sql = "SELECT inventory_id FROM ".TABLE_PREFIX."inventory
                               WHERE inventory_part_number='{$val[0]}' AND user_id={$user_id}";
                $result = $this->db->query($select_sql);
                //echo mysql_error()."<BR><BR>";
                if (!$result) {
                    echo "fail";
                    exit;
                }

                if (mysql_num_rows($result) == 0) {
                    // no record, so insert
                    $insert_sql = "INSERT INTO ".TABLE_PREFIX."inventory
                                   (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
                                    inventory_quantity, inventory_description, last_update, user_id)
                                   VALUES ('{$val[0]}', '{$val[1]}', '{$val[2]}', '{$val[3]}', '{$val[4]}', NOW(), '{$user_id}')";
                    //echo $insert_sql."<BR><BR>";
                    $result = $this->db->query($insert_sql);
                    echo "Inserted: ".$val[0]."<br />";
                } else {
                    $update_sql = "UPDATE ".TABLE_PREFIX."inventory
                                   SET inventory_part_number='{$val[0]}',
                                       inventory_alt_part_number='{$val[1]}',
                                       inventory_condition_code='{$val[2]}',
                                       inventory_quantity='{$val[3]}',
                                       inventory_description='{$val[4]}',
                                       last_update = NOW()
                                   WHERE user_id = {$user_id} AND inventory_part_number = '{$val[0]}'";
                    //echo $update_sql."<BR><BR>";
                    $result = $this->db->query($update_sql);
                    echo "Updated: ".$val[0]."<br />";
                }
            }
        }

        $update_sql = "UPDATE ".TABLE_PREFIX."file_upload
                       SET file_flag = 1
                       WHERE file_id=".$file_id." AND user_id=".$user_id;
        $result = $this->db->query($update_sql);
        return true;
    }
}
cytech Posted December 18, 2009

Did some research and found some benchmarks stating that foreach loops used with large amounts of data are way too inefficient and that you should use for loops instead. Anyone care to back up this claim?
oni-kun Posted December 18, 2009

What is your application of this? It's sort of ironic, mixing flat files with a database. If it's just for importing the odd file it shouldn't matter, but if it's essential code then you should store the data in a format (serialized, imploded into an array, or even XML/CSV) that can be handled with built-in functions, which should perform much faster than all that manual parsing. You can test speeds and try a few optimizations using this benchmarking code if you wish; I think it works.

This goes at the top of your page, or in an include:

$starttime = explode(' ', microtime());
$starttime = $starttime[1] + $starttime[0];

And this at the bottom:

$mtime = explode(' ', microtime());
$totaltime = $mtime[0] + $mtime[1] - $starttime;
printf('<br>Page loaded in %.4f seconds.', $totaltime);
cytech Posted December 18, 2009

Thank you for your post and for the benchmark code; I will plop that in and see how it looks. It is somewhat essential: users upload their CSV files and the system needs to import them into the database. I wonder if there is a solution where, upon upload, the CSV is turned into something that makes importing a lot easier... I will give that a few Google searches.
oni-kun Posted December 18, 2009

fgetcsv() and similar? Since the parsing is internal rather than PHP logic it should run faster; you should run some tests.
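For illustration, a minimal sketch of how the reading loop could be rewritten around fgetcsv(). The column order (part number, alt part number, condition, qty, description) is taken from the comments in the original function; the delimiter-guessing and header-skipping logic are assumptions, not part of the original code.

// Sketch only: read the uploaded file with fgetcsv() instead of fread()/explode().
$handle = fopen($file, 'rb');
if ($handle === false) {
    return false;
}

// Peek at the first line to guess the delimiter (comma or tab), then rewind.
$firstLine = (string) fgets($handle);
$delimiter = (strpos($firstLine, "\t") !== false) ? "\t" : ',';
rewind($handle);

while (($row = fgetcsv($handle, 0, $delimiter)) !== false) {
    if (count($row) < 5 || strtolower($row[0]) == 'part number') {
        continue; // skip short/blank lines and the header row
    }
    list($partNumber, $altPartNumber, $condition, $qty, $description) = $row;
    // ... escape the values and run the insert/update logic here ...
}
fclose($handle);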
cags Posted December 18, 2009

How many rows do you have in your CSV? Your script is likely spending far more time on all the SELECT and UPDATE statements than you will ever notice from switching foreach to for. Perhaps there would be a quicker method using the INSERT...ON DUPLICATE syntax, but I don't know, I've never used it. As to which loop is technically quicker, there appears to be no definitive answer; it would very much seem to depend on what exactly you're doing on each iteration.

Edit: Or possibly LOAD DATA INFILE?
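For reference, a minimal sketch of what a LOAD DATA INFILE statement for this table might look like. The column list is taken from the INSERT in the original post, but the file path, the LOCAL keyword, the delimiter options, and the assumption that the CSV has a header row are all illustrative rather than tested against this setup.

// Sketch only: load the CSV straight into the table on the MySQL side,
// skipping PHP for the bulk of the work. Path and options are assumed.
$sql = "LOAD DATA LOCAL INFILE '/path/to/upload.csv'
        INTO TABLE ".TABLE_PREFIX."inventory
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
         inventory_quantity, inventory_description)";
$result = $this->db->query($sql);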
cytech Posted December 18, 2009

Hey Cags, agreed... I just figured that out. I switched out the foreach loops and it did run a "hair" faster, but not enough to worry about, so that didn't help. However, when running the import function "without" the database queries, it ran through the CSV perfectly and decently fast. I put the queries back in and it ground to a halt, slow as a snail. So at this point it has to do with the database and the queries. With that said, I need to look at the database and figure out a more appropriate fix. Thank you!
cytech Posted December 18, 2009

LOAD DATA INFILE... brilliant.
cags Posted December 18, 2009

I've never used it before, so I'm not entirely sure how it'll handle duplicate rows, but it might be useful for what you're trying. It'll get rid of a lot of the overhead from communicating between PHP and MySQL.
cytech Posted December 18, 2009

Check this out: http://www.softwareprojects.com/resources/programming/t-how-to-use-mysql-fast-load-data-for-updates-1753.html
cags Posted December 18, 2009

Makes sense, has it given you a performance boost?
cytech Posted December 18, 2009

Not sure yet, will attempt it this weekend. Of course, I shall post the results once it is completed.
Psycho Posted December 18, 2009

I was thinking of ON DUPLICATE as well, but that only works when you have a single column being used for the duplicate check, whereas it looks like this is using two fields for uniqueness: user and part number. I do see a lot of room for improvement though.

First, you have the following code to read the file into a string, then you explode the string into an array:

$contents = '';
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}
fclose($handle);

if (empty($contents)) {
    return false;
}

// grab the items
$items = explode("\n", $contents);

You can replace all of that with a single line that reads the contents directly into an array using the function file():

$items = file($file);

Second, and this is minor, you are running an is_array() check after creating an array, which is kind of pointless.

Third, you are iterating through the data several times: once to validate/clean the data and then again to put the data in the DB. Do it all in one loop.

Finally, and most importantly, as others have noted, looping queries is probably the biggest performance hit. You are doing two queries for each record: one to check if the record exists and a second to update or insert it. Instead, always do an UPDATE query, then use mysql_affected_rows() to determine if the record already existed. If not, add the values to an array. Once you've gone through every record you have run one query for each and all the updates are done. Then you can insert all the new records in one single query. Doing this will cut the number of queries needed roughly in half. Here is some modified code I put together. This is not tested though:

<?php
function importFile($user_id, $file, $file_id)
{
    if (!file_exists($file)) {
        return false;
    }

    // grab the items
    $recordsAry = file($file);
    if (count($recordsAry) < 1) {
        return false;
    }

    //Create variable to store records to be inserted
    $insertValues = array();

    foreach ($recordsAry as $rKey => $recordStr) {
        $recordStr = trim($recordStr);
        if ($recordStr == '') {
            //Remove empty lines
            unset($recordsAry[$rKey]);
        } else {
            $recordStr = str_replace("\"", "", $recordStr);
            $recordStr = mysql_real_escape_string($recordStr);

            //Split record string into field vars
            $delimiter = (strstr($recordStr, ',') !== false) ? "," : "\t";
            $fieldsAry = explode($delimiter, $recordStr);
            list($partNumber, $altPartNumber, $condition, $qty, $description) = $fieldsAry;
            $partNumber    = $this->fixPartNumber($partNumber);
            $altPartNumber = $this->fixPartNumber($altPartNumber);

            //Run update script
            $sql = "UPDATE ".TABLE_PREFIX."inventory
                    SET inventory_alt_part_number='{$altPartNumber}',
                        inventory_condition_code='{$condition}',
                        inventory_quantity='{$qty}',
                        inventory_description='{$description}',
                        last_update = NOW()
                    WHERE user_id = {$user_id}
                      AND inventory_part_number = '{$partNumber}'";
            $result = $this->db->query($sql);

            //Check results
            if (!$result) {
                echo "Fail Update: {$partNumber}<br />\n";
            } else if (mysql_affected_rows() === 0) {
                //Record doesn't exist - add to INSERT values array
                $insertValues[$partNumber] = "('{$partNumber}', '{$altPartNumber}', '{$condition}', '{$qty}', '{$description}', NOW(), '{$user_id}')";
            } else {
                //Record was updated
                echo "Updated: {$partNumber}<br />\n";
            }
        }
    }

    //Add new records
    if (count($insertValues) > 0) {
        //Insert records that need to be added
        $sql = "INSERT INTO ".TABLE_PREFIX."inventory
                    (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
                     inventory_quantity, inventory_description, last_update, user_id)
                VALUES " . implode(', ', $insertValues);
        $result = $this->db->query($sql);
        if (!$result) {
            echo "Fail Insert: " . implode(', ', array_keys($insertValues)) . "<br />\n";
        } else {
            echo "Inserted: " . implode(', ', array_keys($insertValues)) . "<br />\n";
        }
    }

    $sql = "UPDATE ".TABLE_PREFIX."file_upload
            SET file_flag = 1
            WHERE file_id={$file_id} AND user_id={$user_id}";
    $result = $this->db->query($sql);
    return true;
}
?>
cytech Posted December 18, 2009

Very interesting... I really appreciate the input, mjdamato - I will give this a go.
cytech Posted December 21, 2009

Well, the suggestions put out have helped overall with performance, but when updating 5-10k records it's still VERY slow. This is just due to the database. The performance hit isn't with the inserts but with the update commands, so I redid it so I could do the update in one go instead of multiple calls. This seems to have improved performance slightly. However, when doing thousands of records it's still very slow. Not sure there's a way around this one. Once again, thank you all for your help.
nafetski Posted December 21, 2009

So with each record, you are querying the database to see if it's there... if it is, you are updating, and if it's not, you are inserting. I would suggest using REPLACE INTO. Basically, if the row is there, it deletes it and reinserts; if it's not, it just inserts. That will probably speed up your performance a ton. That, and fgetcsv(). Or take a look at ON DUPLICATE KEY in MySQL.
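For illustration, a minimal sketch of a REPLACE INTO version of the per-record query. Column names are taken from the original post; note that REPLACE only detects a "duplicate" via a PRIMARY or UNIQUE key, so this assumes such a key exists over the user/part-number pair, which, as discussed below, the table does not currently have.

// Sketch only: REPLACE INTO deletes any row that collides on a PRIMARY/UNIQUE
// key and then inserts the new row. A unique key over
// (user_id, inventory_part_number) is assumed here.
$sql = "REPLACE INTO ".TABLE_PREFIX."inventory
            (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
             inventory_quantity, inventory_description, last_update, user_id)
        VALUES ('{$partNumber}', '{$altPartNumber}', '{$condition}', '{$qty}', '{$description}', NOW(), '{$user_id}')";
$result = $this->db->query($sql);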
cytech Posted December 21, 2009

Hey nafetski, thanks for the ideas. ON DUPLICATE seems like the best option in this situation, however I have two unique keys and not just one. REPLACE INTO, even though it would be a single query, has its downfall: if the record already exists (which most do), it takes two queries to "update" it, so after all is said and done it would be three queries for one entry. Even though that's faster than doing three queries yourself, if I can get it down to a single query on both sides, great.

What I have now does one insert query into a temporary table (this takes 1-2 seconds, if that). I then do an update from the temporary table to the live table in one update command (this still takes a lot of time, there are just so many records). Then those that were not updated from the temporary table I insert into the live table. Then I truncate the temp table and move on to the next file. This works great, except when you have 30k updates to do, haha. Oh well.
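A minimal sketch of that staging-table flow in SQL, assuming a staging table named inventory_import with the same columns as the inventory table; the table name and join conditions are illustrative, not taken from the actual code.

// Step 1 (not shown): bulk-load the CSV rows into inventory_import with one
// multi-row INSERT or LOAD DATA INFILE.

// Step 2: update existing inventory rows from the staging table in one statement.
$sql = "UPDATE ".TABLE_PREFIX."inventory AS inv
        INNER JOIN inventory_import AS imp
                ON imp.inventory_part_number = inv.inventory_part_number
               AND imp.user_id = inv.user_id
        SET inv.inventory_alt_part_number = imp.inventory_alt_part_number,
            inv.inventory_condition_code  = imp.inventory_condition_code,
            inv.inventory_quantity        = imp.inventory_quantity,
            inv.inventory_description     = imp.inventory_description,
            inv.last_update = NOW()";
$this->db->query($sql);

// Step 3: insert the staged rows that didn't match an existing inventory row.
$sql = "INSERT INTO ".TABLE_PREFIX."inventory
            (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
             inventory_quantity, inventory_description, last_update, user_id)
        SELECT imp.inventory_part_number, imp.inventory_alt_part_number, imp.inventory_condition_code,
               imp.inventory_quantity, imp.inventory_description, NOW(), imp.user_id
        FROM inventory_import AS imp
        LEFT JOIN ".TABLE_PREFIX."inventory AS inv
               ON inv.inventory_part_number = imp.inventory_part_number
              AND inv.user_id = imp.user_id
        WHERE inv.inventory_id IS NULL";
$this->db->query($sql);

// Step 4: empty the staging table before the next file.
$this->db->query("TRUNCATE TABLE inventory_import");

Whether this is fast in practice depends heavily on having an index over (user_id, inventory_part_number) on the live table; without one, the join in the UPDATE has to scan the inventory table for every staged row.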
Psycho Posted December 21, 2009

Actually, you have no unique keys. The combination of two values must be unique, which ON DUPLICATE cannot work with. But there is a simple solution: create a new unique key using those two values. In addition to saving the user ID and the part number as individual fields, create a new column in the table where you save the value "[userID]-[partNo]". That field can then be set up as a unique column and you can use ON DUPLICATE. Then, as in the examples I provided earlier, use PHP to create one single query using ON DUPLICATE.
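A minimal sketch of the suggested approach. The column name user_part_key, the ALTER TABLE statement, and the structure of the $records array are all assumptions for illustration; a multi-column UNIQUE index over user_id and inventory_part_number would be another way to give ON DUPLICATE a key to fire on.

// One-time schema change (assumed names):
//   ALTER TABLE inventory
//     ADD COLUMN user_part_key VARCHAR(120) NOT NULL,
//     ADD UNIQUE KEY uq_user_part (user_part_key);

// Then build one multi-row INSERT ... ON DUPLICATE KEY UPDATE in PHP.
// $records is assumed to hold the already-escaped CSV fields per row.
$values = array();
foreach ($records as $r) {
    $key = "{$user_id}-{$r['part_number']}";
    $values[] = "('{$key}', '{$r['part_number']}', '{$r['alt_part_number']}', '{$r['condition']}', '{$r['qty']}', '{$r['description']}', NOW(), '{$user_id}')";
}

$sql = "INSERT INTO ".TABLE_PREFIX."inventory
            (user_part_key, inventory_part_number, inventory_alt_part_number, inventory_condition_code,
             inventory_quantity, inventory_description, last_update, user_id)
        VALUES " . implode(', ', $values) . "
        ON DUPLICATE KEY UPDATE
            inventory_alt_part_number = VALUES(inventory_alt_part_number),
            inventory_condition_code  = VALUES(inventory_condition_code),
            inventory_quantity        = VALUES(inventory_quantity),
            inventory_description     = VALUES(inventory_description),
            last_update               = NOW()";
$result = $this->db->query($sql);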
cytech Posted December 21, 2009

That's not a bad idea. I was reading, though, and people were saying the overhead of using ON DUPLICATE was a dangerous matter? Is this true? They were saying that if the entry is found, ON DUPLICATE removes it and then inserts a new one, which in turn is two queries. Or am I just overthinking this and shouldn't worry about it?
Psycho Posted December 22, 2009

The best way to find out would be to test it.