jacobcunningham Posted June 3, 2011

I am parsing a .csv file to translate it into a cleaner-looking .csv file. As a first step I take the input .csv file and put it into a database. I just tested an input file that was about 50,000 lines, and that took about 14 minutes to complete. This doesn't sound good, especially considering the input files users will be submitting are about 1,600,000 lines. Here's my code. Does anyone see how my algorithm could be improved to increase the speed? Or should I be doing this in another language? If so, what language do you propose?

    if ($flag == true) {
        $con = mysql_connect('localhost', 'root', '') or die('cannot connect');
        $db = mysql_select_db('translate') or die('cannot select DB');
        $table = 'nursinghomes';

        if ($file = fopen($fileloc, "r+")) {
            while (!feof($file)) {
                $line = fgets($file);
                $linearray = explode(',', $line);
                $id = $linearray[0];
                $formcolumn = $linearray[1] . "_" . $linearray[2] . "_" . $linearray[3];
                $content = $linearray[4];

                if (checkColumn($formcolumn, $table) == 0) {
                    $query = "ALTER TABLE $table ADD COLUMN $formcolumn CHAR(100);";
                    $result = mysql_query($query);
                }

                if (checkID($id, $table) == 1) {
                    $query = "UPDATE $table SET $formcolumn='$content' WHERE id=$id";
                    $result = mysql_query($query);
                } else {
                    $query = "INSERT INTO $table (id, $formcolumn) VALUES('$id', '$content')";
                    mysql_query($query);
                }
            }
            fclose($file);
            echo "Your file has been translated. Click <a href=\"?p=another.php\">here</a> to export the file.";
        }
    }
Psycho Posted June 3, 2011

Never EVER run queries in loops. Instead, in your processing code determine all the DB queries you will need to make and then combine them (as applicable). For example, you can run one query to add multiple columns, and you can run one query to add multiple records (both of which will be much more efficient). You can also run one query to update multiple records with different values for each, but it is not straightforward and I'm not sure how efficient it will be. I'll review the code and give you something to work with.
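The combining idea can be sketched like this. The function name and sample values below are illustrative, not part of the original code; the point is that many rows become one statement:

```php
<?php
// Sketch: combine many single-row INSERTs into one multi-row INSERT.
// buildBatchInsert() is a made-up helper name for illustration.
function buildBatchInsert($table, array $columns, array $rows) {
    $values = array();
    foreach ($rows as $row) {
        // One parenthesized tuple per record
        $values[] = "('" . implode("', '", array_map('addslashes', $row)) . "')";
    }
    return "INSERT INTO $table (`" . implode('`, `', $columns) . "`) VALUES "
         . implode(', ', $values);
}

echo buildBatchInsert('nursinghomes', array('id', 'Z012345_00100_0200'),
                      array(array('111111', 'Arizona'), array('111112', 'Texas')));
?>
```

Two records, one round trip to the server instead of two.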
gizmola Posted June 3, 2011

Why are you adding columns to a table? There has to be a structure that doesn't involve that. I would be willing to bet that most of the time taken here is in the mysql aspects of the code. You can greatly improve that by batching up all your inserts. You also do not need to separate the insert from the update if you use the INSERT ... ON DUPLICATE KEY UPDATE syntax. You might start with batches of 500 rows at a time. If you're not familiar with it, you simply repeat the part after VALUES:

    INSERT INTO table (id, ...) VALUES
        ( ... ),
        ( ... ),
        ( ... )
    ON DUPLICATE KEY UPDATE ...
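For instance, chunking the rows into batches of 500 and emitting one statement per batch might look like this. This is a sketch only: buildBatches() and the column name col are made up for illustration, and a real version would execute each statement with mysql_query() instead of returning the strings:

```php
<?php
// Sketch: build one INSERT ... ON DUPLICATE KEY UPDATE statement
// per batch of 500 rows, instead of one query per row.
function buildBatches(array $rows, $batchSize = 500) {
    $statements = array();
    foreach (array_chunk($rows, $batchSize) as $chunk) {
        $values = array();
        foreach ($chunk as $row) {   // each $row is array(id, content)
            $values[] = "('" . addslashes($row[0]) . "', '" . addslashes($row[1]) . "')";
        }
        $statements[] = "INSERT INTO nursinghomes (id, col) VALUES "
                      . implode(', ', $values)
                      . " ON DUPLICATE KEY UPDATE col = VALUES(col)";
    }
    return $statements;
}

// 1,100 input rows become just 3 statements instead of 1,100 queries.
$stmts = buildBatches(array_fill(0, 1100, array('111111', 'Arizona')));
echo count($stmts);   // 3
?>
```

Note that ON DUPLICATE KEY UPDATE only fires if `id` is a PRIMARY or UNIQUE key on the table.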
mikosiko Posted June 3, 2011

Maybe you can use LOAD DATA INFILE to load the data into your table instead of reading the file with PHP (assuming that, as gizmola said, you have a defined table in your db to hold the information).
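A sketch of what that could look like, where the file path, table name and column names are all placeholders for whatever your schema actually uses:

```sql
-- Sketch only: bulk-load the raw CSV into a staging table in one statement.
LOAD DATA LOCAL INFILE '/path/to/input.csv'
INTO TABLE staging
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, form, section, item, content);
```

The server parses the file itself, which is typically far faster than issuing row-by-row queries from PHP.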
jacobcunningham Posted June 3, 2011 Author

An example line of input would be:

    111111,Z012345,00100,0200,Arizona

111111 would be the client ID. Z012345, 00100 and 0200 would be combined to form the column name, so it would produce Z012345_00100_0200. Arizona would be the content of the field. The input file has multiple lines like this; there could be 10 lines with the same ID number but different columns and content. There is no way of knowing the structure of the database beforehand, so I have to build the structure as I go as well as filling in the content.
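In code terms, that mapping amounts to (a minimal sketch using the example line above):

```php
<?php
// One input line maps to a client ID, a composed column name, and a value.
$line = "111111,Z012345,00100,0200,Arizona";
list($id, $form, $section, $item, $content) = explode(',', $line);
$column = $form . '_' . $section . '_' . $item;   // "Z012345_00100_0200"
echo "$id / $column / $content";
?>
```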
gizmola Posted June 3, 2011

Quote (jacobcunningham): "There is no way of knowing the structure of the database beforehand, so I have to build the structure as I go as well as filling in the content."

Concatenating 3 values together and turning that into a column is a bizarre design that is going to perform poorly. Other than loading this data into mysql, we don't really have an idea of what you're trying to accomplish, but I can say with some certainty that you are on the wrong track here. If we understood the goal, we might be able to advise you on a good way to approach your problem.
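One structure that avoids altering the table as the file is read (a sketch only; the table and column names here are assumptions, not something from the thread) is to store each CSV line as its own row:

```sql
-- Each input line becomes one row; no schema changes are needed at load time.
CREATE TABLE nursinghome_values (
    client_id  INT          NOT NULL,
    field_name VARCHAR(100) NOT NULL,  -- e.g. 'Z012345_00100_0200'
    content    CHAR(100),
    PRIMARY KEY (client_id, field_name)
);
```

With this shape the unknown column names become data rather than schema, and the load is a plain batched insert.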
Psycho Posted June 3, 2011

OK, following your original process I rewrote this to do the exact same thing, but more efficiently. Although it is a lot more code, it should run faster due to reducing the queries being done. Right now your script runs an individual query for each and every field of each record. The script below does the following:

1. Gets a list of all the current fields in the table, then while processing the data generates a list of new fields to be added to the table. Then ONE query is run to add the fields.
2. Creates an array of all complete records to be added/updated.
3. Splits the array into two arrays: records to be added and records to be updated.
4. Processes the records-to-be-added array into ONE single query to add all the records.
5. Processes the records to be updated into one complete query per record (not per field).

With that there are a total of four main queries plus one query for each record update, which is probably less than 10% of the queries you have now. Also, #5 can be further reduced to a single query that updates all the records at once; it takes a MySQL construct similar to a switch statement in PHP, but I forget the structure and I'm too lazy to look it up. That would mean only five total queries for the entire file. I wrote all this code on the fly, so I'm positive there are some syntax errors and I probably have a logic problem or two. I would suggest commenting out the actual mysql_query() calls, adding some debugging code to display the queries and other pertinent data, then running it against a small file to validate the results.
    <?php
    if ($flag == true) {
        $con = mysql_connect('localhost', 'root', '') or die('cannot connect');
        $db = mysql_select_db('translate') or die('cannot select DB');
        $table = 'nursinghomes';

        //Get list of existing columns from table
        $query = "SHOW COLUMNS FROM {$table}";
        $result = mysql_query($query) or die(mysql_error());
        $existingColumns = array();
        while ($row = mysql_fetch_assoc($result)) {
            $existingColumns[] = $row['Field'];
        }

        //Variables to hold processed data
        $newColumnsSQL = array();
        $data = array();

        //Process the file into an array
        if ($file = fopen($fileloc, "r+")) {
            //Parentheses around the assignment matter: without them,
            //$linearray would be set to the boolean result of the !== test
            while (($linearray = fgetcsv($file, 1000, ",")) !== FALSE) {
                $id = $linearray[0];
                $formcolumn = $linearray[1] . "_" . $linearray[2] . "_" . $linearray[3];
                $content = $linearray[4];

                //Determine if field exists in table (or has already been added)
                if (!in_array($formcolumn, $existingColumns)) {
                    $existingColumns[] = $formcolumn;
                    $newColumnsSQL[] = "ADD COLUMN {$formcolumn} CHAR(100)";
                }

                //Add record to data variable
                $data[(int)$id][$formcolumn] = mysql_real_escape_string($content);
            }
            fclose($file);
        }

        //Finished processing file, now run queries

        //Run ONE query to add all new columns
        if (count($newColumnsSQL) > 0) {
            $query = "ALTER TABLE $table " . implode(', ', $newColumnsSQL);
            $result = mysql_query($query) or die(mysql_error());
        }

        //Run query to find all existing records from the processed data
        $query = "SELECT id FROM {$table} WHERE id IN (" . implode(',', array_keys($data)) . ")";
        $result = mysql_query($query) or die(mysql_error());

        //Dump results into array
        $existingIDs = array();
        while ($row = mysql_fetch_assoc($result)) {
            $existingIDs[$row['id']] = false;
        }

        //Split data array into two arrays (one for new records and one for existing records)
        $newData = array_diff_key($data, $existingIDs);
        $existingData = array_diff_key($data, $newData);

        //Create array of all fields (except ID field) with empty value
        $columnList = array_fill_keys($existingColumns, '');
        unset($columnList['id']);

        //Process insert data: one values tuple per record
        $valuesSQL = array();
        foreach ($newData as $id => $recordData) {
            //Ensure data is mapped for all fields in correct order,
            //with the record's id prepended
            $insertData = array_merge(array('id' => $id), $columnList, $recordData);
            $valuesSQL[] = "('" . implode("', '", $insertData) . "')";
        }

        //Create ONE insert query for all new records
        if (count($valuesSQL) > 0) {
            $query = "INSERT INTO $table (`id`, `" . implode('`, `', array_keys($columnList)) . "`) "
                   . "VALUES " . implode(', ', $valuesSQL);
            $result = mysql_query($query) or die(mysql_error());
        }

        //Process update data: one query per record (not per field)
        foreach ($existingData as $id => $recordData) {
            $setValuesSQL = array();
            foreach ($recordData as $fieldName => $fieldValue) {
                $setValuesSQL[] = "`{$fieldName}` = '{$fieldValue}'";
            }
            $query = "UPDATE {$table} SET " . implode(', ', $setValuesSQL) . " WHERE id = {$id}";
            $result = mysql_query($query) or die(mysql_error());
        }

        echo "Your file has been translated. Click <a href=\"?p=another.php\">here</a> to export the file.";
    }
    ?>
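The single-query update alluded to in step 5 can be written with CASE expressions. Roughly, and using the thread's example values only for illustration:

```sql
-- Sketch: update different rows with different values in one statement.
UPDATE nursinghomes
SET Z012345_00100_0200 = CASE id
        WHEN 111111 THEN 'Arizona'
        WHEN 111112 THEN 'Texas'
        ELSE Z012345_00100_0200   -- leave other rows untouched
    END
WHERE id IN (111111, 111112);
```

Building one CASE branch per record in PHP would collapse the per-record update loop into a single query.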