Jump to content

csv import script.. improvements?


paddy_fields
Go to solution Solved by paddy_fields,

Recommended Posts

I've written the script below to import a csv file into the database. I've tried to make this as secure as possible by sanitizing and validating the data, but is there anything else I could do to improve it? I'm happy with the way it works so far.

 

All I can think to add is a meme check via php but I can't find a way to check whether it's specifically a csv file (other than just checking the extension but that's not secure)

<?php

if ($_FILES[csv][size] > 0) {
 
    //get the csv file
    $file = $_FILES[csv][tmp_name];
    $handle = fopen($file,"r");
 
    //set the row counter
    $csv_row = 0;
 
    //loop through the csv file and insert into database
    do {
 
        // start the import
        if ($data[0]) {
 
          //add 1 the row counter
          $csv_row++;
         
          //clear errors
          $errors = '';
         
          //sanitize the inputs
          $firstName = addslashes($data[0]);
          $lastName = addslashes($data[1]);
          $email = addslashes($data[2]);

         
 
          //validate the inputs - set $errors as 'YES' if any fail
          if(!ctype_alpha($firstName)){
          $errors = 'YES';    
          }
          if(!ctype_alpha($lastName)){
          $errors = 'YES';
          }
          if(!filter_var($email, FILTER_VALIDATE_EMAIL)){
          $errors = 'YES';
          }
 
          //check whether there were any errors in the validation
          if($errors!=='YES'){
              //insert into database
              $stmt = $db->prepare("INSERT INTO test (firstName,lastName,email) VALUES (?,?,?)");
              $stmt->bind_param('sss',$firstName,$lastName,$email);
              $stmt->execute();
          }
          //there was an error, store the affected row number in the error array
          else
          {
          $error_report[] = "There was an error on row $csv_row";
          } //end error check
 
 
 
       } //end single row import
 
    } while ($data = fgetcsv($handle));

    if($csv_row>0){
    	$success = 'YES';
    }
      
 } // end csv import
?>
Edited by paddyfields
Link to comment
Share on other sites

What does your loop do the FIRST time thru since $data has not been defined yet? Even if it doesn't completely fail, why do you want to run thru all that code before you have read a row from your file yet?

 

Wouldn't this form of the do/while concept work better:

while ($data = fgetcsv($handle))
{
    (do your stuff)
}
PS - you should use ' ' on your indices in associative arrays.
Link to comment
Share on other sites

“Sanitzing” the input with addslashes() makes no sense and will only damage your data, because you prepend a literal backslash to every quote, backslash and a few other characters.

 

A prepared statement already prevents the input parameters from causing any harm, so there's no need for any extra stuff. Also note that addslashes() is not suitable for escaping, anyway, because it doesn't take the character encoding into account. It simply assumes that the input it an ASCII string, which obviously isn't always true. In some cases, you may end up with no protection at all.

 

So get rid of the addslashes() calls – and forget this function. ;)

 

Then you should create the prepared statement outside of the loop. That's really one of the main features of prepared statements: You create them once and then execute them as often as you want. 

 

Another problematic practice is that you use the magical string “YES” as an error indicator. This can easily lead to total chaos if the code becomes more complex and you accidentally use different variations (“YES”, “yes”, “Yes”, “TRUE” etc.). A simple yes/no is what booleans are for. If you have more than two states, use constants.

Edited by Jacques1
Link to comment
Share on other sites

A few other notes:

if ($_FILES[csv][size] > 0) {

Textual array indexes should be enclosed in quotes. From the manual

 

 

$foo[bar] = 'enemy';
echo $foo[bar];

This is wrong, but it works. The reason is that this code has an undefined constant (bar) rather than a string ('bar' - notice the quotes). PHP may in the future define constants which, unfortunately for such code, have the same name. It works because PHP automatically converts a bare string (an unquoted string which does not correspond to any known symbol) into a string which contains the bare string. For instance, if there is no defined constant named bar, then PHP will substitute in the string 'bar' and use that.

 

Also, the code uses a do {} while() loop. In the code block the code is looking to perform work against the variable $data. But, that variable isn't defined in the first execution of the loop. So, the first iteration of the loop is a waste. It should instead use just a while() {} loop since you want to define $data before the first execution. (EDIT: This is the reason you had to add a if ($data[0]) in that loop. With a while() loop, you don't need to do that check since you know that $data would always be defined on each iteration of the loop)

 

Lastly, to add to Jacques1's response regarding the use of 'YES'. When you do use Booleans, it makes the code simpler. If you set $errors = FALSE; as the default and then set it to $errors = TRUE; when you detect an error, the condition to check if there were errors can be written simply as if($errors) {}. No need to write if($errors==TRUE) {}. A condition 'passes' if the results of the condition result to TRUE. So, if a value will only be TRUE/FALSE you don't need to create a comparison for the condition.

Edited by Psycho
Link to comment
Share on other sites

Thanks guys, much appreciated. I think I've incorporated all of the comments?

if ($_FILES['csv'][size] > 0) {
 
    //get the csv file
    $file = $_FILES['csv'][tmp_name];
    $handle = fopen($file,"r");
 
    //set the row counter
    $csv_row = 0;

    //prepared statement
    $stmt = $db->prepare("INSERT INTO test (firstName,lastName,email) VALUES (?,?,?)");
    $stmt->bind_param('sss',$firstName,$lastName,$email);
 
   
       //loop through the csv file and insert into database
       while ($data = fgetcsv($handle)){
 
          //add 1 the row counter
          $csv_row++;

          //assign the variables
          $firstName = $data[0];
          $lastName = $data[1];
          $email = $data[2];
         
          //clear errors
          $errors = ''; 
 
          //validate the inputs - set $errors as TRUE if any fail
          if(!ctype_alpha($firstName)){
          $errors = TRUE;    
          }
          if(!ctype_alpha($lastName)){
          $errors = TRUE;
          }
          if(!filter_var($email, FILTER_VALIDATE_EMAIL)){
          $errors = TRUE;
          }
 
          //check whether there were any errors in the validation
          if(!$errors){
              //insert into database
              $stmt->execute();
          }
          //there was an error, store the affected row number in the error array
          else
          {
          $error_report['error'] = "There was an error on row $csv_row";
          } //end error check
 
 
 
       } //end single row import
    
    //close the statement
    $stmt->close();

    if($csv_row>0){
    	$success = TRUE;
    }
 
 
 } // end csv import
Edited by paddyfields
Link to comment
Share on other sites

A few things:

  • You still need to quote the second string literal in $_FILES['csv'][size] and $_FILES['csv'][tmp_name]. The unquoted word size or tmp_name is interpreted as a constant. That's obviously not what you mean, and of course those constants don't exist anywhere in your script. PHP will emit a warning and then assume you meant a string. This does lead to the correct result eventually, but it's really not the proper way of doing it. String literals always need quotes (the only exception is an index within a double-quoted string if you don't use the curly brace syntax; but that's an edge case).
  • You should initialize $errors with false, not an empty string.
  • You keep overwriting $error_report['error'] with different messages. If you want an array of multiple messages, you need to append them like before: $error_report[] = ...
  • The $success variable at the end is a bit “surprising” and won't be defined at all if the condition isn't met. Better initialize it before the if statement or even before the loop. If you initialize it before the loop, you can actually set it to true right in the loop and don't need any extra checks.

Apart from that, the code looks good. :)

Link to comment
Share on other sites

   //get the csv file
    $file = $_FILES['csv'][tmp_name];
    $handle = fopen($file,"r");

I'm not a fan of defining a variable to only use it once. Why not just . . . 

$handle = fopen($_FILES['csv']['tmp_name'], "r");

I would trim the values

         //assign the variables
          $firstName = trim($data[0]);
          $lastName = trim($data[1]);
          $email = trim($data[2]);
Edited by Psycho
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.