TOA Posted May 24, 2012 Share Posted May 24, 2012 I'm having issues getting a csv into the database, and I'm having alot of weird behavior. Please bear with me as I try to give as complete an explanation as possible. The description We get usage data from a partner ftp'd to us every week in csv format. (FYI - it's an Excel csv) I tried to parse it into mysql database using LOAD DATA INFILE, and after figuring out the right line endings, etc. the data gets put in but half the fields default or 0 out - depending on the type of course (FYI - yes I have error settings on, double checked just in case) So I opened the csv and noticed 4 things right off the bat: 1. All data is in the first column 2. When I 'save-as', it says something about some portions not working with unicode or something - I can't recreate the dialog box for some reason, so I can't remember the exact message 3. and 4. When I 'save as', the name is in quotes - ie: "somethingorother.csv" in the filename input and the type is auto-selected to Unicode text. If I follow through with the save-as, changing the name and type, and run the data through the database again it get's entered but one of the fields 0's out - it's an integer. (FYI - using the 'text-to-columns' tool has the same effect) (FYI 2 - This is the closest I've come) I used f(get|put)csv to try to manually do it again, and it had the same effect as running the original. I tried to brute force it by whipping together the following code and I get the same effect as running the originally formatted csv through; that is, half the fields don't get entered correctly. // get the csv file if (($handle = fopen($_FILES['report']['tmp_name'], 'r')) === FALSE) { echo "Unable to open uploaded file"; } else { $data = array(); // get each line while ($line = fgetcsv($handle)) { $data[] = $line; } array_shift($data); // drops the columns from the csv array_pop($data); // drops the empty last line that's always there $sql = "INSERT INTO TableName (`Field1`,`Field2`,`Field3`,`Field4`,`Field5`,`Field6`,`Field7`) VALUES "; foreach ($data as $line) { if (isset($line[5])) { // format the date $date = explode('/', $line[5]); // put it back in $line $line[5] = $date[2].'-'.$date[0].'-'.$date[1]; } // make it a csv for VALUES $value = implode("','", $line); // there's always an extra space at the end for some reason - get rid of it - it bothers me $value = trim($value); // put quotes around each value $value = "'".$value."'"; // add it to the values array $values[] = $value; } // group each line in ( ) for the values statement foreach ($values as $value) { $value = trim($value); $v[] = "($value)"; } // add all the formatted values to the sql statement $sql .= implode(',', $v); // FOR TESTING echo "<p>$sql</p>"; // query the db if (($res = $db->query($sql)) === false) { echo "<p>Could not import data. ({$db->error})</p>"; } else { echo "<p>Data successfully imported!</p>"; } } At this point I thought was an encoding issue and used iconv() which had everything from no effect to 'wow, what the hell is that' effects. If, at any of these points, I open it in notepad I get a range of results; weird unicode, text with black nul pills between every letter, all the way up to what seems to my eye to be a csv, but must not be because it doesn't work. I've googled and searched forums (including this one) and have tried anything I find as an answer that seems reasonable. I just don't have any more ideas. Any insights?? PS: My test server is Apache 5.1.6 so PHPExcel Class is not an option for me. (Although our live server runs 5.3.2) Quote Link to comment Share on other sites More sharing options...
Jamdog Posted May 24, 2012 Share Posted May 24, 2012 As far as I can tell, there is a fundamental flaw in your code - here's it in pseudo-code, as I see it While more CSV lines in file { Get line Add to Data Array } // Data array now contains the WHOLE CSV file! Foreach Line in the Data Array { If the 5th field on the line is set, reformat the date Now, hang on there a sec, you earlier said: So I opened the csv and noticed 4 things right off the bat: 1. All data is in the first column So, how can there be a 5th field on that line? OK, I'll continue: Put together all the fields on the line, separated by commas and quotes Trim, add quotes to the start and end then add the new string to an array ($values) // At this point, $values is now an array of lines from the CSV, comma-seperated for SQL entry } Foreach Values { Add brackets and put in a new array, $v } Implode $v, comma-separated Throw the whole CSV file into the SQL database in one go! Sorry, I don't like that last line at all... I've never done it, and if a huge CSV file is encountered, something could go wrong (and possibly has). Why not simplify this a bit... Get rid of the values and v arrays, and from here, put the lines in one at a time: $sql = "INSERT INTO TableName (`Field1`,`Field2`,`Field3`,`Field4`,`Field5`,`Field6`,`Field7`) VALUES "; foreach ($data as $line) { if (isset($line[5])) { // format the date $date = explode('/', $line[5]); // put it back in $line $line[5] = $date[2].'-'.$date[0].'-'.$date[1]; } // make it a csv for VALUES $value = implode("','", $line); // there's always an extra space at the end for some reason - get rid of it - it bothers me $value = trim($value); // put quotes around each value $value = "'".$value."'"; // New Bit! // We should have 7 fields to put into the database if (count($line) != 7) { echo "Invalid line in CSV: Values = ".$value; } else { // Put this one line into the database $query = $sql."(".$value.")"; if (($res = $db->query($query)) === false) { echo "<p>Could not import data. ({$db->error})</p>"; } else { echo "<p>Data successfully imported!</p>"; } } } Quote Link to comment Share on other sites More sharing options...
mikosiko Posted May 24, 2012 Share Posted May 24, 2012 could you post some lines of your csv file? Quote Link to comment Share on other sites More sharing options...
TOA Posted May 24, 2012 Author Share Posted May 24, 2012 Hey Jamdog, thanks for the reply. Now, hang on there a sec, you earlier said: So I opened the csv and noticed 4 things right off the bat: 1. All data is in the first column So, how can there be a 5th field on that line? Those are 2 different statements; I may not have been clear about that. I was describing the results of several things, sorry. This So I opened the csv and noticed 4 things right off the bat: 1. All data is in the first column is about the csv file itself This So, how can there be a 5th field on that line? was about the code shown; different attempts using different processes. When you use fgetcsv to get the file, it returns an array; the 5th element was a date and needed to be formatted is all. That really doesn't have anything to do with the problem, just making a date from m/d/y to y/m/d Get rid of the values and v arrays, and from here, put the lines in one at a time: It was my understanding that a single query is always prefered over multiple, but lacking a successful attempt on my part, I'll give your's a go too Thanks for the suggestions. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 24, 2012 Author Share Posted May 24, 2012 could you post some lines of your csv file? Sure thing...here's 3 random lines 0,39354,Fake Company,Aging,Housing Options,4/12/2012,1 37,42877,Another Fake Company,Homepage,Monthly Feature,4/17/2012,1 25,41850,Yet Another Fake Company,Homepage,Monthly Feature,4/11/2012,1 **Company names have been changed to protect the innocent This is exactly as I recieved it, except for changing the company names. Quote Link to comment Share on other sites More sharing options...
mikosiko Posted May 24, 2012 Share Posted May 24, 2012 what about the LOAD DATA INFILE that you are using?... I don't see nothing wrong with that data format, LOAD DATA should work Quote Link to comment Share on other sites More sharing options...
TOA Posted May 24, 2012 Author Share Posted May 24, 2012 what about the LOAD DATA INFILE that you are using?... I don't see nothing wrong with that data format, LOAD DATA should work I thought so too Here it is LOAD DATA LOCAL INFILE '$file' INTO TABLE MyTable FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\r' IGNORE 1 LINES ($fields) SET `Datadate` = DATE_FORMAT(STR_TO_DATE(@date, '%m/%d/%Y'), '%Y-%m-%d') $file is just the $_FILES['file']['tmp_name'] $fields just stores the field names Using this on the original csv gets half the data filled in, half zeroed out or default. It might be worth noting that I have text and int as the types of fields in the db; all text's get populated, none of the int's do. Thanks for looking at this. Quote Link to comment Share on other sites More sharing options...
mikosiko Posted May 24, 2012 Share Posted May 24, 2012 using your data example (without headers) this worked perfect for me (adjust your fields/ variables accordingly) LOAD DATA INFILE '$file' INTO TABLE MyTable FIELDS TERMINATED BY ',' (f1,f2,f3,f4,f5,@f6,f7) SET `f6` = STR_TO_DATE(@f6, '%m/%d/%Y'); Quote Link to comment Share on other sites More sharing options...
TOA Posted May 24, 2012 Author Share Posted May 24, 2012 That looks like what I had. I'll use yours and build off it and see what I can get, maybe something's different that I'm not noticing. Could it be something with my database or that particular table? I've used this technique several other times in the same database in other parts of the site and have always had success. Thanks Quote Link to comment Share on other sites More sharing options...
TOA Posted May 25, 2012 Author Share Posted May 25, 2012 Using exactly what you posted (with or without LOCAL), I get the same results as the LOAD DATA I posted; that is half the data loads, but the other half doesn't. Quote Link to comment Share on other sites More sharing options...
mikosiko Posted May 25, 2012 Share Posted May 25, 2012 well.. that could mean that your csv file has not the right format or could be corrupt... without seeing the original file is hard to tell what could be wrong with it. other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 25, 2012 Author Share Posted May 25, 2012 well.. that could mean that your csv file has not the right format or could be corrupt... without seeing the original file is hard to tell what could be wrong with it. That's what I was thinking. other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior. Now why didn't I think of that? I'm going to do that first. Then if that works, I'll focus on the above. Thanks, I'll be back. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 25, 2012 Author Share Posted May 25, 2012 other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior. Ok, I ran it with the version based on yours and my original and both entered all the data successfully. So it's not the query. well.. that could mean that your csv file has not the right format or could be corrupt... Not sure how to tell if it's corrupt or not but I did try a few things regarding format (mentioned in OP) but I'd be happy to try anything, even if it is something I tried before; I was pretty frazzled and may not have been using them correctly. without seeing the original file is hard to tell what could be wrong with it. That could be arranged, although it holds some client info so I wouldn't be able to post it publicly. Thanks for your help so far. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 Still looking for help with this issue. I'm positive it's not the query, and I can do all the normal stuff with the file except get it into the db so I don't think it's corrupt. Does anyone have any ideas what might be going on? I tried messing with encoding a few days ago (this seems to be the last option), but didn't reach any acceptable results, so I still need help on this. Thanks. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 Found a function online to detect the encoding by checking the BOM and it's returning UTF-16 for the encoding of the csv. Could this be causing my issues? I'm not really too familiar with what the differences are and how I should handle it. iconv_get_encoding() shows my system is set to ISO-8859-1. If I use iconv_set_encoding() to set it to either UTF-8 or 16, it seems to have no effect when using the data - that is to say it's still not correct. Any suggestions? Ideas? Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 I tried to convert the encoding from UTF-16 to 8, and I get functionality equal to that of the converted file; all data gets put in correctly except the first field. $filecontents = iconv('UTF-16','UTF-8',file_get_contents($file)); Could that be because of the BOM? I tried to remove it with this code but it wasn't successful. $filecontents = substr($filecontents,3); Any ideas on where to go from here? What could be causing the first field to be ignored in every row? Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 I think I know what's happening now. Just not why or how to fix it. In an earlier attempt, I had used the below code to see if the file was corrupt and to see the format of the actual data. $file = $_FILES['report']['tmp_name']; $filecontents = iconv('UTF-16','UTF-8',file_get_contents($file)); file_put_contents($file, $filecontents); $handle = fopen($file, 'r'); while (!feof($handle)) { echo fgets($handle)."<br />"; } An example of what it produced was "comma,separated,values" "comma,separated,values" "comma,separated,values" "comma,separated,values" "comma,separated,values" Since everything but the one first field was going in correctly, I decided to try to change the type of the field that wasn't populating. Changing it to varchar did have results, but not as expected. What is now being populated is literally "0 - with the quotation mark. So for some reason, it's taking the " as part of the field, but my db field is an integer, leading to zeroed out values in that field. All the fields after are entered correctly though. I've tried to use the LINES STARTING BY and TERMINATED BY options with the LOAD DATA using the quotation mark as the line delim, but didn't get the results I expected; in fact almost nothing was loaded. Any ideas on what's going on or how to fix it? I'm so close.... Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 I think I know what's happening now. Just not why or how to fix it. Guess I was a bit off. It has nothing to do with the quote; I was using the wrong file. What is happening though, is that if I set the db field to varchar it enters the data some-what correctly. It adds space in front of the data, almost like a carriage return or something, because if I go in and manually edit the data, putting in the exact same data, it doesn't include that extra space. If I set it back to INT, it zeroes out. So the real question(s) are 'why would it be adding that/where is it coming from'? I'm sure this is the reason that particular field is zeroing out. Quote Link to comment Share on other sites More sharing options...
TOA Posted May 29, 2012 Author Share Posted May 29, 2012 Got it. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.