Jump to content

Trouble getting csv into db


TOA

Recommended Posts

I'm having issues getting a csv into the database, and I'm having alot of weird behavior. Please bear with me as I try to give as complete an explanation as possible.

 

The description

 

We get usage data from a partner ftp'd to us every week in csv format. (FYI - it's an Excel csv)

 

I tried to parse it into mysql database using LOAD DATA INFILE, and after figuring out the right line endings, etc. the data gets put in but half the fields default or 0 out - depending on the type of course :) (FYI - yes I have error settings on, double checked just in case)

 

So I opened the csv and noticed 4 things right off the bat:

    1. All data is in the first column

    2. When I 'save-as', it says something about some portions not working with unicode or something - I can't recreate the dialog box for some reason, so I can't remember the exact message

    3. and 4. When I 'save as', the name is in quotes - ie: "somethingorother.csv" in the filename input and the type is auto-selected to Unicode text.

 

If I follow through with the save-as, changing the name and type, and run the data through the database again it get's entered but one of the fields 0's out - it's an integer. (FYI - using the 'text-to-columns' tool has the same effect) (FYI 2 - This is the closest I've come)

 

I used f(get|put)csv to try to manually do it again, and it had the same effect as running the original.

 

I tried to brute force it by whipping together the following code and I get the same effect as running the originally formatted csv through; that is, half the fields don't get entered correctly.

// get the csv file
if (($handle = fopen($_FILES['report']['tmp_name'], 'r')) === FALSE) {
echo "Unable to open uploaded file";
} else {
$data = array();
// get each line
while	($line = fgetcsv($handle)) {
	$data[] = $line;
}
array_shift($data); // drops the columns from the csv
array_pop($data); // drops the empty last line that's always there	

$sql = "INSERT INTO TableName (`Field1`,`Field2`,`Field3`,`Field4`,`Field5`,`Field6`,`Field7`) VALUES ";

foreach ($data as $line) {
	if (isset($line[5])) {
		// format the date
		$date = explode('/', $line[5]);
		// put it back in $line
		$line[5] = $date[2].'-'.$date[0].'-'.$date[1];
	}		
	// make it a csv for VALUES
	$value = implode("','", $line);
	// there's always an extra space at the end for some reason - get rid of it - it bothers me
	$value = trim($value);
	// put quotes around each value
	$value = "'".$value."'";
	// add it to the values array
	$values[] = $value;
}
// group each line in ( ) for the values statement
foreach ($values as $value) {
	$value = trim($value);
	$v[] = "($value)";
}
// add all the formatted values to the sql statement
$sql .= implode(',', $v);

// FOR TESTING
echo "<p>$sql</p>"; 

// query the db
if (($res = $db->query($sql)) === false) {
	echo "<p>Could not import data. ({$db->error})</p>";
} else {			
	echo "<p>Data successfully imported!</p>";				
}

}

 

At this point I thought was an encoding issue and used iconv() which had everything from no effect to 'wow, what the hell is that' effects.

 

If, at any of these points, I open it in notepad I get a range of results; weird unicode, text with black nul pills between every letter, all the way up to what seems to my eye to be a csv, but must not be because it doesn't work.

 

I've googled and searched forums (including this one) and have tried anything I find as an answer that seems reasonable.

 

:facewall:

 

I just don't have any more ideas. Any insights??

 

PS: My test server is Apache 5.1.6 so PHPExcel Class is not an option for me. (Although our live server runs 5.3.2)

Link to comment
Share on other sites

As far as I can tell, there is a fundamental flaw in your code - here's it in pseudo-code, as I see it

While more CSV lines in file 
{
  Get line
  Add to Data Array
}

// Data array now contains the WHOLE CSV file!

Foreach Line in the Data Array
{
  If the 5th field on the line is set, reformat the date

Now, hang on there a sec, you earlier said:

So I opened the csv and noticed 4 things right off the bat:

    1. All data is in the first column

 

So, how can there be a 5th field on that line?

OK, I'll continue:

  Put together all the fields on the line, separated by commas and quotes
  Trim, add quotes to the start and end then add the new string to an array ($values)
  // At this point, $values is now an array of lines from the CSV, comma-seperated for SQL entry
}

Foreach Values
{
  Add brackets and put in a new array, $v
}

Implode $v, comma-separated
Throw the whole CSV file into the SQL database in one go!

Sorry, I don't like that last line at all...  I've never done it, and if a huge CSV file is encountered, something could go wrong (and possibly has).

 

Why not simplify this a bit...

 

Get rid of the values and v arrays, and from here, put the lines in one at a time:

	$sql = "INSERT INTO TableName (`Field1`,`Field2`,`Field3`,`Field4`,`Field5`,`Field6`,`Field7`) VALUES ";

foreach ($data as $line) {
	if (isset($line[5])) {
		// format the date
		$date = explode('/', $line[5]);
		// put it back in $line
		$line[5] = $date[2].'-'.$date[0].'-'.$date[1];
	}		
	// make it a csv for VALUES
	$value = implode("','", $line);
	// there's always an extra space at the end for some reason - get rid of it - it bothers me
	$value = trim($value);
	// put quotes around each value
	$value = "'".$value."'";

	// New Bit!
	// We should have 7 fields to put into the database
                if (count($line) != 7) {
	  echo "Invalid line in CSV: Values = ".$value;
	} else {
	  // Put this one line into the database
	  $query = $sql."(".$value.")";

	  if (($res = $db->query($query)) === false) {
		echo "<p>Could not import data. ({$db->error})</p>";
	  } else {			
		echo "<p>Data successfully imported!</p>";				
	  }
	}
}

Link to comment
Share on other sites

Hey Jamdog, thanks for the reply.

 

Now, hang on there a sec, you earlier said:

So I opened the csv and noticed 4 things right off the bat:

    1. All data is in the first column

 

So, how can there be a 5th field on that line?

 

Those are 2 different statements; I may not have been clear about that. I was describing the results of several things, sorry.

 

This

So I opened the csv and noticed 4 things right off the bat:

    1. All data is in the first column

is about the csv file itself

 

This

So, how can there be a 5th field on that line?

 

was about the code shown; different attempts using different processes. When you use fgetcsv to get the file, it returns an array; the 5th element was a date and needed to be formatted is all. That really doesn't have anything to do with the problem, just making a date from m/d/y to y/m/d

 

Get rid of the values and v arrays, and from here, put the lines in one at a time:

 

It was my understanding that a single query is always prefered over multiple, but lacking a successful attempt on my part, I'll give your's a go too :)

 

Thanks for the suggestions.

Link to comment
Share on other sites

could you post some lines of your csv file?

 

Sure thing...here's 3 random lines

 

0,39354,Fake Company,Aging,Housing Options,4/12/2012,1

37,42877,Another Fake Company,Homepage,Monthly Feature,4/17/2012,1

25,41850,Yet Another Fake Company,Homepage,Monthly Feature,4/11/2012,1

 

**Company names have been changed to protect the innocent

 

This is exactly as I recieved it, except for changing the company names.

Link to comment
Share on other sites

what about the  LOAD DATA INFILE that you are using?... I don't see nothing wrong with that data format,  LOAD DATA should work

 

I thought so too  :shrug:

 

Here it is

LOAD DATA LOCAL INFILE '$file' INTO TABLE MyTable FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\r' IGNORE 1 LINES ($fields) SET `Datadate` = DATE_FORMAT(STR_TO_DATE(@date, '%m/%d/%Y'), '%Y-%m-%d')

 

$file is just the $_FILES['file']['tmp_name']

$fields just stores the field names

 

Using this on the original csv gets half the data filled in, half zeroed out or default. It might be worth noting that I have text and int as the types of fields in the db; all text's get populated, none of the int's do.

 

Thanks for looking at this.

Link to comment
Share on other sites

using your data example (without headers) this worked perfect for me (adjust your fields/ variables accordingly)

 

LOAD DATA INFILE '$file' 
     INTO TABLE MyTable 
     FIELDS TERMINATED BY ',' 
     (f1,f2,f3,f4,f5,@f6,f7) 
     SET `f6` = STR_TO_DATE(@f6, '%m/%d/%Y');

Link to comment
Share on other sites

That looks like what I had. I'll use yours and build off it and see what I can get, maybe something's different that I'm not noticing.

 

Could it be something with my database or that particular table? I've used this technique several other times in the same database in other parts of the site and have always had success.

 

Thanks

Link to comment
Share on other sites

Using exactly what you posted (with or without LOCAL), I get the same results as the LOAD DATA I posted; that is half the data loads, but the other half doesn't.

Link to comment
Share on other sites

well.. that could mean that your csv file has not the right format or could be corrupt... without seeing the original file is hard to tell what could be wrong with it.

 

other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior.

Link to comment
Share on other sites

well.. that could mean that your csv file has not the right format or could be corrupt... without seeing the original file is hard to tell what could be wrong with it.

 

That's what I was thinking.

 

other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior.

 

Now why didn't I think of that? :facepalm:

 

I'm going to do that first. Then if that works, I'll focus on the above.

 

Thanks, I'll be back.

Link to comment
Share on other sites

other simple test is do what I did... copy/paste your example data and try to upload it with the provided command (yours or mine) and see if you got the same behavior.

 

Ok, I ran it with the version based on yours and my original and both entered all the data successfully. So it's not the query.

 

well.. that could mean that your csv file has not the right format or could be corrupt...

 

Not sure how to tell if it's corrupt or not but I did try a few things regarding format (mentioned in OP) but I'd be happy to try anything, even if it is something I tried before; I was pretty frazzled and may not have been using them correctly.

 

without seeing the original file is hard to tell what could be wrong with it.

 

That could be arranged, although it holds some client info so I wouldn't be able to post it publicly.

 

Thanks for your help so far.

Link to comment
Share on other sites

Still looking for help with this issue.

 

I'm positive it's not the query, and I can do all the normal stuff with the file except get it into the db so I don't think it's corrupt.

 

Does anyone have any ideas what might be going on? I tried messing with encoding a few days ago (this seems to be the last option), but didn't reach any acceptable results, so I still need help on this.

 

Thanks.

Link to comment
Share on other sites

Found a function online to detect the encoding by checking the BOM and it's returning UTF-16 for the encoding of the csv.

 

Could this be causing my issues? I'm not really too familiar with what the differences are and how I should handle it.

 

iconv_get_encoding() shows my system is set to ISO-8859-1. If I use iconv_set_encoding() to set it to either UTF-8 or 16, it seems to have no effect when using the data - that is to say it's still not correct.

 

Any suggestions? Ideas?

Link to comment
Share on other sites

I tried to convert the encoding from UTF-16 to 8, and I get functionality equal to that of the converted file; all data gets put in correctly except the first field.

 

$filecontents = iconv('UTF-16','UTF-8',file_get_contents($file));

 

Could that be because of the BOM?

 

I tried to remove it with this code but it wasn't successful.

 

$filecontents = substr($filecontents,3);

 

Any ideas on where to go from here? What could be causing the first field to be ignored in every row?

Link to comment
Share on other sites

I think I know what's happening now. Just not why or how to fix it.

 

In an earlier attempt, I had used the below code to see if the file was corrupt and to see the format of the actual data.

 

$file = $_FILES['report']['tmp_name'];	
$filecontents = iconv('UTF-16','UTF-8',file_get_contents($file));	
file_put_contents($file, $filecontents);
$handle = fopen($file, 'r');

while (!feof($handle)) {
echo fgets($handle)."<br />";
}

 

An example of what it produced was

 

"comma,separated,values"
"comma,separated,values"
"comma,separated,values"
"comma,separated,values"
"comma,separated,values"

 

Since everything but the one first field was going in correctly, I decided to try to change the type of the field that wasn't populating. Changing it to varchar did have results, but not as expected. What is now being populated is literally "0 - with the quotation mark.

 

So for some reason, it's taking the " as part of the field, but my db field is an integer, leading to zeroed out values in that field. All the fields after are entered correctly though. I've tried to use the LINES STARTING BY and TERMINATED BY options with the LOAD DATA using the quotation mark as the line delim, but didn't get the results I expected; in fact almost nothing was loaded.

 

Any ideas on what's going on or how to fix it? I'm so close....

Link to comment
Share on other sites

I think I know what's happening now. Just not why or how to fix it.

 

Guess I was a bit off. It has nothing to do with the quote; I was using the wrong file.

 

What is happening though, is that if I set the db field to varchar it enters the data some-what correctly. It adds space in front of the data, almost like a carriage return or something, because if I go in and manually edit the data, putting in the exact same data, it doesn't include that extra space. If I set it back to INT, it zeroes out.

 

So the real question(s) are 'why would it be adding that/where is it coming from'? I'm sure this is the reason that particular field is zeroing out.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.