parse CSV file with line breaks and quotes


bmmayer

hey all--

 

I am trying to parse a CSV file in which some cells contain line breaks as well as quotes.  Right now, I am using this function:

 

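function file_breakdown($content, $del) {
	// also recognise old Mac-style \r line endings (this setting is deprecated as of PHP 8.1)
	ini_set('auto_detect_line_endings', '1');
	$file = fopen($content, "r");
	if ($file === false) {
		return false;   // the file could not be opened
	}
	while (($data = fgetcsv($file, 0, $del)) !== FALSE) {
		//loop content here
	}
	fclose($file);
}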

 

The problem I'm having is that the function works fine when the imported cells are normal (no line breaks or quotes), but when it reaches a cell that contains a line break, it starts a new "row" instead of continuing the current one.  The quotes within the cells confuse the program further.  In addition, I cannot treat a double quote (") followed by a line break as the end of a row, because some quotes typed inside the cells are also followed by line breaks.

 

I need help telling the program to distinguish between the quotes that enclose columns and the quotes contained within the cells, and to stop treating a line break within a cell as a new row.

 

Can anyone help me write a function that takes a CSV file and returns an ACCURATE array, with one element per $del-delimited row, the way file() returns one element per line?

 

this has been driving me crazy.

 

thanks for your help,

 

-b
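For comparison, here is a minimal sketch of what fgetcsv() does with a cell that spans two lines when the file is well-formed, meaning that quotes inside a quoted cell are doubled (""). The sample text and variable names are made up for illustration and are not the poster's data; the point is only that the line break stays inside the cell in that case, so the trouble described above most likely comes from the undoubled quotes.

```php
<?php
// Sketch only: made-up sample text, with embedded quotes doubled ("")
// the way well-formed CSV expects.
$csv = "Column 1,Column 2\n"
     . "Data 1,\"first line\nsecond line with \"\"quotes\"\" inside\"\n"
     . "Data 2,plain\n";

// Use an in-memory stream so the example needs no file on disk.
$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);
rewind($handle);

$rows = array();
// Delimiter, enclosure and escape are passed explicitly (newer PHP
// versions complain when the escape argument is left out).
while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($handle);

// Three rows come back; the second cell of the second row still
// contains its line break and single (undoubled) quotes.
print_r($rows);
```

If the file comes from a program that can be told to double embedded quotes on export, this is probably the least painful fix.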


I can't post the exact data for privacy reasons, but this is the type of data:

 

Column 1,Column 2,Column 3,Column 4

Data 1,Data 2,Data 3,"This is a text field into which has been entered lots of data, including line breaks:

like this.  Also, there are quotes like this:

 

"This is a quote, quote, quote."

 

Notice how the quotation mark proceeded a line break?

 

This is the end of column 4."

Data 6,Data 7,Data 8,"Data Data Data"

 

...and so on


It's kinda simple really...  this assumes, though, that it is a true CSV, that the number of columns is fixed, and that you are reading it byte by byte.

 

 

First off, you are going to want a variable, $word, which stores the current 'word chunk' that you are parsing.

Next, set a flag variable, $quoted.  When you hit a quotation mark, negate this variable; it tells you whether you are inside or outside of a quote.  You will also want a counter, $commas, for the commas read outside of quotes.  When you hit such a comma, store the current word chunk in your array and increment $commas.  When $commas reaches the limit (the number of columns), reset it and increment your $rows variable.

 

That's just the idea... it's too long for me to type out here... you should be able to figure it out.

 

 

...or you could wait for the regular expressions response to show up, which would save you the time of doing this. (though literally, you could bang your head on the keyboard and something would work with regex....)
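A minimal sketch of the flag idea described above, with two changes that are assumptions of this sketch rather than part of the description: a line break outside quotes ends the row (so no comma counter is needed), and a doubled quote ("") inside a quoted cell is kept as one literal quote. The function name parseCsvText is hypothetical. It will not cope with bare, undoubled quotes inside a cell, because a single flag cannot tell an inner quote from a closing one.

```php
<?php
// Hypothetical helper: parse CSV text one character at a time.
function parseCsvText(string $text, string $del = ','): array
{
    $rows = [];
    $row = [];
    $word = '';        // the cell currently being collected
    $quoted = false;   // true while inside a quoted cell
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $char = $text[$i];

        if ($quoted) {
            if ($char === '"') {
                if ($i + 1 < $len && $text[$i + 1] === '"') {
                    $word .= '"';     // doubled quote: keep one literal quote
                    $i++;
                } else {
                    $quoted = false;  // closing quote
                }
            } else {
                $word .= $char;       // delimiters and line breaks are plain data here
            }
        } elseif ($char === '"') {
            $quoted = true;           // opening quote
        } elseif ($char === $del) {
            $row[] = $word;           // end of cell
            $word = '';
        } elseif ($char === "\n" || $char === "\r") {
            if ($char === "\r" && $i + 1 < $len && $text[$i + 1] === "\n") {
                $i++;                 // treat \r\n as a single line break
            }
            $row[] = $word;           // end of cell and end of row
            $rows[] = $row;
            $row = [];
            $word = '';
        } else {
            $word .= $char;
        }
    }

    // Flush the last row when the text does not end with a line break.
    if ($word !== '' || $row !== []) {
        $row[] = $word;
        $rows[] = $row;
    }

    return $rows;
}

$rows = parseCsvText("a,b,\"x\ny \"\"q\"\", z\"\r\nc,d,e");
print_r($rows);
```

Blank lines come back as rows holding a single empty string, which may or may not be what you want.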


Are you going to return the data for column 4 exactly as it is written?

As in your example, like this?

-->>

"This is a text field into which has been entered lots of data, including line breaks:

like this.  Also, there are quotes like this:

 

"This is a quote, quote, quote."

 

Notice how the quotation mark proceeded a line break?

 

This is the end of column 4."

 


Yes, that's correct.  As for ss32's suggestion:

 


It would be great if you could write something out for me; I have tried something like this and it didn't really work.

 

thanks a lot,

 

-b


I hope you realize no one is going to write a CSV parser for you for nothing. We're all happy to help if you have specific problems, but if you want an implementation piece to be done from scratch, be willing to pay for it.


This is rough, though it should work.

There may be a couple of bugs, and the limitation is that every value must have a comma after it; newlines don't mean a new line in the CSV.

So I guess it is not truly a CSV parser, but you can modify it for your needs.

 

<?php
// Rough character-by-character parser. Limitation: every value, including the
// last one in a row, must be followed by a comma; newlines do not end a row.
function csvToArray($csvFile, $linelen) {
	if (($contents = file_get_contents($csvFile)) === false) {
		return false;
	}

	$result = array();
	$tarray = array();   //values of the row being built
	$quoted = false;     //true while inside a quoted section
	$word = "";
	$length = strlen($contents);
	for ($i = 0; $i < $length; $i++) {
		//get the current character
		$char = $contents[$i];

		//check for the start/end of a quoted section
		if ($char == '"') {
			$quoted = !$quoted;
			continue;   //quote characters themselves are never stored
		}

		//a comma outside of quotes ends the current value
		if (!$quoted && $char == ',') {
			$tarray[] = $word;
			$word = "";

			//once the row holds $linelen values, add it to the result
			if (count($tarray) >= $linelen) {
				$result[] = $tarray;
				$tarray = array();  //reset the temporary array
			}
			continue;
		}

		//everything else, including commas inside quotes, belongs to the value
		$word .= $char;
	}

	return $result;
}
?>

 

 


Although, you could try a different approach.  (Where is my edit button?!)

When you store the data, store the line break character as a different character, and then translate it back when you read it.  That way each row sits on a single line, it is a true CSV file, and you don't need a complex script to parse it.
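A minimal sketch of that approach, assuming you control the code that writes the file. The helper names encodeBreaks and decodeBreaks are hypothetical, and the choice of the two-character sequences \n and \r as stand-ins is just one option. Existing backslashes are escaped first so the translation can be undone without ambiguity. The empty escape argument to fputcsv()/fgetcsv() needs PHP 7.4 or later.

```php
<?php
// Hypothetical helpers: swap real line breaks for the two-character
// sequences \n and \r, escaping existing backslashes first.
function encodeBreaks(string $value): string
{
    return strtr($value, array("\\" => "\\\\", "\n" => "\\n", "\r" => "\\r"));
}

function decodeBreaks(string $value): string
{
    return strtr($value, array("\\\\" => "\\", "\\n" => "\n", "\\r" => "\r"));
}

$cell = "first line\nsecond line with \"quotes\"";

// Write one record. The empty escape argument (PHP 7.4+) turns off PHP's
// non-standard backslash escaping, so embedded quotes are simply doubled.
$handle = fopen('php://memory', 'w+');
fputcsv($handle, array('Data 1', encodeBreaks($cell)), ',', '"', '');
rewind($handle);

// The whole record now sits on one physical line.
$stored = stream_get_contents($handle);
rewind($handle);

// Read it back and translate the stand-ins into real line breaks again.
$row = fgetcsv($handle, 0, ',', '"', '');
fclose($handle);
$restored = decodeBreaks($row[1]);

var_dump($restored === $cell);
```

This only helps for files you write yourself; a file exported by another program still has to be parsed as it comes.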


try

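<?php
// $csvFile holds the CSV text itself (not a file name); $linelen is the number of columns.
// Values are returned as written, including their enclosing quotes.
function csvToArray($csvFile, $linelen) {
	if (($contents = $csvFile) === false) {
		return false;
	}
	$fi_co = 0;   // running count of fields read so far
	$result = array();
	$tarray = array();
	while ($contents !== '' && $contents !== false) {
		// the last field of a row ends at a newline, every other field at a comma
		$delim = (++$fi_co % $linelen) ? ',' : "\n";
		$pos = -1;
		do {
			// move to the next delimiter, or to the end of the text if there is none
			$pos = strpos($contents, $delim, $pos + 1);
			if ($pos === false) $pos = strlen($contents);
			$word = substr($contents, 0, $pos);
			// an odd number of quotes means the delimiter was inside a quoted cell
			$x = substr_count($word, '"') % 2;
		} while ($x && $pos < strlen($contents));   // stop at the end even if quotes are unbalanced
		$tarray[] = $word;
		if ($fi_co % $linelen == 0) {
			$result[] = $tarray;
			$tarray = array();
		}
		$contents = substr($contents, $pos + 1);
	}
	if ($tarray) $result[] = $tarray;   // keep a trailing partial row
	return $result;
}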
// parse CSV file with line breaks and quotes
$a = 'Column 1,Column 2,Column 3,Column 4
Data 1,Data 2,Data 3,"This is a text field into which has been entered lots of data, including line breaks:
like this.  Also, there are quotes like this:

"This is a quote, quote, quote."

Notice how the quotation mark proceeded a line break?

This is the end of column 4."
Data 6,Data 7,"Data
"sasa"
8","Data Data Data"';
$m = csvToArray($a,4);
print_r($m);
?>
