xyph Posted June 1, 2011

I was wondering if you guys could help me 'break' a function I've been working on. It converts a CSV-formatted string to a 2d array following RFC4180.

Here's the function.

    /**
     *
     * Convert a multi-line CSV string into a 2d array. Follows RFC 4180, allows
     * "cells with ""escaped delimiters""" and multi-line enclosed cells
     * It assumes the CSV file is properly formatted, and doesn't check for errors
     * in CSV format.
     * @param string $str The CSV string
     * @param string $d The delimiter between values
     * @param string $e The enclosing character
     * @param bool $crlf Set to true if your CSV file should return carriage return
     *                   and line feed (CRLF should be returned according to RFC 4180)
     * @return array
     */
    function csv_explode( $str, $d=',', $e='"', $crlf=TRUE ) {
        // Convert CRLF to LF, easier to work with in regex
        if( $crlf ) $str = str_replace("\r\n","\n",$str);
        // Get rid of trailing linebreaks that RFC4180 allows
        $str = trim($str);
        // Do the dirty work
        if ( preg_match_all(
            '/(?:
                '.$e.'((?:[^'.$e.']|'.$e.$e.')*+)'.$e.'(?:'.$d.'|\n|$)
                    # match enclose, then match either non-enclose or double-enclose
                    # zero to infinity times (possessive), then match another enclose,
                    # followed by a comma, linebreak, or string end
                |    ####### OR #######
                ([^'.$d.'\n]*+)(?:['.$d.'\n]|$)
                    # match anything that is not a comma or linebreak zero to infinity
                    # times (possessive), then match either a comma or a linebreak or
                    # string end
            )/x',
            $str, $ms, PREG_SET_ORDER
        ) === FALSE ) return FALSE;
        // Initialize vars, $r will hold our return data, $i will track which line we're on
        $r = array(); $i = 0;
        // Loop through results
        foreach( $ms as $m ) {
            // With PREG_SET_ORDER a trailing group that did not match is left out,
            // so $m[2] is only set when the cell has no quotes
            if( isset($m[2]) )
                // Put the CRLF back in if needed
                $r[$i][] = ($crlf == TRUE) ? str_replace("\n","\r\n",$m[2]) : $m[2];
            else {
                // The cell was quoted, so we want to convert any "" back to " and
                // any LF back to CRLF, if needed
                $r[$i][] = ($crlf == TRUE) ?
                    str_replace( array("\n",$e.$e), array("\r\n",$e), $m[1]) :
                    str_replace($e.$e, $e, $m[1]);
            }
            // If the raw match doesn't have a delimiter, it must be the last in the
            // row, so we increment our line count.
            if( substr($m[0],-1) != $d )
                $i++;
        }
        // An empty array will exist due to $ being a zero-length match, so remove it
        array_pop( $r );
        return $r;
    }

And to use it:

    $csv = 'this,will,"be ""separated""",by
    "commas,",,"should work with
    ""multiline,"", ",entries
    some,last,data,"test"';
    print_r( csv_explode($csv) );

or

    $csv_eurwin = "this;will;'be ''separated''';by\r\n";
    $csv_eurwin .= "'semicolons;';;'should work with\r\n";
    $csv_eurwin .= "''multiline;'';';entries\r\n";
    $csv_eurwin .= "some;'last';data;'test'";
    print_r( csv_explode($csv_eurwin, ';', '\'', TRUE) );

Thanks!

Here's the actual spec if anyone cares

1. Each record is located on a separate line, delimited by a line break (CRLF). For example:

       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF

2. The last record in the file may or may not have an ending line break. For example:

       aaa,bbb,ccc CRLF
       zzz,yyy,xxx

3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional "header" parameter of this MIME type). For example:

       field_name,field_name,field_name CRLF
       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF

4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example:

       aaa,bbb,ccc

5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

       "aaa","bbb","ccc" CRLF
       zzz,yyy,xxx

6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx

7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

       "aaa","b""bb","ccc"
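For anyone trying to break the function, rules 5 to 7 of the spec can be turned around to generate awkward test input. The following is only a minimal sketch: csv_field_encode() and csv_row_encode() are hypothetical helper names, not part of the original post or of PHP itself, and the sketch assumes single-character delimiter and enclosure arguments.

```php
<?php
// Hypothetical helper (not from the original post): encode one field
// following rules 5-7 of RFC 4180.
function csv_field_encode($field, $d = ',', $e = '"')
{
    // Rule 6: a field containing a line break, the enclosure, or the
    // delimiter should be enclosed.
    $needs_quotes = strpbrk($field, $d . $e . "\r\n") !== false;
    if (!$needs_quotes) {
        // Rule 5: enclosing is optional for plain fields, so leave it bare.
        return $field;
    }
    // Rule 7: escape an embedded enclosure by doubling it.
    return $e . str_replace($e, $e . $e, $field) . $e;
}

// Hypothetical helper: join encoded fields into one record (rule 4),
// with no delimiter after the last field.
function csv_row_encode($fields, $d = ',', $e = '"')
{
    $encoded = array();
    foreach ($fields as $field) {
        $encoded[] = csv_field_encode($field, $d, $e);
    }
    return implode($d, $encoded);
}

// Build one record that exercises quotes, a CRLF inside a cell, and a comma.
echo csv_row_encode(array('aaa', 'b"bb', "c\r\ncc", 'd,dd')), "\r\n";
```

Feeding strings built this way into a parser and comparing the result with the original array is one way to hunt for cases that break it.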
kenrbnsn Posted June 1, 2011

How is this function different than the function fgetcsv?

Ken
xyph (Author) Posted June 1, 2011

User comment from the manual:

    do not spam aleske at live dot ru 08-Jul-2010 09:38

    The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files like this example from Wikipedia:

    1997,Ford,E350,"ac, abs, moon",3000.00
    1999,Chevy,"Venture ""Extended Edition""","",4900.00
    1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00

His code sample wasn't quite as elegant as mine, and most other examples use preg_match in a loop. Just want to make sure everything is solid in mine.

The example I posted showed me an error! Empty quoted fields are throwing a notice. Fixed.
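One way a notice like this can come about (an assumption, since the post does not say which line raised it): with PREG_SET_ORDER, a trailing capture group that took no part in the match is simply absent from the match array, so reading it for an empty quoted cell touches an undefined offset. The sketch below uses a simplified two-group pattern as a stand-in, and cell_from_match() is a hypothetical helper name. It tells quoted from unquoted cells by whether group 2 exists, which also copes with a quoted "0" that an empty() test would misjudge.

```php
<?php
// Simplified stand-in pattern (an assumption, fixed to comma and double quote):
// group 1 = body of a quoted cell, group 2 = an unquoted cell.
$pattern = '/"((?:[^"]|"")*+)"(?:,|\n|$)|([^,\n]*+)(?:[,\n]|$)/';

// Hypothetical helper: return the cell text for one PREG_SET_ORDER match.
function cell_from_match($m)
{
    // With PREG_SET_ORDER (and without PREG_UNMATCHED_AS_NULL), a trailing
    // group that did not participate is left out of $m, so key 2 exists
    // only when the unquoted alternative matched.
    if (array_key_exists(2, $m)) {
        return $m[2];
    }
    // Quoted cell: undo the doubled quotes. Handles "" and "0" as well.
    return str_replace('""', '"', $m[1]);
}

preg_match_all($pattern, 'a,"","0",b', $ms, PREG_SET_ORDER);
array_pop($ms); // drop the zero-length match at the end of the string

$cells = array_map('cell_from_match', $ms);
var_dump($cells);
```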
kenrbnsn Posted June 1, 2011

Thanks for that explanation. I never knew of that RFC and I've been using/managing/programming computers since the days of the punch cards!

Ken