RobertAmsterdam Posted June 26, 2013 Share Posted June 26, 2013 Hello, perhaps this is is n00b question, well it is.... I need to read CSV file and reformat certain cells. For example: 'Name' column to have 'Last Name, First Name Middle Initial.' I want to convert Jeorge W Bush into-> Bush, Jeorge W (with comma) How can I do this? Should I do some with array? Small example would be appreciated. Google wasn't very helpfull. Quote Link to comment Share on other sites More sharing options...
ginerjm Posted June 26, 2013 Share Posted June 26, 2013 Since a CSV file depends on the commas to retain its structure, changing the comma-delimited field holding the name into one that now holds a comma as well, you'll be changing the file layout. Will that affect other users, or just you? So - if you are proceeding - if you know exactly which field in every row holds the current name value it's pretty easy. (Here's where you should have provided your attempt at coding): (for the admins - my browser doesn't have any editing tags enable in this window) <? if (!$hdl_in = fopen((your starting filename),"r")) { echo "Could not open input file"; exit(); } if (!$hdl_out = fopen((your output filename),"c")) { echo "Could not open output file"; exit(); } $in_cnt=0; $out_cnt=0; while ($row = fgets($hdl_in)) { $in_cnt++; $fields = explode(",",$row); $name = $fields[0]; // I AM ASSUMING THAT THE NAME IS IN THE 1ST FIELD OF EACH LINE $nm_parts = explode(" ",$name); // watch out for multiple spaces in names or names with more than // 3 parts to them if (count($nm_parts)>3) { $fields[0] .= "xxx"; // look for xxx in your completed file and fix manually } else { $fields[0] = $nm_parts[2].", ".$nm_parts[0]." ".$nm_parts[1]; } $newrow = implode(",",$fields); fputs($hdl_out,$newrow); $out_cnt++; } fclose($hdl_in); fclose($hdl_out); echo "File conversion complete<br>"; echo "Read $in_cnt records; wrote out $out_cnt records<br>"; exit(); Quote Link to comment Share on other sites More sharing options...
requinix Posted June 26, 2013 Share Posted June 26, 2013 (edited) Don't write your own CSV reading and writing code. fgetcsv fputcsv Edited June 26, 2013 by requinix Quote Link to comment Share on other sites More sharing options...
ginerjm Posted June 26, 2013 Share Posted June 26, 2013 I've had little success with those two functions. Don't seem to work as advertised. Quote Link to comment Share on other sites More sharing options...
requinix Posted June 26, 2013 Share Posted June 26, 2013 Never had problems myself. What doesn't work? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 26, 2013 Share Posted June 26, 2013 @Robert, Can you explain exactly what you are trying to achieve? What is the purpose of needing to change the name field, where is the data coming from, where is the file used after it is converted, etc. The reason I ask is that there may be better solutions than what you are asking. Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 26, 2013 Author Share Posted June 26, 2013 Since a CSV file depends on the commas to retain its structure, changing the comma-delimited field holding the name into one that now holds a comma as well, you'll be changing the file layout. Will that affect other users, or just you? So - if you are proceeding - if you know exactly which field in every row holds the current name value it's pretty easy. (Here's where you should have provided your attempt at coding): (for the admins - my browser doesn't have any editing tags enable in this window) <? if (!$hdl_in = fopen((your starting filename),"r")) { echo "Could not open input file"; exit(); } if (!$hdl_out = fopen((your output filename),"c")) { echo "Could not open output file"; exit(); } $in_cnt=0; $out_cnt=0; while ($row = fgets($hdl_in)) { $in_cnt++; $fields = explode(",",$row); $name = $fields[0]; // I AM ASSUMING THAT THE NAME IS IN THE 1ST FIELD OF EACH LINE $nm_parts = explode(" ",$name); // watch out for multiple spaces in names or names with more than // 3 parts to them if (count($nm_parts)>3) { $fields[0] .= "xxx"; // look for xxx in your completed file and fix manually } else { $fields[0] = $nm_parts[2].", ".$nm_parts[0]." ".$nm_parts[1]; } $newrow = implode(",",$fields); fputs($hdl_out,$newrow); $out_cnt++; } fclose($hdl_in); fclose($hdl_out); echo "File conversion complete<br>"; echo "Read $in_cnt records; wrote out $out_cnt records<br>"; exit(); Thank you Sir! I will attempt to work out this logic with me limited php skills and will get back with something. Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 26, 2013 Author Share Posted June 26, 2013 @Robert, Can you explain exactly what you are trying to achieve? What is the purpose of needing to change the name field, where is the data coming from, where is the file used after it is converted, etc. The reason I ask is that there may be better solutions than what you are asking. This is what is requested of me: 1) read in the data from the sample csv data file 2) reformat the 'Name' column to have 'Last Name, First Name Middle Initial.' example : Ray G Kummerow -> Kummerow, Ray G. 3) reformat birthdays to be YYYYMMDD 4) output reformatted data to a file called : 'cleandata.<time stamp YYYYMMDD_HHMMSS>.csv' 5) have output sorted alphabetically by state, then by name 6) any duplicates account numbers should be rejected and noted This is a bit overwhelming, so I want to take it step by step. Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 26, 2013 Share Posted June 26, 2013 (edited) This will do what you need. The only thing you would need to do is define the correct column index for $nameColumn $inputfile = 'input.csv'; $outputFile = 'output.csv'; $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, 'w'); if(!$inputHandle || !$outputHandle) { echo "Error: Unable to open file '{$csvfile}' or the ourput file '{$outputFile}'"; } else { $nameColumn = 1; while (($data = fgetcsv($inputHandle, 1000, ",")) !== FALSE) { //Retrieve the name $name = $data[$nameColumn]; //Explode the name $nameArray = explode(' ', $name); //Extract the last name $lastName = array_pop($nameArray); //Format lanstname, rest of name and put back in $data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray); //Add record to new csv fputcsv($outputHandle, $data); } fclose($inputHandle); fclose($outputHandle); } EDIT: Based on your requirements you should probably read the data in, process it into an array, THEN after all processing is done, write to the new file. You'll need to do this to sort the results in the output file. Edited June 26, 2013 by Psycho Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 27, 2013 Author Share Posted June 27, 2013 Thank you so much! This indeed reformats the name string:) However, I don't completely understand this part of logic: $data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) .implode(' ', $nameArray); Namely this : (count($nameArray) ? ', ' : '' ) Here you say: if you can count elements in array (true), then concatenate with ', ' else concat with empty space - ' '. (if I read it correctly). Why those elements need to be counted here? Is it only to make sure that there is a ' , ' after $lastName? Why not just concat it with coma+ space? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 27, 2013 Share Posted June 27, 2013 (edited) Thank you so much! This indeed reformats the name string:) However, I don't completely understand this part of logic: $data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) .implode(' ', $nameArray); Namely this : (count($nameArray) ? ', ' : '' ) Here you say: if you can count elements in array (true), then concatenate with ', ' else concat with empty space - ' '. (if I read it correctly). Why those elements need to be counted here? Is it only to make sure that there is a ' , ' after $lastName? Why not just concat it with coma+ space? I added that based on previous experience. Think about what would happen if you were to have a record that only had a single "word" for the name? For example: "Bono". The result would be "Bono, " with an extraneous comma and space after the name. In fact, if this was my project I would add a lot more such data handling (e.g. removing repetitive spaces). In plain English the logic works like this: 1. Explode the name into multiple words based upon spaces. 2. Remove the last word from the array 3. Create a new string starting with the last word that was extracted 4. IF there are additional words left in the array append a comma and space 5. Concatenate the remaining words with a space between them and append Edited June 27, 2013 by Psycho Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 27, 2013 Author Share Posted June 27, 2013 Ok, I got it! Thanks! I also used your approach to reformat the Birthdate field like this: $inputfile = 'input.csv'; $outputFile = 'output.csv'; $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, "w"); $BDColumn = 8; while (($data = fgetcsv($inputHandle, 1000, ",")) !== FALSE) { //Retrieve the birthdate $BD = $data[$BDColumn]; $data[$BDColumn] = date('Ymd', strtotime($BD)); //Add record to new csv fputcsv($outputHandle, $data); } fclose($inputHandle); fclose($outputHandle); It works as desired ( takes a string like 10/09/1973 and makes it YYYYMMDD), but the field name disappears in the first raw, and instead of 'Birthdate' there is a value there, so there is no column name anymore, when in your code, which is almost identical, this doesn't happen... Any idea why this happened? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 27, 2013 Share Posted June 27, 2013 It works as desired ( takes a string like 10/09/1973 and makes it YYYYMMDD), but the field name disappears in the first raw, and instead of 'Birthdate' there is a value there, so there is no column name anymore, when in your code, which is almost identical, this doesn't happen... Any idea why this happened? You never stated there was a header row! The reason why it works for the name column header is probably due to the line of code you asked me about previously to handle situations where there is only one "word" in a name to not add a comma/space. So, I assume the column header is a single word. If I had not put in that line, the name header would have been manipulated. The reason you are seeing the problem with the date header is because of this line $data[$BDColumn] = date('Ymd', strtotime($BD)); It is trying to convert the value to a timestamp and then into a formatted date string. The first part is failing since a date that cannot be interpreted as a date can be converted to a timestamp and is returning false. Then the date() function is using the value of FALST to try and create a formatted string and defaulting to something close to the EPOX date. I see two choices. 1) Skip the first row (or however many rows there are up to the header row). Just create a variable before the loop called $header and set to false. Then in the loop add something like this if(!$header) { $header = true; continue; } 2. Modify the code to not convert the string to a date. Start by first attempting to convert to a timestamp. If the value of that result is false do not change the value. $timestamp = stringtotime($BD); if($timestamp != false) { data[$BDColumn] = date('Ymd', $timestamp); } Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 27, 2013 Author Share Posted June 27, 2013 Yes, this code makes sure timestamp is put only into the rows bellow header. $timestamp = stringtotime($BD); if($timestamp) { data[$BDColumn] = date('Ymd', $timestamp); } Thanks a lot for this lesson. I am amazed by your knowledge. I could only dream about this. Btw, could you give me a tip on best way to learn php, Psycho? I seem to understand the the rules but obviously don't know how to create the logic itself:( There is another part to this thing: I need too have names alphabetically sorted in output file. Could you recommend a method for this? Now it is clear that it should begin with loop as before: $inputfile = 'output.csv'; $outputFile = 'input.csv'; $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, 'w'); while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { $nameColumn = 1; $name = $data[$nameColumn]; $nameArray = explode(' ', $name); $data = sort($nameArray, SORT_ASC); // this is not producing output sorted by name fputcsv($outputHandle, $data); } fclose($inputHandle); fclose($outputHandle); What is the way to approach it? I looked at various sort() functions in php.net but they don't seem to offer, or at least I didn't find the right one. If I understand correctly I need to read file again, get an array of names, and sort it somehow before sending it to output file... Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 27, 2013 Share Posted June 27, 2013 Thanks a lot for this lesson. I am amazed by your knowledge. I could only dream about this. Btw, could you give me a tip on best way to learn php, Psycho? I seem to understand the the rules but obviously don't know how to create the logic itself:( I've learned a ton by visiting forums such as this one. Look at interesting problems people post (such as yours) then look at the solutions people provide. Then step through the solution line by line to understand what it is doing - and more importantly why. If you don't know what a function does, go look it up. Then, try and come up with your own solutions to peoples problems. Chances are others will respond before you have a chance to complete yours, but maybe you did some piece that was more efficient than what someone posts. There is another part to this thing: I need too have names alphabetically sorted in output file. Could you recommend a method for this? . . . . What is the way to approach it? I looked at various sort() functions in php.net but they don't seem to offer, or at least I didn't find the right one. If I understand correctly I need to read file again, get an array of names, and sort it somehow before sending it to output file... You should NOT run another loop. You only need to run ONE loop. I already provided an explanation of how this could be accomplished EDIT: Based on your requirements you should probably read the data in, process it into an array, THEN after all processing is done, write to the new file. You'll need to do this to sort the results in the output file. Let me expand on this a bit: Have one loop to read the input file. When the loop runs do all of the data transformations you need to make, BUT do not write the data to the new file. Instead, after the transformations, copy the row into a new array. Then once the loop has processed all the line, run a function to sort the array. Lastly, run a foreach() loop on the array to write the data to the new file. To do the sorting, I would use usort() which allows you to create a custom function for the sorting logic. Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 28, 2013 Author Share Posted June 28, 2013 Ok, this is how I attempt to do it: $nameColumn = 1; while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { $name = $data[$nameColumn]; // getting String of names $nameArray = explode(' ', $name); // make array print_r($nameArray) ; function cmp($a, $b) { // using callback method as recommended in manual if ($a == $b) { return 0; } return ($a < $b) ? -1 : 1; } usort($nameArray, 'cmp'); foreach ($nameArray as $key => $value) { echo "$key: $value\n"; // this should produce sorted names } but I get Fatal error: Cannot redeclare cmp() (previously declared in...) What do I miss? Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 28, 2013 Author Share Posted June 28, 2013 I see the problem with error was that I shouldn't have put function into the loop. It shoots out an array of names, but they don't seen to be sorted in alphabetical order Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 28, 2013 Author Share Posted June 28, 2013 Update: I managed to sort the values like this: $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, 'w'); $nameColumn = 1; $lastnameArray = array(); while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { $name = $data[$nameColumn]; $nameArray = explode(' ', $name); $lastName = array_pop($nameArray); $lastnameArray [] = $lastName; array_multisort($lastnameArray); - now $lastnameArray has values sorted alphabetically fputcsv($outputHandle, $data); } foreach ($lastnameArray as $value) { //$lastnameSorted = implode(',', $lastnameArray); $data[$nameColumn] = $value ; print("<pre>".print_r($data[$nameColumn], true)."</pre>"); } however I obviously miss something in foreach loop as the output does not have names sorted, coud you have a look into how to modify the code? Thanks Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 28, 2013 Share Posted June 28, 2013 Please go back and look at my instructions again. You should FIRST run a loop to process the data and make all the data transformations - putting the data into a temporary array. Then, AFTER THE LOOP IS DONE processing all the records, sort the data in the array as you need it. Lastly, iterate over the array to now write out the sorted. modified data to a new file. The code you have above doesn't really make sense. For one, it is trying to sort the records on each iteration of the loop - which is a waste of resources. Second, it is only sorting an array of the last names. You need the records sorted using the last name of the records. Use this as a framework and fill in the gaps <?php //Define the input/output files $inputFile = 'input.csv'; $outputFile = 'input.csv'; $nameColumn = 1; function formatName($name) { $nameArray = explode(' ', $name); //Extract the last name $lastName = array_pop($nameArray); //Format lanstname, rest of name and put back in string $name = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray); return $name } $dateColumn = 3; function formatDate($date) { //Add steps to transform the date // . . . // . . . // . . . return $date; } $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, 'w'); if(!$inputHandle || !$outputHandle) { $errorMsg = ''; if(!$inputHandle) { $errorMsg .= "Error: nable to open input file.<br>"; } if(!$outputHandle) { $errorMsg .= "Error: Unable to open/create output file.<br>"; } die($errorMsg); } //Temporary array to hold processed data $tempData = array(); //Flag to not process the header data $headerRow = true; //Process the input data while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { //Do not process the first row of data if($headerRow) { //Set flag so additional rows will be processed $headerRow = false; //Modify the name $data[$nameColumn] = formatName($dataRow[$nameColumn]); //Modify the date $data[$dateColumn] = formatDate($dataRow[$dateColumn]); } //Add record to temp array $tempData = $dataRow; } //Records are processed. NOW add process to sort the tempData array // . . . // . . . // . . . //After records are storted, put into output file foreach ($lastnameArray as $dataRow) { fputcsv($outputHandle, $dataRow); } //Close the files fclose($inputHandle); fclose($outputHandle); ?> Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 29, 2013 Author Share Posted June 29, 2013 Thank you for the framework. I managed to everything except for the last part where foreach loop should write final data to the file. I get Warning: fputcsv() expects parameter 2 to be array, string given this is shortened version of the code: $inputfile = 'Data.csv'; $outputFile = 'cleandata.'.date("Ymd_His"). 'output.csv'; $nameColumn = 1; $BDColumn = 8; function formatName($name) { $nameArray = explode(' ', $name); //Extract the last name $lastName = array_pop($nameArray); //Format lanstname, rest of name and put back in string $name = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray); return $name;; } function formatDate($date) { $timestamp = strtotime($date); //if there is timestamp in a row, do data bd conversion else it is header field if($timestamp) { $date = date('Ymd', $timestamp); } return $date; } $inputHandle = fopen($inputfile, "r"); $outputHandle = fopen($outputFile, 'w'); //Temporary array to hold processed data $tempData = array(); //Flag to not process the header data $headerRow = true; //Process the input data while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { //Do not process the first row of data if($headerRow) { //Set flag so additional rows will be processed $headerRow = false; //Modify the name $dataRow[$nameColumn] = formatName($dataRow[$nameColumn]); // and date $dataRow[$BDColumn] = formatDate($dataRow[$BDColumn]); } //Add record to temp array $tempData = $dataRow; //now $tempData contains modified rows $name1 = $tempData[$nameColumn]; //extracting names again to sort $ar[] = $name1; //putting them in array } array_multisort($ar); //now ar has all names sorted alphabetically foreach($ar as $value) { $tempData[$nameColumn] = $value; //putting back sorted names } now $tempData array has all requirements then I try to put it here // foreach ($tempData as $dataRow) { fputcsv($outputHandle, $dataRow); - this gives an error : fputcsv() expects parameter 2 to be array, string given } //Close the files fclose($inputHandle); fclose($outputHandle); What did I miss? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 29, 2013 Share Posted June 29, 2013 Output the array before and after you modify it. I'm not following the logic you are using to try and sort the array. But, that is the problem. When you process the data you create an array of JUST the last names in the variable $ar. Then you use array_multisort() even though it is only a single dimensional array (sort() would have worked the same). But, THEN you do this foreach($ar as $value) { $tempData[$nameColumn] = $value; //putting back sorted names } Since the variable $nameColumn is not changing on each iteration of the loop, all that loop does is continually overwrite the index $nameColumn with each last name. So, when you are done the variable $tempData has all the records that were generated during the processing PLUS it has a new record which is not a sub array, but is only a single element of the very last last_name. So, when you try to use fputscsv() it is failing on that record. Don't use the $ar array at all or array_multuisort(). After processing the input, use usort() to sort the array. Check the manual on how to use it - but you will need to create a function I think this should work function sortMyArray($a, $b) { $nameColumn = 1; return strcmp($a[$nameColumn], $b[$nameColumn]); } Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 29, 2013 Author Share Posted June 29, 2013 Hmm, I'm confused. If you mean: usort($tempData, 'cmp'); print("<pre>".print_r($tempData, true)."</pre>"); // function cmp($a, $b) { $nameColumn = 1; return strcmp($a["Name"], $b["Name"]); } after while loop, this produces only one completely messed up record, like this: Array ( [0] => 01-380378 [1] => 02/04/1973 [2] => 1315 E Hillside, #63 [3] => 540.31 [4] => 154.00 [5] => 166.00 [6] => 78041-0986 [7] => GA|RA|LA [8] => TX [9] => Laredo [10] => Derksen Printing [11] => Roger W Baugher [12] => ttake@hotmail.com ) unlike arr, which I removed. I checked the manual for usort(), it seems to be short and examples are very basic...could'n see how to apply it tho this case... Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 30, 2013 Share Posted June 30, 2013 after while loop, this produces only one completely messed up record, like this: Did you verify the array contents before you do the sorting? FYI: You changed the sorting function I provided. what you changed wouldn't sort the array, but I don't think it would have removed the records. I checked the manual for usort(), it seems to be short and examples are very basic...could'n see how to apply it tho this case... I gave you what should be a working solution for using usort() in this particular case. Quote Link to comment Share on other sites More sharing options...
RobertAmsterdam Posted June 30, 2013 Author Share Posted June 30, 2013 This is interesting, $tempData inside while loop contains all the pre-formated records, but when I print it outside the loop, it contains only one record...:/ I don't understand where is it being overwritten? while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE) { //Do not process the first row of data if($headerRow) { //Set flag so additional rows will be processed //$headerRow = false; //-- had to remove it, because if this line is executed, nothing is formatted //Modify the name $dataRow[$nameColumn] = formatName($dataRow[$nameColumn]); //Modify the date $dataRow[$BDColumn] = formatDate($dataRow[$BDColumn]); } //Add record to temp array $tempData = $dataRow; //print("<pre>".print_r($tempData, true)."</pre>"); //-- has all the records } print("<pre>".print_r($tempData, true)."</pre>"); //-- has only one record, what changed it? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 30, 2013 Share Posted June 30, 2013 Ah, here is the cause //Add record to temp array $tempData = $dataRow; That line is overwriting $tempData with the value of $dataRow (the current record). We want to add/append the current records to $tempData //Add record to temp array $tempData[] = $dataRow; Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.