Jump to content

Reformatting CSV cell


RobertAmsterdam

Recommended Posts

Hello, perhaps this is is n00b question, well it is....

 

I need to read CSV file and reformat certain cells. For example: 

 

'Name' column to have 'Last Name, First Name  Middle Initial.'

 I want to convert Jeorge W Bush  into-> Bush, Jeorge W (with comma)
 
How can I do this? Should I do some with array? Small example would be appreciated. Google wasn't very helpfull.
 
 
Link to post
Share on other sites

Since a CSV file depends on the commas to retain its structure, changing the comma-delimited field holding the name into one that now holds a comma as well, you'll be changing the file layout. Will that affect other users, or just you?

 

So - if you are proceeding - if you know exactly which field in every row holds the current name value it's pretty easy. (Here's where you should have provided your attempt at coding):

 

(for the admins - my browser doesn't have any editing tags enable in this window)

 

<?

if (!$hdl_in = fopen((your starting filename),"r"))

{

echo "Could not open input file";

exit();

}

if (!$hdl_out = fopen((your output filename),"c"))

{

echo "Could not open output file";

exit();

}

$in_cnt=0;

$out_cnt=0;

while ($row = fgets($hdl_in))

{

$in_cnt++;

$fields = explode(",",$row);

$name = $fields[0]; // I AM ASSUMING THAT THE NAME IS IN THE 1ST FIELD OF EACH LINE

$nm_parts = explode(" ",$name); // watch out for multiple spaces in names or names with more than

// 3 parts to them

if (count($nm_parts)>3)

{

$fields[0] .= "xxx"; // look for xxx in your completed file and fix manually

}

else

{

$fields[0] = $nm_parts[2].", ".$nm_parts[0]." ".$nm_parts[1];

}

$newrow = implode(",",$fields);

fputs($hdl_out,$newrow);

$out_cnt++;

}

 

fclose($hdl_in);

fclose($hdl_out);

echo "File conversion complete<br>";

echo "Read $in_cnt records; wrote out $out_cnt records<br>";

exit();

Link to post
Share on other sites

@Robert,

 

Can you explain exactly what you are trying to achieve? What is the purpose of needing to change the name field, where is the data coming from, where is the file used after it is converted, etc. The reason I ask is that there may be better solutions than what you are asking.

Link to post
Share on other sites

Since a CSV file depends on the commas to retain its structure, changing the comma-delimited field holding the name into one that now holds a comma as well, you'll be changing the file layout. Will that affect other users, or just you?

 

So - if you are proceeding - if you know exactly which field in every row holds the current name value it's pretty easy. (Here's where you should have provided your attempt at coding):

 

(for the admins - my browser doesn't have any editing tags enable in this window)

 

<?

if (!$hdl_in = fopen((your starting filename),"r"))

{

echo "Could not open input file";

exit();

}

if (!$hdl_out = fopen((your output filename),"c"))

{

echo "Could not open output file";

exit();

}

$in_cnt=0;

$out_cnt=0;

while ($row = fgets($hdl_in))

{

$in_cnt++;

$fields = explode(",",$row);

$name = $fields[0]; // I AM ASSUMING THAT THE NAME IS IN THE 1ST FIELD OF EACH LINE

$nm_parts = explode(" ",$name); // watch out for multiple spaces in names or names with more than

// 3 parts to them

if (count($nm_parts)>3)

{

$fields[0] .= "xxx"; // look for xxx in your completed file and fix manually

}

else

{

$fields[0] = $nm_parts[2].", ".$nm_parts[0]." ".$nm_parts[1];

}

$newrow = implode(",",$fields);

fputs($hdl_out,$newrow);

$out_cnt++;

}

 

fclose($hdl_in);

fclose($hdl_out);

echo "File conversion complete<br>";

echo "Read $in_cnt records; wrote out $out_cnt records<br>";

exit();

 

Thank you Sir!

 

I will attempt to work out this logic with me limited php skills and will get back with something.

Link to post
Share on other sites

@Robert,

 

Can you explain exactly what you are trying to achieve? What is the purpose of needing to change the name field, where is the data coming from, where is the file used after it is converted, etc. The reason I ask is that there may be better solutions than what you are asking.

 

This is what is requested of me:

 

 
 1) read in the data from the sample csv data file
 2) reformat the 'Name' column to have 'Last Name, First Name  Middle Initial.'
       example : Ray G Kummerow -> Kummerow, Ray G.
 3) reformat birthdays to be YYYYMMDD
 4) output reformatted data to a file called : 'cleandata.<time stamp YYYYMMDD_HHMMSS>.csv'
 5) have output sorted alphabetically by state, then by name

 6) any duplicates account numbers should be rejected and noted 

 

 

This is a bit overwhelming, so I want to take it step by step.

Link to post
Share on other sites

This will do what you need. The only thing you would need to do is define the correct column index for $nameColumn

$inputfile  = 'input.csv';
$outputFile = 'output.csv';

$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, 'w');

if(!$inputHandle || !$outputHandle)
{
    echo "Error: Unable to open file '{$csvfile}' or the ourput file '{$outputFile}'";
}
else
{
    $nameColumn = 1;

    while (($data = fgetcsv($inputHandle, 1000, ",")) !== FALSE)
    {
        //Retrieve the name
        $name = $data[$nameColumn];
        //Explode the name
        $nameArray = explode(' ', $name);
        //Extract the last name
        $lastName = array_pop($nameArray);
        //Format lanstname, rest of name and put back in
        $data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray);
        //Add record to new csv
        fputcsv($outputHandle, $data);
    }
    fclose($inputHandle);
    fclose($outputHandle);
}

EDIT: Based on your requirements you should probably read the data in, process it into an array, THEN after all processing is done, write to the new file. You'll need to do this to sort the results in the output file.

Link to post
Share on other sites

Thank you so much! This indeed reformats the name string:)

 

However, I don't completely understand this part of logic:

$data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) .implode(' ', $nameArray);

Namely this : (count($nameArray) ? ', ' : '' )

 

Here you say: if you can count elements in array (true), then concatenate with ', ' else concat with empty space - ' '. (if I read it correctly).  

Why those elements need to be counted here? Is it only to make sure that there is a ' , ' after $lastName? Why not just concat it with coma+ space?

Link to post
Share on other sites

Thank you so much! This indeed reformats the name string:)

 

However, I don't completely understand this part of logic:

$data[$nameColumn] = $lastName . (count($nameArray) ? ', ' : '' ) .implode(' ', $nameArray);

Namely this : (count($nameArray) ? ', ' : '' )

 

Here you say: if you can count elements in array (true), then concatenate with ', ' else concat with empty space - ' '. (if I read it correctly).  

Why those elements need to be counted here? Is it only to make sure that there is a ' , ' after $lastName? Why not just concat it with coma+ space?

 

I added that based on previous experience. Think about what would happen if you were to have a record that only had a single "word" for the name? For example: "Bono". The result would be "Bono, " with an extraneous comma and space after the name. In fact, if this was my project I would add a lot more such data handling (e.g. removing repetitive spaces). In plain English the logic works like this:

 

1. Explode the name into multiple words based upon spaces.

2. Remove the last word from the array

3. Create a new string starting with the last word that was extracted

4. IF there are additional words left in the array append a comma and space

5. Concatenate the remaining words with a space between them and append

Link to post
Share on other sites

Ok, I got it! Thanks!

 

I also used your approach to reformat the Birthdate field like this:

$inputfile  = 'input.csv';
$outputFile = 'output.csv';


$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, "w");

$BDColumn = 8;
while (($data = fgetcsv($inputHandle, 1000, ",")) !== FALSE)
    {
        //Retrieve the birthdate
        $BD = $data[$BDColumn];
        
        $data[$BDColumn] =  date('Ymd', strtotime($BD));
        //Add record to new csv
        fputcsv($outputHandle, $data);
   
    }
    fclose($inputHandle);
    fclose($outputHandle);

It works as desired ( takes a string like 10/09/1973 and makes it YYYYMMDD), but  the field name disappears in the first  raw, and instead of 'Birthdate' there is a value there, so there is no column name anymore, when in your code, which is almost identical, this doesn't happen...

 

Any idea why this happened?

Link to post
Share on other sites

It works as desired ( takes a string like 10/09/1973 and makes it YYYYMMDD), but  the field name disappears in the first  raw, and instead of 'Birthdate' there is a value there, so there is no column name anymore, when in your code, which is almost identical, this doesn't happen...

 

Any idea why this happened?

 

You never stated there was a header row! The reason why it works for the name column header is probably due to the line of code you asked me about previously to handle situations where there is only one "word" in a name to not add a comma/space. So, I assume the column header is a single word. If I had not put in that line, the name header would have been manipulated. The reason you are seeing the problem with the date header is because of this line

 

$data[$BDColumn] =  date('Ymd', strtotime($BD));

 

It is trying to convert the value to a timestamp and then into a formatted date string. The first part is failing since a date that cannot be interpreted as a date can be converted to a timestamp and is returning false. Then the date() function is using the value of FALST to try and create a formatted string and defaulting to something close to the EPOX date.

 

I see two choices.

 

1) Skip the first row (or however many rows there are up to the header row). Just create a variable before the loop called $header and set to false. Then in the loop add something like this

 

if(!$header)
{
    $header = true;
    continue;
}

 

2. Modify the code to not convert the string to a date. Start by first attempting to convert to a timestamp. If the value of that result is false do not change the value.

 

    $timestamp = stringtotime($BD);
    if($timestamp != false)
    {
        data[$BDColumn] =  date('Ymd', $timestamp);
    }
 
Link to post
Share on other sites

Yes, this code makes sure timestamp is put only into the rows bellow header.


    $timestamp = stringtotime($BD);
    if($timestamp)
    {
        data[$BDColumn] =  date('Ymd', $timestamp);
    }
 

Thanks a lot for this lesson. I am amazed by your knowledge. I could only dream about this. Btw, could you give me a tip on best way to learn php, Psycho? 

I seem to understand the the rules but obviously don't know how to create the logic itself:( 

 

 

There is another part to this thing: I need too have names alphabetically sorted in output file. Could you recommend a method for this?

 

Now it is clear that it should begin with loop as before:

 

 

$inputfile  = 'output.csv';
$outputFile = 'input.csv';


$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, 'w');

while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) {

$nameColumn = 1;
$name = $data[$nameColumn];


$nameArray = explode(' ', $name);
          
$data = sort($nameArray, SORT_ASC); // this is not producing output sorted by name  

fputcsv($outputHandle, $data);

}


fclose($inputHandle);
fclose($outputHandle);

What is the way to approach it? I looked at various sort() functions in php.net but they don't seem to offer, or at least I didn't find the right one.

 

If I understand correctly I need to read file again, get an array of names, and sort it somehow before sending it to output file...

Link to post
Share on other sites
Thanks a lot for this lesson. I am amazed by your knowledge. I could only dream about this. Btw, could you give me a tip on best way to learn php, Psycho? 

I seem to understand the the rules but obviously don't know how to create the logic itself:(

 

I've learned a ton by visiting forums such as this one. Look at interesting problems people post (such as yours) then look at the solutions people provide. Then step through the solution line by line to understand what it is doing - and more importantly why. If you don't know what a function does, go look it up. Then, try and come up with your own solutions to peoples problems. Chances are others will respond before you have a chance to complete yours, but maybe you did some piece that was more efficient than what someone posts.

 

There is another part to this thing: I need too have names alphabetically sorted in output file. Could you recommend a method for this?

 

. . . .

 

What is the way to approach it? I looked at various sort() functions in php.net but they don't seem to offer, or at least I didn't find the right one.

 

If I understand correctly I need to read file again, get an array of names, and sort it somehow before sending it to output file...

 

You should NOT run another loop. You only need to run ONE loop. I already provided an explanation of how this could be accomplished

 

 

EDIT: Based on your requirements you should probably read the data in, process it into an array, THEN after all processing is done, write to the new file. You'll need to do this to sort the results in the output file.

 

Let me expand on this a bit:

 

Have one loop to read the input file. When the loop runs do all of the data transformations you need to make, BUT do not write the data to the new file. Instead, after the transformations, copy the row into a new array. Then once the loop has processed all the line, run a function to sort the array. Lastly, run a foreach() loop on the array to write the data to the new file.

 

To do the sorting, I would use usort() which allows you to create a custom function for the sorting logic.

Link to post
Share on other sites

Ok, this is how I attempt to do it:

$nameColumn = 1;


while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) {
    
    $name = $data[$nameColumn]; // getting String of names
    $nameArray = explode(' ', $name); // make array
    
    print_r($nameArray) ;


    function cmp($a, $b) { // using callback method as recommended in manual
    if ($a == $b) {
    return 0;
    }
       return ($a < $b) ? -1 : 1;
    }
     
    usort($nameArray, 'cmp');


    foreach ($nameArray as $key => $value) {
        echo "$key: $value\n";                    // this should produce sorted names
    }
     

but I get 

 

Fatal error: Cannot redeclare cmp() (previously declared in...)

 

What do I miss?

Link to post
Share on other sites

Update: I managed to sort the values like this:

$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, 'w');


$nameColumn = 1;
$lastnameArray = array();


while (($data = fgetcsv($inputHandle, 1024, ",")) !== FALSE) {
    
    $name = $data[$nameColumn];
    
    $nameArray = explode(' ', $name);
    
    $lastName = array_pop($nameArray);
    
    $lastnameArray [] = $lastName;


    array_multisort($lastnameArray); - now $lastnameArray has values sorted alphabetically 
        
     
        fputcsv($outputHandle, $data);
}


foreach ($lastnameArray as $value) { 
       
        //$lastnameSorted = implode(',', $lastnameArray);
        $data[$nameColumn] = $value ;
        print("<pre>".print_r($data[$nameColumn], true)."</pre>");
}
    

however I obviously miss something in foreach loop as the output does not have names sorted,

 

coud you have a look into how to modify the code?

 

Thanks

Link to post
Share on other sites

Please go back and look at my instructions again. You should FIRST run a loop to process the data and make all the data transformations - putting the data into a temporary array. Then, AFTER THE LOOP IS DONE processing all the records, sort the data in the array as you need it. Lastly, iterate over the array to now write out the sorted. modified data to a new file.
 

The code you have above doesn't really make sense. For one, it is trying to sort the records on each iteration of the loop - which is a waste of resources. Second, it is only sorting an array of the last names. You need the records sorted using the last name of the records.

 

Use this as a framework and fill in the gaps

 

<?php

//Define the input/output files
$inputFile = 'input.csv';
$outputFile = 'input.csv';

$nameColumn = 1;
function formatName($name)
{
    $nameArray = explode(' ', $name);
    //Extract the last name
    $lastName = array_pop($nameArray);
    //Format lanstname, rest of name and put back in string
    $name = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray);
    return $name
}

$dateColumn = 3;
function formatDate($date)
{
    //Add steps to transform the date
    // . . .
    // . . .
    // . . .
    return $date;
}

$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, 'w');

if(!$inputHandle || !$outputHandle)
{
    $errorMsg = '';
    if(!$inputHandle)
    {
        $errorMsg .= "Error: nable to open input file.<br>";
    }
    if(!$outputHandle)
    {
        $errorMsg .= "Error: Unable to open/create output file.<br>";
    }
    die($errorMsg);
}

//Temporary array to hold processed data
$tempData = array();
//Flag to not process the header data
$headerRow = true;
//Process the input data
while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE)
{
    //Do not process the first row of data
    if($headerRow)
    {
        //Set flag so additional rows will be processed
        $headerRow = false;

        //Modify the name
        $data[$nameColumn] = formatName($dataRow[$nameColumn]);

        //Modify the date
        $data[$dateColumn] = formatDate($dataRow[$dateColumn]);
    }

    //Add record to temp array
    $tempData = $dataRow;
}


//Records are processed. NOW add process to sort the tempData array
// . . .
// . . .
// . . .


//After records are storted, put into output file
foreach ($lastnameArray as $dataRow)
{
    fputcsv($outputHandle, $dataRow);
}

//Close the files
fclose($inputHandle);
fclose($outputHandle);

?>
Link to post
Share on other sites

Thank you for the framework.

I managed to everything except for the last part where foreach loop should write final data to the file. I get 

 

Warning: fputcsv() expects parameter 2 to be array, string given

 

this is shortened version of the code:

$inputfile  = 'Data.csv';
$outputFile = 'cleandata.'.date("Ymd_His"). 'output.csv';


$nameColumn = 1;
$BDColumn = 8;


function formatName($name) {
    $nameArray = explode(' ', $name);
    //Extract the last name
    $lastName = array_pop($nameArray);
    //Format lanstname, rest of name and put back in string
    $name = $lastName . (count($nameArray) ? ', ' : '' ) . implode(' ', $nameArray);
    
    return $name;;
}

function formatDate($date)
{
        
        $timestamp = strtotime($date);
        //if there is timestamp in a row, do data bd conversion else it is header field
        if($timestamp) {
         $date = date('Ymd', $timestamp);
        }
    
    
   return $date;
}


$inputHandle = fopen($inputfile, "r");
$outputHandle = fopen($outputFile, 'w');


//Temporary array to hold processed data
$tempData = array();
//Flag to not process the header data
$headerRow = true;
//Process the input data
while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE)
{
    //Do not process the first row of data
    if($headerRow)
    {
        //Set flag so additional rows will be processed
        $headerRow = false;


        //Modify the name
        $dataRow[$nameColumn] = formatName($dataRow[$nameColumn]);
        // and date
        $dataRow[$BDColumn] = formatDate($dataRow[$BDColumn]);
       
    }
   //Add record to temp array
   $tempData = $dataRow;
  //now $tempData contains modified rows
   
  
   $name1 = $tempData[$nameColumn];
   //extracting names again to sort
   $ar[] = $name1;
   //putting them in array
}

array_multisort($ar);  
//now ar has all names sorted alphabetically

foreach($ar as $value) {

  $tempData[$nameColumn] = $value; 
 //putting back sorted names
  
}

now $tempData array has all requirements

 

then I try to put it  here

   
    
//
foreach ($tempData as  $dataRow)
{
    
    fputcsv($outputHandle, $dataRow); - this gives an error : fputcsv() expects parameter 2 to be array, string given
}


//Close the files
fclose($inputHandle);
fclose($outputHandle);

What did I miss?

Link to post
Share on other sites

Output the array before and after you modify it. I'm not following the logic you are using to try and sort the array. But, that is the problem. When you process the data you create an array of JUST the last names in the variable $ar. Then you use array_multisort() even though it is only a single dimensional array (sort() would have worked the same).

 

But, THEN you do this

 

foreach($ar as $value) {

  $tempData[$nameColumn] = $value; 
 //putting back sorted names
  
}

 

Since the variable $nameColumn is not changing on each iteration of the loop, all that loop does is continually overwrite the index $nameColumn with each last name. So, when you are done the variable $tempData has all the records that were generated during the processing PLUS it has a new record which is not a sub array, but is only a single element of the very last last_name. So, when you try to use fputscsv() it is failing on that record.

 

Don't use the $ar array at all or array_multuisort(). After processing the input, use usort() to sort the array. Check the manual on how to use it - but you will need to create a function

 

I think this should work

 

function sortMyArray($a, $b)
{
    $nameColumn = 1;
    return strcmp($a[$nameColumn], $b[$nameColumn]);
}
Link to post
Share on other sites

Hmm, I'm confused. If you mean:

usort($tempData, 'cmp');  
print("<pre>".print_r($tempData, true)."</pre>"); 

//
function cmp($a, $b)
{
 $nameColumn = 1;
 return strcmp($a["Name"], $b["Name"]);
   
}

after while loop, this produces only one completely messed up record, like this:

 

Array ( [0] => 01-380378 [1] => 02/04/1973 [2] => 1315 E Hillside, #63 [3] => 540.31 [4] => 154.00 [5] => 166.00 [6] => 78041-0986 [7] => GA|RA|LA [8] => TX [9] => Laredo [10] => Derksen Printing [11] => Roger W Baugher [12] => ttake@hotmail.com )

 

 

unlike arr, which I removed.

I checked the manual for usort(), it seems to be short and examples are very basic...could'n see how to apply it tho this case...

Link to post
Share on other sites
after while loop, this produces only one completely messed up record, like this:

Did you verify the array contents before you do the sorting? FYI: You changed the sorting function I provided. what you changed wouldn't sort the array, but I don't think it would have removed the records.

 

I checked the manual for usort(), it seems to be short and examples are very basic...could'n see how to apply it tho this case...

I gave you what should be a working solution for using usort() in this particular case.

Link to post
Share on other sites

This is interesting,

 

$tempData inside while loop contains all the pre-formated records, but when I print it outside  the loop, it contains only one record...:/ 

I don't understand where is it being overwritten?

while (($dataRow = fgetcsv($inputHandle, 1024, ",")) !== FALSE)
{
    //Do not process the first row of data
    if($headerRow)
    {
        //Set flag so additional rows will be processed
        //$headerRow = false;  //-- had to remove it, because if this line is executed, nothing is formatted


        //Modify the name
        $dataRow[$nameColumn] = formatName($dataRow[$nameColumn]);
        
        //Modify the date
        $dataRow[$BDColumn] = formatDate($dataRow[$BDColumn]);
        
    }
   //Add record to temp array
   $tempData = $dataRow;
   
   //print("<pre>".print_r($tempData, true)."</pre>"); //-- has all the records
    
}

print("<pre>".print_r($tempData, true)."</pre>"); //-- has only one record, what changed it?
Link to post
Share on other sites

Ah, here is the cause

 

   //Add record to temp array
   $tempData = $dataRow;

 

That line is overwriting $tempData with the value of $dataRow (the current record). We want to add/append the current records to $tempData

 

 

   //Add record to temp array
   $tempData[] = $dataRow;
Link to post
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.