Jump to content

to use a db a file or an array that is the question


sixlead

Recommended Posts

I have to import a csv change its structure and re-export it.

 

csv in example -

 

user1,2005

user1,2006

user2,2005

user2,2006

user2,2007

 

export structure

 

user1,2005,2006

user2,2005,2006,2007

 

I have thought about using a loop while reading in the file, matching each unique name value in turn and copying data into a new csv

 

However I am not certain that this is the best answer especially given the size of the csv import.

 

I am looking forward your comments

 

TIA

Link to comment
Share on other sites

There may be less code involved in using the arrays, but what about the performance of the process. the csv file contains 15,000 + records with 25 fields.  of course I think this is a large file to process, but that may be a mis-conception on my part.

Link to comment
Share on other sites

Okay so I was bored.  Here's what I came up with.  Idea is to loop through source and destination file 1 line at a time, instead of making a giant array.

 

<?php
// open source file for reading
$sourceFile = fopen("data.txt", "r");
// while not end of source file
while (!feof($sourceFile)) {
   // get the current line and explode it into an array
   $sourceLine = explode(',', fgets($sourceFile, 4096));
   // set found to false because we haven't found a username match yet
   $found = false;
   // open destination file for reading and writing
   $destFile = fopen("data2.txt", "r+");
   // while not end of destination file
   while (!feof($destFile)) {
      // find out what byte the file pointer is currently at (so we know what line to change)
      $lineLoc = ftell($destFile);
      // trim the current line (to get rid of the \n) and explode into array
      $destLine = explode(',', trim(fgets($destFile, 4096)));
      // check to see if the current user from source is same as 
      // the current user from destination.  If it's a match....
      if ($sourceLine[0] == $destLine[0]) {
         // remove the username from the source array
         array_shift($sourceLine);
         // append source info to the destination array and implode to a string
         $newLine = implode(',', array_merge($destLine, $sourceLine));
         // move file pointer to the right line (because fgets moved it forward to the next line)
         fseek($destFile, $lineLoc);
         // overwrite the old line with the new line           
         fwrite($destFile,$newLine);
         // set found to true
         $found = true;
         // since match was found, no need to keep checking lines, so break out of loop
         break;
      } // end if match found
   } // end while destFile
   // if we did not find a match...
   if ($found == false) {
      // move pointer to end of the file
      fseek($destFile, 0, SEEK_END);
      // add the source line to end of file as new entry
      fwrite($destFile, implode(',',$sourceLine));			   
   } // end if !found
   // close the destination file 
   fclose($destFile);
} // end while sourceFile
// close source file
fclose($sourceFile);
?>

 

Okay so I tested it out.  It seems to from formatA to formatB just like you wanted...except for one small bug.  In order for this code to work, the usernames must be sorted.  In your example, they are sorted.  But if you have for instance, this:

 

user1,2006

user2,2005

user2,2006

user2,2007

user1,2005

 

for some reason it bugs and ends up doing this:

 

user1,2006, 2005

,2005,2006,2007

 

another example, this:

 

user1,2005

user1,2006

user2,2005

user2,2006

user2,2007

user3,2001

user3,2000

user3,1999

user2,1111

 

turns into this:

 

user1,2005,2006

user2,2005,2006,2007,1111

,2001,2000,1999

 

I'm not entirely sure why.  Maybe someone else can spot the bug, or find a better solution; I'm tired, going to bed.

Link to comment
Share on other sites

Okay so I found the bug.  I was fairly certain of what it was last night.  Just too tired to confirm it.  Problem is that when you go back to a previous user, and add more to it, it overwrites part of the next entry, since the next entry starts at the byte position after the previous one. 

 

One way to fix this is to create a fixed line length by padding the line with spaces.  Basically same principle as like in a database char type column where if you have for example char(20) and the data is 10 chars, it will pad it with spaces to make it 20 chars.  So for instance if you want each line to be a total of 50 chars long, you would just do 50-strlen(data to insert) to find out how many spaces to pad the data with.  Problem with this method is if you're unsure how long each user's line can get.  If it can get longer than the max length, then we're back to square one.

 

Another way is to make use of a 3rd, temporary file.  Basically the idea would be that every time there is a match on user (not a new entry), you would start a loop that loops through each line of the destination file and writes each line to a new file (basically copying destFile to tempFile, line by line.  Throw in a condition to write the altered line in at the right time, finish writing the rest of the lines, save the temp file as destFile and delete the tempfile.  wash, rinse and repeat.  Advantage of this is that you don't have to worry about how long each user's line will get, because each line would be written to the end of the file.  Disadvantage of it is that it's throwing another file and another loop into the mix.  More resources, longer processing time, etc.. but..you can always break the process down to a couple hundred lines at a time, to avoid possible timing out.

 

Anyways, if your source files are already sorted by user, the already posted code should work fine.  If it is not, then you can consider adding onto the posted code, with one of those 2 options.  Or maybe you or someone else can think of a better option.  Lord knows, I do have a tendency to over complicate things. 

 

Or...another option that involves using a database:

 

Make a table with 2 columns, one for user, another for value.  Read the file line by line, exploding the line to have user/val separated.  Insert the line into the table with user in user column, val in value column.  Then you can select the rows ordered by user, run a loop that builds the string until user changes, fwrites the line, wash rinse and repeat.  Or hell, it may very likely be possible to write a sql statement that will format the output for you.  Maybe doing an inner join and a substringed concat or something.  Not sure though.  I'm not a sql expert. 

Link to comment
Share on other sites

WOW  ;D

That is so much more than I was looking for, thank you very much.  I was really just looking for a discussion on the best method to employ but you went an extra mile.  I will try your code on the actual file and see how it does.

 

As to getting the file sorted that can be done during the data export by using an ORDER BY clause in the select statement.

 

Thanks Again

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.