
Slow performance function. With large files memory blows up! How can I refactor?


lopes_andre


Hi

 

I have a function that strips lines out of files. I'm working with large files (more than 100MB). PHP's memory limit is set to 256MB, but the function that strips the lines blows up on a 100MB CSV file.

 

What the function must do is this:

 

Originally the CSV looks like this:

Copyright (c) 2007 MaxMind LLC.  All Rights Reserved.
locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,
2,"AP","","","",35.0000,105.0000,,
3,"EU","","","",47.0000,8.0000,,
4,"AD","","","",42.5000,1.5000,,
5,"AE","","","",24.0000,54.0000,,
6,"AF","","","",33.0000,65.0000,,
7,"AG","","","",17.0500,-61.8000,,
8,"AI","","","",18.2500,-63.1667,,
9,"AL","","","",41.0000,20.0000,,

 

After passing the CSV file through this function, I get:

locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode
1,"O1","","","",0.0000,0.0000,,
2,"AP","","","",35.0000,105.0000,,
3,"EU","","","",47.0000,8.0000,,
4,"AD","","","",42.5000,1.5000,,
5,"AE","","","",24.0000,54.0000,,
6,"AF","","","",33.0000,65.0000,,
7,"AG","","","",17.0500,-61.8000,,
8,"AI","","","",18.2500,-63.1667,,
9,"AL","","","",41.0000,20.0000,,

 

It only strips out the first line, nothing more. The problem is how the function behaves with large files: it exhausts the available memory.

 

The function is:

public function deleteLine($line_no, $csvFileName) {

	// this function strips a specific line from a file
	// if a line is stripped, functions returns True else false
	//
	// e.g.
	// deleteLine(-1, xyz.csv); // strip last line
	// deleteLine(1, xyz.csv); // strip first line

	// Assign the file name
	$filename = $csvFileName;

	$strip_return=FALSE;

	$data=file($filename);
	$pipe=fopen($filename,'w');
	$size=count($data);

	if($line_no==-1) $skip=$size-1;
	else $skip=$line_no-1;

	for($line=0;$line<$size;$line++)
		if($line!=$skip)
			fputs($pipe,$data[$line]);
		else
			$strip_return=TRUE;

	return $strip_return;
}	

 

Is it possible to refactor this function so that it does not exceed the 256MB PHP memory limit?

 

Give me some clues.

 

Best Regards,


I will give no guarantees since I do not have your files to work with, but I'll give some suggestions.

 

First don't run any code that you don't need to. For example, you define the variable $size, but you only use it within an IF condition. So, you only need to define it within the condition. Defining it outside the condition is unnecessary. In fact, I wouldn't even define it at all. In this case, however, the performance gain is imperceptible, but just giving an idea of the logic I would use to approach a problem such as this.

 

The real meat of the function is the for loop that loops over every record in the array created using file(). But, you are only removing one line, correct? So, you don't even need a loop. Just unset() the one line you don't want and then implode the array!

 

Also, PHP uses zero-based indexes by default, so I would adjust the function to do the same (i.e. pass 0 to remove the first line, 1 to remove the 2nd line, etc.). That makes it consistent with how PHP operates and will solve some other possible issues.

 

Give this a try

public function deleteLine($line_no, $csvFileName)
{
// this function strips a specific line from a file
// if a line is stripped, the function returns true, else false
//
// e.g.
// deleteLine(-1, 'xyz.csv'); // strip last line
// deleteLine(-2, 'xyz.csv'); // strip 2nd to last line
// deleteLine(0, 'xyz.csv'); // strip 1st line
// deleteLine(1, 'xyz.csv'); // strip 2nd line

    //Define default return value
    $strip_return = false;
    //Parse file into an array (each element keeps its line ending)
    $lines = file($csvFileName);
    if($lines === false)
    {
        //The file could not be read
        return $strip_return;
    }
    //Define the line index to be removed
    $skip_index = ($line_no < 0) ? count($lines)+$line_no : $line_no;

    //Check if the line exists
    if(isset($lines[$skip_index]))
    {
        //The line exists - remove it
        unset($lines[$skip_index]);
        $strip_return = true;
        //Open file to write new contents
        $pipe = fopen($csvFileName, 'w');
        //Write the array using implode; the lines already end in newlines,
        //so join them with an empty string
        fwrite($pipe, implode('', $lines));
        //Close the file
        fclose($pipe);
    }
    return $strip_return;
}
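
One detail to watch with the file() plus implode() approach: by default file() keeps the trailing newline on every element, so joining the array with a newline (or PHP_EOL) puts a blank line between records. Here is a minimal sketch of the difference. The sample data and variable names are made up for illustration, and "\n" is used instead of PHP_EOL so the result is the same on every platform.

```php
<?php
// Throwaway sample file (hypothetical data, for illustration only).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "header\nrow1\nrow2\n");

$lines = file($path);   // each element still ends in "\n"
unset($lines[0]);       // drop the first line

$doubled = implode("\n", $lines); // a second newline lands between records
$correct = implode('', $lines);   // the newlines are already there

// Alternative: strip the newlines while reading, then add them back when joining.
$bare = file($path, FILE_IGNORE_NEW_LINES);
unset($bare[0]);
$alsoCorrect = implode("\n", $bare) . "\n";

unlink($path);

var_dump($doubled);     // contains an empty line between row1 and row2
var_dump($correct === $alsoCorrect);
```

Either variant works. Which one to pick mostly depends on whether the rest of the code wants the lines with or without their line endings.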


It is solved! What made the function blow up was the use of file(), which loads the entire content of the file into memory. The refactored function instead reads the file line by line and writes the contents to a temporary file, which then replaces the original.

 

public function deleteLine($line_no, $csvFileName) {

		// this function strips a specific line from a file
		// if a line is stripped, the function returns true, else false
		//
		// e.g.
		// deleteLine(-1, 'xyz.csv'); // strip last line
		// deleteLine(1, 'xyz.csv'); // strip first line

		$strip_return=FALSE;

		$readFD=fopen($csvFileName,'r');
		if($readFD===false) {
				return FALSE;
		}

		$tmpFileName = tempnam(".", "csv");
		$writeFD=fopen($tmpFileName,'w');
		if($writeFD===false) {
				fclose($readFD);
				return FALSE;
		}

		if($line_no==-1) {
				// The line count is not known up front, so count the lines
				// in a first streaming pass and then rewind
				$size=0;
				while (fgets($readFD) !== false) {
						$size++;
				}
				rewind($readFD);
				$skip=$size-1;
		} else {
				$skip=$line_no-1;
		}

		// Copy every line except the skipped one to the temporary file
		$line = 0;
		while (($buffer = fgets($readFD)) !== false) {
				if($line!=$skip)
						fputs($writeFD,$buffer);
				else
						$strip_return=TRUE;
				$line++;
		}

		// Now close both file handles
		fclose($readFD);  // original file
		fclose($writeFD); // temporary file

		// Delete csvFileName (the original file)
		unlink($csvFileName);

		rename($tmpFileName,$csvFileName);
		return $strip_return;
}

 

So, here is the solution to edit large files using PHP.
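
To see why file() was the culprit, the two reading strategies can be compared on a generated sample. This is only a rough sketch: the helper names (makeSampleCsv, countWithFile, countWithFgets) and the row count are made up for illustration, and the byte figures reported by memory_get_usage() will vary with the PHP version and platform.

```php
<?php
// Hypothetical helper: writes a throwaway CSV with $rows lines and returns its path.
function makeSampleCsv(int $rows): string
{
    $path = tempnam(sys_get_temp_dir(), 'csv');
    $fh = fopen($path, 'w');
    for ($i = 1; $i <= $rows; $i++) {
        fwrite($fh, $i . ',"AD","","","",42.5000,1.5000,,' . "\n");
    }
    fclose($fh);
    return $path;
}

// Counts lines by loading the whole file into an array.
// Returns [line count, bytes allocated while the array is alive].
function countWithFile(string $path): array
{
    $before = memory_get_usage();
    $lines = file($path);
    $used = memory_get_usage() - $before;
    return [count($lines), $used];
}

// Counts lines by streaming; only one line is held in memory at a time.
// Returns [line count, bytes allocated while the handle is open].
function countWithFgets(string $path): array
{
    $before = memory_get_usage();
    $fh = fopen($path, 'r');
    $count = 0;
    while (fgets($fh) !== false) {
        $count++;
    }
    $used = memory_get_usage() - $before;
    fclose($fh);
    return [$count, $used];
}

$path = makeSampleCsv(50000);
[$n1, $m1] = countWithFile($path);
[$n2, $m2] = countWithFgets($path);
unlink($path);

printf("file():  %d lines, ~%d bytes\n", $n1, $m1);
printf("fgets(): %d lines, ~%d bytes\n", $n2, $m2);
```

With file(), memory use grows with the number of lines, because every line becomes a separate PHP string in the array. The fgets() loop stays roughly flat however large the file is.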
