Hello,

 

I have no experience dealing with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents(); the task is to clean and munge them using preg_replace().

 

My code runs fine on small files; however, the large files (40 MB) trigger a memory exhausted error:

 

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

 

I was thinking of using fread() instead but I am not sure that'll work either. Is there a workaround for this problem?

 

Thanks for your input.

 

Al.

 

Can you provide some information on the operations you need to perform on the data? fread() might be an option depending on how you need to manipulate the data.

 

If this is YOUR server, you can increase the allowed memory limit in the php.ini file
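If editing php.ini is not possible, the limit can sometimes be changed from inside the script instead. This is only a sketch: the 64M value is an arbitrary placeholder, and many shared hosts lock the setting, in which case ini_set() fails. Keep in mind that the script holds the whole file in memory more than once, so the limit would need to be well above the file size.

```php
<?php
// Hypothetical target value; choose one that suits the host and the file size.
$wanted = '64M';

// ini_get() reports the limit currently in force (for example "16M").
$before = ini_get('memory_limit');

// ini_set() returns the old value on success and false on failure,
// for instance when the setting is locked by the server configuration.
$result = ini_set('memory_limit', $wanted);

if ($result === false) {
    echo "Could not change memory_limit (still $before)\n";
} else {
    echo "memory_limit changed from $before to " . ini_get('memory_limit') . "\n";
}
```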

Can you provide some information on the operations you need to perform on the data? fread() might be an option depending on how you need to manipulate the data.

I run preg_replace() and str_replace() operations on the file contents and save the modified data to a new file. Again, this works on small files.

 

If this is YOUR server, you can increase the allowed memory limit in the php.ini file

Not my server, sadly.

Can you provide some information on the operations you need to perform on the data?

 

This is my code:

 

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
##"\r" is the carriage return itself; "^M" would only match a literal caret followed by M.
$newData = str_replace("\r", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
##The dots are escaped so they match a literal "." rather than any character.
$pattern = '/(Test_Version=)(\d)\.(\d)\.(\d)(\n+)/';
$replacement = '$1$2.$3.$4      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d)\.(\d)\.(\d)      (Test_Version=)/';
$replacement = '$1$2.$3.$4      Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>

Looking at your patterns, it appears that the regular expressions need to find newline characters, so a match can span more than one line. That means you can't simply read one line at a time and process each line on its own.

 

One option would be to create a "buffer". Start by reading five or more lines of data and process that data. Then remove the first line from the buffer, write it to a new file, and read a new line from the input file into the buffer. Process the buffer again, and repeat until you have read all the data from the input file; at that point, write out whatever is left in the buffer.
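A rough sketch of that buffer idea follows. The function name streamWithLineBuffer(), the $cleanup callback and the window size are made up for illustration; only two of the replacements from the posted script are shown, and the code assumes PHP 7 or later. It also assumes the replacements are safe to apply to the same text more than once, because every line stays in the buffer for several passes.

```php
<?php
// Sliding line buffer: memory use is bounded by the window, not the file size.
// $transform receives the buffered text and must return the cleaned text.
function streamWithLineBuffer(string $inPath, string $outPath, callable $transform, int $window = 5): void
{
    $in = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }

    $buffer = '';
    $eof = false;
    while (!$eof || $buffer !== '') {
        // Top up the buffer until it holds $window lines or the input ends.
        while (!$eof && substr_count($buffer, "\n") < $window) {
            $line = fgets($in);
            if ($line === false) {
                $eof = true;
            } else {
                $buffer .= $line;
            }
        }

        $buffer = $transform($buffer);

        $pos = strpos($buffer, "\n");
        if ($pos !== false) {
            // The first line has seen the lines after it, so it is final.
            fwrite($out, substr($buffer, 0, $pos + 1));
            $buffer = (string) substr($buffer, $pos + 1);
        } elseif ($eof) {
            // No newline left and nothing more to read: flush the rest.
            fwrite($out, $buffer);
            $buffer = '';
        }
        // Otherwise there is no complete line yet, so loop and read more.
    }

    fclose($in);
    fclose($out);
}

// Example callback using two of the replacements from the original script.
$cleanup = function (string $text): string {
    $text = str_replace("\r", '', $text);
    return preg_replace('/(Model-Manufacturer:)(\n)(\w+)/i', '$1$3', $text);
};
```

With the file names from the original script, the call would be streamWithLineBuffer("tmp11", "tmp2", $cleanup). The window has to be larger than the number of lines any single pattern can span; otherwise a line may be written out before the line it should be joined with has been read.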
