
Need help reading a .gz file line by line


affordit


I have a script that reads a .gz file into an array and prints the name of each record, but it does not work on larger files. Is there a way to read one line at a time?

Here is the code I have so far.

<?php

if ($handle = opendir('.')) {

    print "<ol>";

    while (false !== ($file = readdir($handle))) {
        if($file != '..' && $file!="." && $file!="start_update.php" && $file!="sharons_dbinfo.inc.php" && $file!="root.php" && $file!="read_directory.php" && $file!="read_dir.php" && $file!="new_category.php" && $file!="index.php" && $file!="file_count.php" && $file!="dir_loop2.php" && $file!="dir_loop1.php" && $file!=".htaccess" && $file!="Answer.txt" && $file!="Crucial_Technology-Crucial_US_Product_Catalog_Data_Feed.txt"){
            $filename = $file;
            $go = filesize($filename);
            if($go >= 1){
                $filename2 = explode("-", $filename);
                $filename2 = $filename2[0];
                echo str_replace("_"," ",$filename2) . ' | Filesize is: ' . filesize($filename) . ' bytes<br>';
                $gz = gzopen($filename, 'r');
                $lines = gzfile($filename,10000);
                foreach ($lines as $line) {
                    $line2 = explode(",", $line);
                    $line2 = str_replace("," , "-" , $line2);
                    echo "<li>".str_replace("," , "-" , $line2[4])."</li><br>";
                }
            }
        }
    }

    closedir($handle);
}
?>
</ol>
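
One way to avoid loading the whole file is to open it with gzopen() and pull lines with gzgets() in a loop, rather than gzfile(), which builds the full array first. A minimal sketch, assuming PHP 7 or later with the zlib extension; read_gz_lines() and the temporary demo file are made-up names for illustration only:

```php
<?php
// Hypothetical helper: yields one decompressed line at a time,
// so only the current line is held in memory.
function read_gz_lines(string $path): Generator
{
    $gz = gzopen($path, 'r');
    if ($gz === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // gzgets() returns false at end of file.
        while (($line = gzgets($gz)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        gzclose($gz);
    }
}

// Demo on a small temporary .gz file (made-up data).
$tmp = tempnam(sys_get_temp_dir(), 'gz');
$out = gzopen($tmp, 'w');
gzwrite($out, "first line\nsecond line\nthird line\n");
gzclose($out);

$lines = [];
foreach (read_gz_lines($tmp) as $line) {
    $lines[] = $line;   // in the real script, process and echo here instead
}
unlink($tmp);
```

In the directory loop, the foreach over gzfile() would become a foreach over read_gz_lines($filename), with the per-line work done inside the loop body.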

That works great, but it is not printing the whole name for some records.

This line

1GB kit (512MBx2) Upgrade for a Dell OptiPlex 745 Series (Desktop, Mini-Tower, and Small Form Factor) System

Is printing this

"1GB kit (512MBx2) Upgrade for a Dell OptiPlex 745 Series (Desktop

Any idea why? :shrug:

This is the first line that would not print correctly. There are 40 columns in it:

Crucial Technology,http://www.crucial.com/index.asp,Crucial US Product Catalog-Data Feed,12/06/2010,"1GB kit (512MBx2) Upgrade for a Dell OptiPlex 745 Series (Desktop, Mini-Tower, and Small Form Factor) System",MEMORY MODULE,"1GB kit (512MBx2), 240-pin DIMM, DDR2 PC2-5300, NON-ECC,",CT613060,Crucial,,,,USD,,37.99,,,http://www.kqzyfj.com/click-4349884-10273954?url=http%3A%2F%2Fwww.crucial.com%2Fstore%2Faffiliateredirect.asp%3Fmtbpoid%3DC2FFE7ABA5CA7304%26aid%3D10273954%26cid%3D777292%26subid%3D890%26PRS%3Duscj,http://www.tqlkg.com/image-4349884-10273954,http://images.crucial.com/images/resources/small/package/240-pinDIMMkit_2.gif,Memory > DDR2 PC2-5300,,,,,,,,,,,Free shipping for a limited time on qualified orders,,,,,Yes,New,Limited lifetime warranty,

OK, that's somewhat tricky but not impossible. Some fields are quoted and others aren't, so a plain explode() on commas also splits inside the quoted fields. The ideal solution is to replace explode() with something that recognizes and honours the quoted fields, so it doesn't split on the commas within them.
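
For example, PHP's built-in str_getcsv() (available from PHP 5.3) respects quoted fields. A small sketch comparing it with explode() on the record from the post; the record is cut off after the ninth column here purely to keep the example short:

```php
<?php
// Shortened copy of the record from the post (trailing columns dropped).
$line = 'Crucial Technology,http://www.crucial.com/index.asp,'
      . 'Crucial US Product Catalog-Data Feed,12/06/2010,'
      . '"1GB kit (512MBx2) Upgrade for a Dell OptiPlex 745 Series '
      . '(Desktop, Mini-Tower, and Small Form Factor) System",'
      . 'MEMORY MODULE,'
      . '"1GB kit (512MBx2), 240-pin DIMM, DDR2 PC2-5300, NON-ECC,",'
      . 'CT613060,Crucial';

// explode() splits on every comma, including those inside quotes.
$naive = explode(',', $line);

// str_getcsv() treats a quoted field as one value and strips the quotes.
$fields = str_getcsv($line, ',', '"', '\\');

echo $naive[4], "\n";   // cut off at the first comma inside the quotes
echo $fields[4], "\n";  // the complete product name
```

In the original loop this would mean replacing the explode(",", $line) call with str_getcsv($line, ',', '"', '\\') and reading index 4 as before.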

Or, if you only need field 4 and none of the others, you could write a regexp that captures just that field.
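
A rough sketch of the regexp approach, assuming fields are either wrapped in double quotes (with "" as an escaped quote) or contain no commas or quotes at all. name_field() is a hypothetical helper name, and "field 4" is counted from 0 to match $line2[4] in the script:

```php
<?php
// Hypothetical helper: returns field 4 (counting from 0) of one CSV line,
// or null when the line has fewer than five fields.
function name_field(string $line): ?string
{
    // One field: a quoted string, or a run of characters without , or ".
    $field = '(?:"(?:[^"]|"")*"|[^,"]*)';
    // Skip four fields and their commas, then capture the fifth.
    $pattern = '/^(?:' . $field . ',){4}(' . $field . ')(?:,|$)/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    $value = $m[1];
    if (strlen($value) >= 2 && $value[0] === '"') {
        // Drop the surrounding quotes and unescape doubled quotes.
        $value = str_replace('""', '"', substr($value, 1, -1));
    }
    return $value;
}

// Shortened copy of the record from the post.
$record = 'Crucial Technology,http://www.crucial.com/index.asp,'
        . 'Crucial US Product Catalog-Data Feed,12/06/2010,'
        . '"1GB kit (512MBx2) Upgrade for a Dell OptiPlex 745 Series '
        . '(Desktop, Mini-Tower, and Small Form Factor) System",MEMORY MODULE';

$name = name_field($record);
echo $name, "\n";
```

This only pays off if the other columns are never needed; a CSV-aware parser is usually easier to maintain than the pattern.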
