
Read and parse a file in portions


pyr02k1


I have a bit of a weird question... I'm looking to parse some very, very large files. Two are currently about 3.5GB, one is 12GB and one is around 18GB. I plan to split them down into 250MB files to parse, no big deal. But I need to be able to resume a download from where I left off and continue parsing from that point. The best way to describe what I want to do: resume from the end of what I already have, place just that new part in a file, then parse it. Instead of downloading 18GB, or even 250MB, I want to grab only what's been added since the last run and parse that tail, ignoring what I've already parsed. As it sits, I run a script that parses some very large files, but every time it opens the file and runs a for loop until it reaches the line it needs. That sucks when the line is 250,000+ and the cron takes 30 minutes instead of 2. Anyone have any ideas how I could go about this? Is there a way to run an FTP resume from point X, where I left off on the server, and have it placed in file Y, which starts from 0 instead of at 18.2GB?
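Something like the sketch below is roughly what I'm picturing. PHP's ftp_fget() takes a resume offset, so, assuming the game server's FTP actually supports resuming, the transfer could start at the last byte I've already parsed and be written into a fresh local file that starts at 0. The host, credentials and the offset bookkeeping ('filex.offset') are placeholders, and offsets past 2GB may be a problem on a 32-bit PHP build:

<?php
// Sketch of the "resume from point X into file Y" idea; host, credentials,
// paths and the offset file are placeholders, not real values.
$lastOffset = (int) @file_get_contents('filex.offset');   // 0 on the first run

$ftp = ftp_connect('ftp.example.com');
ftp_login($ftp, 'user', 'pass');

$remote     = '/logs/filex.log';
$remoteSize = ftp_size($ftp, $remote);                     // current size on the server

if ($remoteSize > $lastOffset) {
    // A fresh local file: it starts at byte 0 even though the remote transfer
    // resumes at $lastOffset, so it only ever holds the new tail of the log.
    $local = fopen('filex.tail.log', 'wb');
    ftp_fget($ftp, $local, $remote, FTP_BINARY, $lastOffset);
    fclose($local);

    // Remember how far we got for the next cron run.
    file_put_contents('filex.offset', (string) $remoteSize);
}
ftp_close($ftp);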

 

Thanks in advance,

--pyr0


To resume an FTP transfer you need a server that supports resuming!

As for reading data from a large file... well, I could give you 201 ways to do it, but it depends on 201 things:

1. Can you post your code? (As this is in the PHP section, I assume it's a PHP problem.)

2. Why are you reading a file like this?

 

I assume this is coming from your PHP server!



For number 1, I don't have any code of my own at this point. The script is a log reader/parser called Ultrastats for COD4/5. I noticed in the code that it does exactly what I didn't want it to do: it goes through the file 100% from the top until it reaches its line, and the resumed download just keeps making an already-big file bigger, which bogs down the server. I guess what I'm asking is: if I resume filex.log from position 25,000 and the local filex.log is at 0, will it just append 25,001 and up, making this new file a much smaller file?

 

2. They're logs from some game servers for COD4 and 5. In both games, the guy I'm trying to help runs some of the more popular servers, and the logs grow a huge amount very quickly. As it is, in just two months of logging on COD5 he has one that's running about 2GB. I already know I'm going to take these files down, split them into smaller portions and parse them one by one. What I'm worried about is that when all is said and done with the logs and I go to rerun Ultrastats, it'll redownload the whole file and then try to parse it. Bad times, as it'll not only time out but abuse the server to all hell.

 

Edit: Also, I may eventually, somehow, talk him into stopping the server to clear the logs, but I know talking him into doing it more than once won't happen. So the logs will eventually get huge again and I'll still be in this boat.

 

Here's the problem from number 1, with how it parses the file and what I'm hoping to either work around or recode:

 

while (!feof($myhandle))
{
        // A logline was never more than 1024 bytes, so it's enough buffer
        $gl_linebuffer = fgets($myhandle, 1024);

        if ($currentline < $db_lastlogline)
        {
                // Repeat until new file position is reached
                $currentline++;
                continue;   // skip lines that have already been parsed
        }

        // ... rest of the loop (the actual parsing of $gl_linebuffer) trimmed ...
}

It will go like that until it times out on the larger files and abuses the server like nothing else. What I want to do is either get it to portion the files, or, when downloading, place everything from the last log line through the end in a new file, so instead of 2GB it's only fetching and holding the last 250MB or so that was added since its last download.
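One thing that might sidestep the loop above entirely, at least for the local parsing half, is remembering a byte offset instead of a line count: ftell() after the last parsed line gives a position that fseek() can jump straight back to on the next run, without re-reading anything. A rough sketch, with the state file name ('parser.offset') made up for illustration:

<?php
// Sketch: resume the parse at a stored byte offset instead of counting
// lines from the top. 'parser.offset' and 'filex.log' are placeholder names.
$offset = (int) @file_get_contents('parser.offset');   // 0 on the first run

$myhandle = fopen('filex.log', 'rb');
fseek($myhandle, $offset);          // jump straight past what was already parsed

while (!feof($myhandle)) {
    $gl_linebuffer = fgets($myhandle, 1024);
    if ($gl_linebuffer === false) {
        break;
    }
    // ... parse $gl_linebuffer here, the same way the existing loop does ...
}

// Store where we stopped so the next cron run can seek straight to it.
file_put_contents('parser.offset', (string) ftell($myhandle));
fclose($myhandle);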

 

thanks for the response,

--pyr0


Well, here's a different way to go about it, if anyone would rather help me figure that out: is there a way to download a portion of a file through FTP, either through PHP or cURL? Say, the section from 100MB through 200MB, and write only that section to a new file? If I can figure that part out, I can actually get it to work by setting my last log line to 0 each time, wiping the file, and never having it reset the file size back to 0. So it'll resume exactly where it left off, drop that portion in a blank file, parse it, then clear it and just resume where it left off next time. Any ideas?
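For what it's worth, PHP's cURL extension can apparently do this when it's built with FTP support: CURLOPT_RESUME_FROM starts the transfer at a byte offset, and CURLOPT_RANGE can ask for an explicit byte range. A rough sketch (the URL, credentials and byte positions are placeholders, and offsets past 2GB may need special handling):

<?php
// Sketch: grab only part of a remote file over FTP with cURL.
// The URL, credentials and byte positions are placeholders.
$fp = fopen('filex.tail.log', 'wb');   // fresh local file, starts at byte 0

$ch = curl_init('ftp://user:pass@ftp.example.com/logs/filex.log');
curl_setopt($ch, CURLOPT_FILE, $fp);

// Resume from a byte offset and read through to the end of the file...
curl_setopt($ch, CURLOPT_RESUME_FROM, 104857600);   // start at ~100MB

// ...or ask for an explicit byte range instead (e.g. the 100MB-200MB slice):
// curl_setopt($ch, CURLOPT_RANGE, '104857600-209715200');

curl_exec($ch);
curl_close($ch);
fclose($fp);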

 

 

--pyr0

