
Read and parse a file in portions


pyr02k1


I have a bit of a weird question... I'm looking to parse some very, very large files. Two are currently about 3.5GB, one is 12GB and one is around 18GB. I plan to split them down into 250MB files to parse, no big deal. But I need to be able to resume a download from where I left off and continue parsing from that point. The best way to describe what I want to do: resume from the end of what I already have, place just that new part in a file, then parse it. Instead of downloading 18GB, or even 250MB, I want to grab only what's been added since the last run and parse that tail, ignoring what I've already parsed. As it sits, I run a script that parses some very large files, but every time it opens the file and runs a for loop until it reaches the line it needs. That sucks when the line is 250,000+ and the cron takes 30 minutes instead of 2. Anyone have any ideas how I could go about this? Is there a way to run an FTP resume from point X, where I left off on the server, and have it placed in file Y, which starts from 0 instead of at 18.2GB?
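Something like the sketch below is roughly what I'm picturing. PHP's ftp_fget() takes a resume offset, so, assuming the game server's FTP actually supports resuming, the transfer could start at the last byte I've already parsed and be written into a fresh local file that starts at 0. The host, credentials and the offset bookkeeping ('filex.offset') are placeholders, and offsets past 2GB may be a problem on a 32-bit PHP build:

<?php
// Sketch of the "resume from point X into file Y" idea; host, credentials,
// paths and the offset file are placeholders, not real values.
$lastOffset = (int) @file_get_contents('filex.offset');   // 0 on the first run

$ftp = ftp_connect('ftp.example.com');
ftp_login($ftp, 'user', 'pass');

$remote     = '/logs/filex.log';
$remoteSize = ftp_size($ftp, $remote);                     // current size on the server

if ($remoteSize > $lastOffset) {
    // A fresh local file: it starts at byte 0 even though the remote transfer
    // resumes at $lastOffset, so it only ever holds the new tail of the log.
    $local = fopen('filex.tail.log', 'wb');
    ftp_fget($ftp, $local, $remote, FTP_BINARY, $lastOffset);
    fclose($local);

    // Remember how far we got for the next cron run.
    file_put_contents('filex.offset', (string) $remoteSize);
}
ftp_close($ftp);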

 

Thanks in advance,

--pyr0


To resume an FTP transfer you need a server that supports resuming!

As for reading data from a large file... well, I could give you 201 ways to do it, but it depends on 201 things:

1. Can you post your code? (As this is in the PHP section, I assume it's a PHP problem.)

2. Why are you reading a file like this?

 

I assume this is coming from your PHP server!



For number 1, I don't have any code of my own at this point. The script is a log reader/parser called Ultrastats for COD4/5. I noticed in the code that it does exactly what I didn't want it to do: it goes through the file 100% from the top until it reaches its line, and the resumed download just keeps making an already-big file bigger, which bogs down the server. I guess what I'm asking is: if I resume filex.log from position 25,000 and the local filex.log is at 0, will it just append 25,001 and up, making this new file a much smaller file?

 

2. They're logs from some game servers for COD4 and 5. In both games, the guy I'm trying to help runs some of the more popular servers, and the logs grow a huge amount very quickly. As it is, in just two months of logging on COD5 he has one that's running about 2GB. I already know I'm going to take these files down, split them into smaller portions and parse them one by one. What I'm worried about is that when all is said and done with the logs and I go to rerun Ultrastats, it'll redownload the whole file and then try to parse it. Bad times, as it'll not only time out but abuse the server to all hell.

 

Edit: Also, I may eventually, somehow, talk him into stopping the server to clear the logs, but I know talking him into doing it more than once won't happen. So the logs will eventually get huge again and I'll still be in this boat.

 

Here's the problem from number 1, with how it parses the file and what I'm hoping to either work around or recode:

 

while (!feof($myhandle))
{
        // A logline was never more than 1024 bytes, so it's enough buffer
        $gl_linebuffer = fgets($myhandle, 1024);

        if ($currentline < $db_lastlogline)
        {
                // Repeat until new file position is reached
                $currentline++;
                continue;   // skip lines that have already been parsed
        }

        // ... rest of the loop (the actual parsing of $gl_linebuffer) trimmed ...
}

It will go like that until it times out on the larger files and abuses the server like nothing else. What I want to do is either get it to portion the files, or, when downloading, place everything from the last log line through the end in a new file, so instead of 2GB it's only fetching and holding the last 250MB or so that was added since its last download.
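One thing that might sidestep the loop above entirely, at least for the local parsing half, is remembering a byte offset instead of a line count: ftell() after the last parsed line gives a position that fseek() can jump straight back to on the next run, without re-reading anything. A rough sketch, with the state file name ('parser.offset') made up for illustration:

<?php
// Sketch: resume the parse at a stored byte offset instead of counting
// lines from the top. 'parser.offset' and 'filex.log' are placeholder names.
$offset = (int) @file_get_contents('parser.offset');   // 0 on the first run

$myhandle = fopen('filex.log', 'rb');
fseek($myhandle, $offset);          // jump straight past what was already parsed

while (!feof($myhandle)) {
    $gl_linebuffer = fgets($myhandle, 1024);
    if ($gl_linebuffer === false) {
        break;
    }
    // ... parse $gl_linebuffer here, the same way the existing loop does ...
}

// Store where we stopped so the next cron run can seek straight to it.
file_put_contents('parser.offset', (string) ftell($myhandle));
fclose($myhandle);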

 

thanks for the response,

--pyr0


Well, here's a different way to go about it, if anyone would rather help me figure that out: is there a way to download a portion of a file through FTP, either through PHP or cURL? Say, the section from 100MB through 200MB, and write only that section to a new file? If I can figure that part out, I can actually get it to work by setting my last log line to 0 each time, wiping the file, and never having it reset the file size back to 0. So it'll resume exactly where it left off, drop that portion in a blank file, parse it, then clear it and just resume where it left off next time. Any ideas?
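For what it's worth, PHP's cURL extension can apparently do this when it's built with FTP support: CURLOPT_RESUME_FROM starts the transfer at a byte offset, and CURLOPT_RANGE can ask for an explicit byte range. A rough sketch (the URL, credentials and byte positions are placeholders, and offsets past 2GB may need special handling):

<?php
// Sketch: grab only part of a remote file over FTP with cURL.
// The URL, credentials and byte positions are placeholders.
$fp = fopen('filex.tail.log', 'wb');   // fresh local file, starts at byte 0

$ch = curl_init('ftp://user:pass@ftp.example.com/logs/filex.log');
curl_setopt($ch, CURLOPT_FILE, $fp);

// Resume from a byte offset and read through to the end of the file...
curl_setopt($ch, CURLOPT_RESUME_FROM, 104857600);   // start at ~100MB

// ...or ask for an explicit byte range instead (e.g. the 100MB-200MB slice):
// curl_setopt($ch, CURLOPT_RANGE, '104857600-209715200');

curl_exec($ch);
curl_close($ch);
fclose($fp);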

 

 

--pyr0

