
Parsing large log file


efsnick


Hello all,

 

I need to take certain rows from an 800 MB log file and insert them into a database. The log file is set up as follows:

 

This is a file 1234 Here it is

1435  $sql=Select * from table;

1436  hello world

1437  $sql=Select * from table;

1438  your mom

1439  your dad

1440  $sql=Select * from anywhere;

end of file;

 

I basically only need the rows with the $sql statement, ID and all.
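A regular expression is one way to pull out just the ID and the statement from such a line. The sketch below is hypothetical (the name parseSqlLine is mine, not from this thread) and assumes the ID is the leading run of digits and the statement is everything after a literal $sql= marker.

```php
<?php
// Hypothetical helper: split a line such as "1435  $sql=Select * from table;"
// into its numeric ID and the SQL text after "$sql=".
// Returns null for lines that carry no $sql statement.
function parseSqlLine(string $line): ?array
{
    // ^(\d+)  the leading ID
    // \s+     the spaces between the ID and the statement
    // \$sql=  the literal marker (the $ is escaped for the regex)
    // (.*)    everything after the marker, i.e. the statement
    if (preg_match('/^(\d+)\s+\$sql=(.*)$/', rtrim($line), $m)) {
        return ['id' => (int) $m[1], 'statement' => $m[2]];
    }
    return null;
}
```

For example, parseSqlLine("1436  hello world") returns null, so non-matching lines can simply be skipped.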

 

Here is the code I came up with. It works on a demo log file, but that file is only about 10 MB.

 


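function logparse($logfile) {
    // file() loads every line into an array, so memory use grows with the file size
    $handle = file($logfile);
    $count = count($handle);
    $new_arr = array();

    for ($i = 0; $i < $count; $i++) {
        // The first five characters hold the ID plus padding (e.g. "1435 "); trim the padding
        $numbers = trim(substr($handle[$i], 0, 5));
        // strrchr() returns everything from the last '$' to the end of the line, or FALSE if there is none
        $statement = strrchr($handle[$i], '$');
        $new_arr[$numbers] = $statement;
    }

    // Drop the FALSE entries (lines without a '$') once, after the loop,
    // instead of re-filtering the whole array on every iteration
    $str_arr = array_filter($new_arr);
    print_r($str_arr);
}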

 

I know that file() will eat up a lot of memory, but this is a once-a-month job. Should I be OK? I don't have the INSERT statement in here, but basically I am just inserting one record at a time. Is there a better way?


800 MB is a fairly large file, I would think. You might be best off processing it daily with a cron job that collects the lines you want into a file, then appending the daily data to a monthly log and clearing the daily log.
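A rough sketch of that daily step in PHP, for a cron job to run once a day. The function name collectDaily and its parameters are hypothetical, and it assumes nothing else is writing to the daily log while the job runs (otherwise lines written between the read and the truncate would be lost).

```php
<?php
// Hypothetical daily cron job: copy lines containing $needle from the
// daily log into the monthly log, then empty the daily log.
// Returns the number of lines copied.
function collectDaily(string $dailyLog, string $monthlyLog, string $needle): int
{
    $in = fopen($dailyLog, 'r+');   // read/write, so it can be truncated afterwards
    if ($in === false) {
        throw new RuntimeException("Cannot open $dailyLog");
    }
    $out = fopen($monthlyLog, 'a'); // 'a' appends, creating the file if needed
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $monthlyLog");
    }

    $copied = 0;
    // fgets() reads one line at a time, so memory use stays small
    while (($line = fgets($in)) !== false) {
        if (strpos($line, $needle) !== false) {
            // Normalise the ending so a last line without "\n" does not run into the next one
            fwrite($out, rtrim($line, "\r\n") . "\n");
            $copied++;
        }
    }

    ftruncate($in, 0); // clear the daily log once its lines are saved
    fclose($in);
    fclose($out);
    return $copied;
}
```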

 

Actually, large log files are better handled by a shell script, since you don't have to worry about the script timing out, and I think it would be a bit faster.

 

 

HTH

Teamatomic


Hmm, what I would do is split the file into smaller chunks.

You can do this quite nicely with PHP. Then keep a timer in the code:

 

if the elapsed time is near the max execution time, reload the current script and continue.

The only reason I never extend max_execution_time in PHP is that I like all my scripts to work on standard free web hosting (to be as compatible as possible).
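One possible shape for the "stop near the time limit, then continue" idea, as a sketch: remember a byte offset with ftell(), and on the next run fseek() back to it. The function processChunk and its $budget parameter are hypothetical; under a web server a budget could be derived from ini_get('max_execution_time') minus a safety margin, and how the offset is carried to the next request (a small state file, a query string) is left to the caller.

```php
<?php
// Hypothetical sketch of "process a chunk, then continue later":
// start reading at byte $offset, hand each line to $handle, stop once
// $budget seconds have passed, and return the offset to resume from
// (or null when the end of the file has been reached).
function processChunk(string $path, int $offset, float $budget, callable $handle): ?int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, $offset);        // jump to where the previous run stopped
    $start = microtime(true);

    while (($line = fgets($fh)) !== false) {
        $handle($line);
        // Leave before hitting max_execution_time
        if (microtime(true) - $start >= $budget) {
            $next = ftell($fh); // byte position just after the last processed line
            fclose($fh);
            return $next;
        }
    }
    fclose($fh);
    return null;                // whole file processed
}
```

If the return value is not null, the script can save it and reload itself (for example via a redirect) to carry on from that offset.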

---

 

Though if this is for a professional project, you would usually use a shell script on a *nix system, as teamatomic said (much, much faster, with less load on the server).

 

-cb-


An 800 MB file is not much of an issue so long as the script only tries to load small pieces of it at a time (e.g. a line); I've done similar processing of large log files before for analytics.
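To illustrate the line-at-a-time idea, here is a sketch using fopen()/fgets() wrapped in a generator, so only the current line is held in memory. The function name matchingLines is hypothetical, and a plain substring match is assumed to be good enough for picking out the $sql lines.

```php
<?php
// Hypothetical helper: yield only the lines of $path that contain $needle,
// reading one line at a time so memory use does not grow with the file size.
function matchingLines(string $path, string $needle): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            if (strpos($line, $needle) !== false) {
                yield rtrim($line, "\r\n"); // strip the line ending
            }
        }
    } finally {
        fclose($fh); // close the file when iteration ends
    }
}
```

Something like foreach (matchingLines($logfile, '$sql') as $line) { ... } could then do the one-record-at-a-time insert, whatever the file size.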

 

Just to be clear, from the small sample given in the first post, you would only want the three lines (1435,1437,1440) which contain $sql?

 

P.S. Since it might be useful to know, are you familiar with working with objects / object-oriented programming?


Just an update: the log file is actually set up as follows. I was mistaken about the $sql.

 

1645 Connect    web@email.com

62541 Query      set commit=1

24520 Query      select * from table

24520 Quit

5819 Connect    email.web@test.com

61558 Query      set commit=1

1236 Query      select * from table where

 

I only need the lines that contain a select * statement.
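For the updated format, a regular expression can check both that the command column is Query and that the statement starts with select *. This is a hypothetical helper (parseSelectQuery is an illustrative name), assuming the layout shown above: a leading number, a command word, then the statement text.

```php
<?php
// Hypothetical parser for lines like "24520 Query      select * from table":
// returns the ID and statement for "select *" queries, null otherwise.
function parseSelectQuery(string $line): ?array
{
    // ^(\d+)          the leading number
    // \s+Query\s+     the command column, which must be "Query"
    // (select \*.*)   the statement, kept only if it starts with "select *"
    // /i              case-insensitive, so "SELECT * FROM" also matches
    if (preg_match('/^(\d+)\s+Query\s+(select \*.*)$/i', rtrim($line), $m)) {
        return ['id' => (int) $m[1], 'statement' => $m[2]];
    }
    return null;
}
```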

 

thanks, Nick

 


If you haven't already, I would urge taking a good look at the Standard PHP Library, in particular SplFileObject and FilterIterator. Using the features available there, this kind of task is often much simpler than the "old" way of fopen/fgets/fclose with complex conditions in loops.

 

For example, as a guide only (i.e. it may not be quite right for your needs), something like the following could iterate over each line of a file that matches a certain filter (in this case, the line contains "select * ") and do something useful with only those lines.

 

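class SqlLogFilter extends FilterIterator {
    // Keep only lines containing "select * "; the ": bool" return type matches
    // FilterIterator::accept() and avoids a deprecation notice on PHP 8.1+
    public function accept(): bool {
        return strpos($this->getInnerIterator()->current(), 'select * ') !== FALSE;
    }
}

// SplFileObject iterates over the file one line at a time instead of loading it all
$filter = new SqlLogFilter(new SplFileObject($logfile));
foreach ($filter as $line) {
    // "%d" reads the leading number, "Query" must appear literally (whitespace in the
    // format matches any amount of whitespace), and "%[^\n]" takes the rest of the line
    sscanf($line, "%d Query %[^\n]", $number, $statement);
    // Do whatever with $number and $statement
}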

 

P.S. The format string for sscanf might be unclear; see the user notes under sscanf in the PHP manual. You're of course completely free to use a different approach to getting at the number and statement!
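As one such different approach, here is a sketch that keeps SplFileObject but swaps in CallbackFilterIterator (available since PHP 5.4) for the subclass and preg_match() for sscanf. The function name selectStatements is hypothetical, and the pattern assumes the "number, Query, statement" layout from the updated sample.

```php
<?php
// Hypothetical variation on the FilterIterator approach: yields
// ['id' => ..., 'statement' => ...] for each "select * " query line.
function selectStatements(string $logfile): Generator
{
    $file = new SplFileObject($logfile);
    // DROP_NEW_LINE strips the line ending; SKIP_EMPTY skips blank lines
    // and needs READ_AHEAD to work as expected
    $file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::DROP_NEW_LINE);

    // CallbackFilterIterator takes a closure instead of a FilterIterator subclass
    $filter = new CallbackFilterIterator($file, function ($line) {
        return strpos($line, 'select * ') !== false;
    });

    foreach ($filter as $line) {
        // Leading number, the literal "Query", then the rest of the line
        if (preg_match('/^(\d+)\s+Query\s+(.*)$/', $line, $m)) {
            yield ['id' => (int) $m[1], 'statement' => $m[2]];
        }
    }
}
```

Because it is a generator, the results can be inserted one at a time inside a foreach without collecting them all in memory first.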

