
Need help with web server log file data extraction


DanSgt


Hello

I am having a little trouble with a web server log file and wondered if anyone could give me some advice.
 
I currently have the code below: it opens the April log file, reads it line by line and explodes each line into an array (text_line_array).
I can echo individual elements of that array, for example the bandwidth in element [5]. This prints all 1000 bandwidth values as the loop runs, but I can't figure out how to add them together.
I can't seem to get the syntax right for looping over an array and summing its values.
 
A snippet of the log file is below:
103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"
207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"
51.4.190.113 -- [2007-04-01 02:07:04] "GET articles/php_classes_and_oop HTTP/1.0" 200 7674 "MSIE 7.0"
216.134.52.171 -- [2007-04-01 02:49:25] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "MSIE 7.0"
97.212.128.181 -- [2007-04-01 03:31:46] "GET articles/using_regex_with_php HTTP/1.0" 200 12127 "Mozilla/4.0"
49.174.77.138 -- [2007-04-01 04:14:07] "GET about/contact.php HTTP/1.0" 200 7554 "Mozilla/4.0"
219.218.151.127 -- [2007-04-01 04:56:28] "GET reference/mysql_crib_sheet HTTP/1.0" 200 11109 "MSIE 7.0"
209.168.87.74 -- [2007-04-01 05:38:49] "GET articles/mysql_load_balancing HTTP/1.0" 200 3189 "MSIE 7.0"
79.214.145.94 -- [2007-04-01 06:21:11] "GET articles/mysql_load_balancing HTTP/1.0" 200 3189 "MSIE 7.0"
177.158.203.244 -- [2007-04-01 07:03:32] "GET docs/regex_crib_sheet HTTP/1.0" 200 12439 "Mozilla/4"
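Because the user agent can contain a space ("MSIE 7.0"), splitting each line on spaces is a bit fragile. One alternative is to match every line with a single regular expression. This is a minimal sketch, assuming every line follows the layout in the snippet above; parse_log_line() is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: parse one line of the log format shown in the snippet.
// Assumed layout: IP -- [date time] "METHOD path PROTOCOL" status bytes "agent"
function parse_log_line(string $line): ?array
{
    $pattern = '/^(\S+) -- \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+) "([^"]*)"/';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line does not follow the assumed layout
    }

    return array(
        'ip'        => $m[1],
        'timestamp' => $m[2],        // date and time, e.g. 2007-04-01 02:07:04
        'method'    => $m[3],        // e.g. GET
        'path'      => $m[4],        // e.g. articles/php_classes_and_oop
        'status'    => (int) $m[5],
        'bytes'     => (int) $m[6],
        'agent'     => $m[7],        // may contain spaces, e.g. MSIE 7.0
    );
}

$line = '51.4.190.113 -- [2007-04-01 02:07:04] "GET articles/php_classes_and_oop HTTP/1.0" 200 7674 "MSIE 7.0"';
print_r(parse_log_line($line));
```

Unlike the explode() approach, the agent comes back whole and the path has no surrounding quotes.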
 
This is what I have so far:
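$handle = fopen('logs/april.log', 'r');
// Characters and words to strip from each line before splitting on spaces
$notNeeded = array(' --', '[', ']', 'GET ', ' HTTP/1.0');
// fgets() returns false at end of file, which ends the loop cleanly
while (($text_line = fgets($handle)) !== false)
{
    $text_line = trim(str_replace($notNeeded, '', $text_line));
    if ($text_line === '') {
        continue; // skip blank lines, e.g. a trailing newline
    }
    $text_line_array = explode(' ', $text_line);
    $ipAddress = $text_line_array[0];
    $timestamp1 = $text_line_array[1]; // date
    $timestamp2 = $text_line_array[2]; // time
    $filename = $text_line_array[3];   // requested page, still in quotes
    $statusCode = $text_line_array[4];
    $bandwidth = $text_line_array[5];
    $userAgent = $text_line_array[6];  // cut short if the agent contains a space
}
fclose($handle);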
Do you have any ideas?
 
I am trying to write a summary that displays: the total number of requests, the number of requests for the articles directory, the total bandwidth consumed and, finally, the number of 404 errors and the pages that caused them.
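On the summing question: no second loop is needed, because a running total updated inside the same while loop is enough. A minimal sketch, assuming the field layout produced by the str_replace()/explode() step in the code above (bandwidth in field 5, assumed to be a byte count); sum_bandwidth() is a hypothetical helper name.

```php
<?php
// Minimal sketch: keep a running total inside the read loop instead of
// collecting every value first. sum_bandwidth() is a hypothetical helper.
function sum_bandwidth($handle): int
{
    $notNeeded = array(' --', '[', ']', 'GET ', ' HTTP/1.0');
    $totalBandwidth = 0;

    // fgets() returns false at end of file, so the loop stops cleanly
    while (($text_line = fgets($handle)) !== false) {
        $text_line = trim(str_replace($notNeeded, '', $text_line));
        $text_line_array = explode(' ', $text_line);

        if (count($text_line_array) < 6) {
            continue; // blank or malformed line
        }

        $totalBandwidth += (int) $text_line_array[5]; // field 5: assumed byte count
    }

    return $totalBandwidth;
}

$logFile = 'logs/april.log'; // path taken from the question
if (is_readable($logFile)) {
    $handle = fopen($logFile, 'r');
    echo 'Total bandwidth: ', sum_bandwidth($handle), "\n";
    fclose($handle);
}
```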

I'm 100% sure that someone can do it better than this, but if it helps you on your way then all good.

<?php

    $newData = array();
    $bandwidthItems = array();
    $articleItems = array();
    $fourZeroFourItems = array();
    $articleCount = 0;
    $fourZeroFourCount = 0;

    // Split the file into lines; \R matches \n, \r\n or \r, and empty lines are dropped
    $data = preg_split('/\R/', file_get_contents("logs.txt"), -1, PREG_SPLIT_NO_EMPTY);

    // Get Total Amount Of Rows (one request per line)

    $total = count($data);

    // Data Not Needed

    $notNeeded = array(' --', '[', ']', 'GET ', ' HTTP/1.0');

    // Remove Unwanted Values
    foreach ($data as $item)
    {
        $newData[] = str_replace($notNeeded, '', $item);
    }

    // Split Up Data
    foreach ($newData as $item)
    {
        // Fields: 0 IP, 1 date, 2 time, 3 "page", 4 status code, 5 bytes
        $splitData = explode(" ", $item);

        // Skip malformed lines that lack the expected fields
        if (count($splitData) < 6) {
            continue;
        }

        // Build Bandwidth Array
        $bandwidthItems[] = (int) $splitData[5];

        // Build Article Count
        if (strpos($splitData[3], "articles/") !== false) {
            $articleCount++;
            $articleItems[] = $splitData[3];
        }

        // Build 404 Pages: match the whole status code and store the page, not the code
        if ($splitData[4] === "404") {
            $fourZeroFourCount++;
            $fourZeroFourItems[] = $splitData[3];
        }
    }

    // Output Data

    echo $total; // Total Number Of Requests

    print_r($bandwidthItems); // All Bandwidth Values

    echo array_sum($bandwidthItems); // Total Bandwidth

    echo $articleCount; // Total Number Of Article Requests

    print_r($articleItems); // Output All Article Items

    echo $fourZeroFourCount; // Total Number Of 404 Errors

    print_r($fourZeroFourItems); // Output All Pages That Returned 404

?>
Edited by AaronClifford
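The four figures from the question can also be gathered in a single pass without keeping an array per field. A minimal sketch, assuming the same str_replace()/explode() field layout as the reply above and treating "the articles directory" as any path starting with articles/; summarise_log() and its array keys are hypothetical names.

```php
<?php
// Minimal sketch: compute the four summary figures in one pass.
// Assumed fields after cleaning: 0 IP, 1 date, 2 time, 3 "page", 4 status, 5 bytes.
function summarise_log(iterable $lines): array
{
    $notNeeded = array(' --', '[', ']', 'GET ', ' HTTP/1.0');
    $summary = array(
        'requests'         => 0,
        'article_requests' => 0,
        'bandwidth'        => 0,
        'not_found'        => array(), // page => number of 404s
    );

    foreach ($lines as $line) {
        $fields = explode(' ', trim(str_replace($notNeeded, '', $line)));
        if (count($fields) < 6) {
            continue; // blank or malformed line
        }
        $page = trim($fields[3], '"'); // drop the surrounding quotes

        $summary['requests']++;
        $summary['bandwidth'] += (int) $fields[5];
        if (strpos($page, 'articles/') === 0) {
            $summary['article_requests']++;
        }
        if ($fields[4] === '404') {
            $summary['not_found'][$page] = ($summary['not_found'][$page] ?? 0) + 1;
        }
    }

    return $summary;
}

$logFile = 'logs.txt'; // file name used in the reply above
if (is_readable($logFile)) {
    print_r(summarise_log(file($logFile)));
}
```

Counting 404s per page in an associative array gives both the number of errors (array_sum($summary['not_found'])) and the pages that caused them.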

That's a great start, AaronClifford!  Kudos to you for helping.

The only observation I would make is that server log files are often very, very large, and keeping all that data in arrays builds a very large in-memory structure that can eat up your box's RAM. Unless it's critical to have every bandwidth value in an array, I'd just create a variable to hold the total and add each line's value to it on every iteration of the loop.

One variable takes up nowhere near as much RAM (and on a big array, calling array_sum afterwards means a second full pass over the data, which costs extra CPU too).
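The running-total idea can be combined with a generator so the whole file is never held in memory at once. A sketch under stated assumptions: read_lines() and total_bandwidth() are hypothetical helpers, and the regular expression assumes the byte count always follows the three-digit status code after the quoted request.

```php
<?php
// Sketch: stream the log one line at a time and keep only a running total.
function read_lines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield $line; // only the current line is in memory
        }
    } finally {
        fclose($handle); // runs when the loop ends or the generator is discarded
    }
}

function total_bandwidth(string $path): int
{
    $total = 0;
    foreach (read_lines($path) as $line) {
        // Assumed layout: ...HTTP/1.0" 200 12729 "agent"
        if (preg_match('/" \d{3} (\d+) /', $line, $m) === 1) {
            $total += (int) $m[1];
        }
    }
    return $total;
}

$logFile = 'logs/april.log'; // path taken from the question
if (is_readable($logFile)) {
    echo total_bandwidth($logFile), "\n";
}
```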

