
Catching All HTTP Requests to Apache and Storing Them in DB


Solved by requinix


Good morning everyone!

I was just looking to pick everyone's brain about something I would like to do.

So, Apache already has its access log, which does most of what I want. But it stores everything in a flat file that is always large, and even when it isn't, it still doesn't let me do everything I want to do.

I would like to be able to store all of that type of information (url requests whether coming from front-end ajax or a backend script and any parameters within the URL) and the user_id that is stored within a $_SESSION.

Does anyone have any thoughts on how this could be accomplished? I built this little function just for testing purposes:

function log_uri ( $pdo ) {

	// Only log requests that belong to a logged-in user
	if ( isset ( $_SESSION['user_id'] ) ) {

		// REQUEST_URI is the path plus any query string, e.g. /jobs.php?id=7
		$query = '
			INSERT INTO log_uri (user_id, uri) VALUES (:user_id, :page)
		';
		$statement = $pdo->prepare($query);
		$statement->execute([
			':user_id' => $_SESSION['user_id'],
			':page' => $_SERVER['REQUEST_URI'],
		]);

	}

}

But it does not do what I want, mainly because it only catches back-end requests and misses the many AJAX requests that fire very often, which interest me more than the back-end ones. Here is a sample of what it captures so far:

[Screenshot: sample of the data captured so far]

Any thoughts? Maybe there is a way to detect AJAX responses or something? Or maybe I can modify Apache's logging so it writes to a database and, hopefully, picks up the user_id?

4 hours ago, mongoose00318 said:

I would like to be able to store all of that type of information (url requests whether coming from front-end ajax or a backend script and any parameters within the URL) and the user_id that is stored within a $_SESSION.

For what purpose?

Besides requinix's question, what makes you think that all requests involve PHP and therefore its sessions? And I am sure you realize that getting access to somebody's PHP session array is pretty much impossible without a hacker's skills, which we certainly do not want to help you with.

@requinix For the purpose of tracking which pages are used the most, which jobs are being tracked, etc.

@ginerjm This is an internal piece of software that I developed myself. I am not hacking my own software.

In reality, the access log itself has most of what I need. It's just very large, and because it is a flat file it is very hard to parse efficiently.

FYI, I was using AWStats for a while and it was working well. But at some point it started having an issue with the log format, I think. Nothing ever changed in the log format that I am aware of, though, so I could never figure out how to get AWStats working again.

OK then. So, if this is meant to track how your app gets used, why not add a call from each script/page in your app to a new function that saves the data you want? Or are there too many pages/scripts to even consider that?
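The "one shared function called from every script" idea could look roughly like the PHP sketch below. The helper name build_request_log_row() and its field names are hypothetical, not from the thread. It also flags AJAX requests through the X-Requested-With header; note that jQuery's $.ajax() sends that header on same-origin requests, but fetch() and hand-written XMLHttpRequest calls do not unless you add it yourself, so the flag is a hint rather than a guarantee.

```php
<?php
/**
 * Hypothetical helper: build one request-log row from the superglobals.
 * Field names (user_id, method, uri, is_ajax) are illustrative only.
 */
function build_request_log_row(array $server, array $session): array
{
    // jQuery's $.ajax() sends "X-Requested-With: XMLHttpRequest" on same-origin
    // requests; fetch() does not, so treat is_ajax as a hint, not a guarantee.
    $requestedWith = $server['HTTP_X_REQUESTED_WITH'] ?? '';

    return [
        'user_id' => $session['user_id'] ?? null,   // null for anonymous visitors
        'method'  => $server['REQUEST_METHOD'] ?? 'GET',
        'uri'     => $server['REQUEST_URI'] ?? '',    // path plus query string
        'is_ajax' => strcasecmp($requestedWith, 'XMLHttpRequest') === 0,
    ];
}

// Example call from a shared include (assumption: every page and AJAX
// endpoint already requires such a file after session_start()):
// $row = build_request_log_row($_SERVER, $_SESSION ?? []);
```

The row could then be inserted with the same PDO pattern as log_uri(), assuming the log table is extended with matching columns. The key point is that the AJAX endpoints must include the shared file too, otherwise their requests never reach the logger.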

Yes, I guess that sums it up. I was just tinkering with AWStats: I made a backup of the original access log and created a new one, and it imported that information fine. The line it has a problem with in the original file looks like this:

Quote

This means each line in your web server log file need to have "common log format" like this:
111.22.33.44 - - [10/Jan/2001:02:14:14 +0200] "GET / HTTP/1.1" 200 1234
And this is an example of records AWStats found in your log file (the record number 50 in your log):
::1 - - [13/Jul/2020:06:47:04 -0500] "-" 408 -

I would assume the ::1 was when I accessed the page internally from the server. I wonder if there is a way for it to ignore problem lines.

  • Solution
1 hour ago, mongoose00318 said:

I would assume the ::1 was when I accessed the page internally from the server. I wonder if there is a way for it to ignore problem lines.

Or it could be that Apache couldn't log anything useful there because the client did not send a request in time, which is exactly what 408 (Request Timeout) means.

Ignore that line, possibly with a regex or something, or just ignore the warning from AWStats itself.
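The regex approach might look like this minimal PHP sketch. The pattern is an assumption written for this example: it accepts lines that start with the Common Log Format fields shown in the AWStats message (Combined format adds fields after them, which the pattern tolerates) and rejects request-less records such as the "-" 408 line. The function names are hypothetical. Reading with fgets() inside a generator keeps memory flat even for a very large log.

```php
<?php
// Matches the start of a Common/Combined Log Format line, e.g.
// 111.22.33.44 - - [10/Jan/2001:02:14:14 +0200] "GET / HTTP/1.1" 200 1234
// Lines whose request field is just "-" (such as 408 timeouts) do not match.
const CLF_PATTERN = '/^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)/';

function parse_clf_line(string $line): ?array
{
    if (!preg_match(CLF_PATTERN, $line, $m)) {
        return null; // malformed or request-less line: skip it
    }
    return [
        'host'     => $m[1],
        'user'     => $m[2],
        'time'     => $m[3],
        'method'   => $m[4],
        'uri'      => $m[5],
        'protocol' => $m[6],
        'status'   => (int) $m[7],
        'bytes'    => $m[8] === '-' ? 0 : (int) $m[8],
    ];
}

// Stream a (possibly huge) log file line by line instead of loading it all.
function valid_log_lines(string $path): Generator
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            $row = parse_clf_line(rtrim($line, "\r\n"));
            if ($row !== null) {
                yield $row;
            }
        }
    } finally {
        fclose($fh);
    }
}
```

The check parse_clf_line($line) !== null can also serve as a plain filter for writing a cleaned copy of the log for AWStats, or the yielded rows could be inserted into a database table instead.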

First off, I understand what you are looking for, and it is in no way unusual to want to have instrumentation and information about what is happening.  There are many products out there, first and foremost Google Analytics.  

There are also log "mining/reporting" systems available. AWStats is one of them, but it's pretty old and I haven't used it in a long time, so I'm not sure how functional and up to date it is.

Here's a partial Google list of "alternatives to Awstats":

  • Dynatrace.
  • LogicMonitor.
  • New Relic One.
  • Datadog.
  • Sumo Logic.
  • Graylog.
  • LogDNA.
  • Apache log4j.

Of these, I've used New Relic and Sumo Logic in the recent past, which goes to show how many commercial and non-commercial offerings there are in this space. To get the level of information you want, you sometimes need to modify the web server's log format, and sometimes to inject additional variables into the logs. Things like session IDs and cookies can be added to the logs to help establish things that can't be determined otherwise. The details of doing this are a system administration task that depends on your specific web server and hosting environment.
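As one example of injecting an extra variable into the Apache log: PHP running as an Apache module (mod_php) can attach a "note" to the current request with apache_note(), and mod_log_config can print a note with %{name}n in a LogFormat. The sketch below assumes mod_php (apache_note() does not exist under PHP-FPM or the CLI) and a LogFormat like the one in the comment; the function name, the note name user_id and the format nickname with_user are hypothetical. The setter is injectable only so the logic can be exercised outside Apache; verify on your own server that the note actually shows up in the log.

```php
<?php
/**
 * Copy the logged-in user's ID into an Apache request note so Apache's own
 * access log can record it. Assumes mod_php and Apache config such as:
 *   LogFormat "%h %l %u %t \"%r\" %>s %b %{user_id}n" with_user
 *   CustomLog logs/access_log with_user
 */
function tag_access_log_with_user(array $session, ?callable $setNote = null): bool
{
    if (!isset($session['user_id'])) {
        return false; // anonymous request: nothing to tag
    }
    if ($setNote === null) {
        if (!function_exists('apache_note')) {
            return false; // not running under mod_php (e.g. PHP-FPM or CLI)
        }
        $setNote = 'apache_note';
    }
    $setNote('user_id', (string) $session['user_id']);
    return true;
}

// In a shared include, after session_start():
// tag_access_log_with_user($_SESSION);
```

This keeps the existing access log as the single source of truth and simply adds the user ID to every line, including AJAX requests handled by PHP.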

One specific example would be the IP address of the request. If your server has a load balancer in front of it, the IP address of every request will be the load balancer's, not the actual client's, so that is a case where you need to customize the logs to see what is actually going on.
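Behind a load balancer, the original client address usually arrives in an X-Forwarded-For header, which Apache can log with %{X-Forwarded-For}i. On the PHP side, a minimal sketch might look like this. It assumes you know your load balancer's address(es), and client_ip() is a hypothetical name. It only trusts the header when the connection really came from a trusted proxy, and it reads the list from the right, because a client can prepend forged entries.

```php
<?php
/**
 * Resolve the real client IP when requests pass through trusted proxies.
 * $trustedProxies is an assumption: the addresses of your own load balancer(s).
 */
function client_ip(array $server, array $trustedProxies): string
{
    $remote = $server['REMOTE_ADDR'] ?? '';

    // Direct connection, or no header: REMOTE_ADDR is the client.
    if (!in_array($remote, $trustedProxies, true) || empty($server['HTTP_X_FORWARDED_FOR'])) {
        return $remote;
    }

    // Proxies append to the right, so walk right-to-left and return the
    // first hop that is not one of our own proxies.
    $hops = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
    for ($i = count($hops) - 1; $i >= 0; $i--) {
        if ($hops[$i] !== '' && !in_array($hops[$i], $trustedProxies, true)) {
            return $hops[$i];
        }
    }
    return $remote;
}
```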

There are many, many products and companies out there that offer logging infrastructure. One I've used in the past, not just for web servers but for analysis of an entire cluster, is Splunk. That said, Splunk is a pricey commercial option.

One FOSS stack that has a lot of mindshare and users is the ELK Stack, which consists of Elasticsearch, Logstash and Kibana. Each piece of the stack solves a particular part of the problem that companies with large and complicated infrastructures face in getting server-side analytics. You can do some reading about it here: https://logz.io/learn/complete-guide-elk-stack/ This might be the type of server-based analytics system you want; it is modern, scalable and far more functional than a simple log parser/web report system like AWStats.

Most companies use several different options, as each tends to have strengths and weaknesses. Google Analytics has a lot of features, but it depends on the client running its JavaScript, so it will never show you requests that were processed but didn't load the JavaScript. Errors or bugs in the JavaScript on the page might also cause GA to log incorrectly or not at all. Still, you will want to configure GA for your site and start using it; you will find it already gives you a lot of the functionality you want, without having to change anything in your infrastructure.

In my experience, companies often use a variety of different tools. Sometimes looking at web logs is not enough, or doesn't really help you understand something, and you need logs from several different services. For example, you might need to look at graphs of your web server(s) and your database to see that a problem your system was having was related to database load at a particular time, which was in turn caused by slow queries tying up database resources for a long period. Resources on the server itself, like available memory, amount of swap in use and CPU load, might show you that the server is overloaded or low on disk space. There are different types of logging and monitoring you can set up that often provide valuable insights into issues you will never find just by looking at web logs.
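The server resources mentioned here (available memory, swap, CPU load, disk space) can be sampled from PHP on a Linux host. This is a minimal sketch, assuming Linux (/proc/meminfo is Linux-specific) and that you store or graph the snapshots yourself; read_meminfo() and server_snapshot() are hypothetical names, and this is no substitute for a real monitoring tool.

```php
<?php
// Parse lines like "MemAvailable:    1234567 kB" into ['MemAvailable' => 1234567].
// Values are in kB. Returns [] when the file is missing (e.g. not Linux).
function read_meminfo(string $file = '/proc/meminfo'): array
{
    $info = [];
    if (!is_readable($file)) {
        return $info;
    }
    foreach (file($file, FILE_IGNORE_NEW_LINES) ?: [] as $line) {
        if (preg_match('/^(\w+):\s+(\d+)/', $line, $m)) {
            $info[$m[1]] = (int) $m[2];
        }
    }
    return $info;
}

// One point-in-time snapshot; null means "not available on this host".
function server_snapshot(string $path = '/'): array
{
    $load  = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
    $mem   = read_meminfo();
    $free  = disk_free_space($path);
    $total = disk_total_space($path);

    return [
        'load_1m'          => $load === false ? null : $load[0], // 1-minute load average
        'mem_available_kb' => $mem['MemAvailable'] ?? null,
        'swap_used_kb'     => isset($mem['SwapTotal'], $mem['SwapFree'])
            ? $mem['SwapTotal'] - $mem['SwapFree'] : null,
        'disk_free_pct'    => ($free === false || !$total) ? null : round(100 * $free / $total, 1),
    ];
}
```

Run from cron every minute or so and stored next to the request log, snapshots like this make it possible to line up slow periods in the web logs with load, memory or disk pressure on the server.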


@gizmola

First off, thanks for the validation that this is a genuine and NORMAL part of server administration. 

Second, I totally agree that AWStats is old, not to mention clunky. It really doesn't do what I want. I have looked into some alternatives before, but none of the ones you mentioned ring a bell, so I am definitely going to look into them.

All around, hats off to you! Thanks for the detailed post and all of the information. I am going to do some digging into your suggestions!
