Jump to content

FreQi

New Members
  • Posts

    3
  • Joined

  • Last visited

    Never

Everything posted by FreQi

  1. [quote author=fenway link=topic=110523.msg447433#msg447433 date=1160075344] Well, for starters, any column in a JOIN condition... but it gets more complicated with GROUP BYs/ORDER BYs.  Also, because you're using DESC ordering, you can't benefit from indexes on these columns (because MySQL doesn't support that yet).  I would recommend looking at some tutorials on index selection... I'm sure the one in the refman is a good start. [/quote] I was originally ordering just by ID since I figured the auto-incramenting ID would be more accurate for recording what order the requests were made in, however that didn't seem to be true.  I typically want to see the most recent to oldest requests, and 500 at a time is a pretty small sample.  When looking at the list of all the requests going through the proxy, I typically only see the last 1 or 2 minutes.  I'm thinking about splitting up the table somehow, but I'm not sure if I should make a table for each requesting IP address, or split it on month or what. [quote author=fenway link=topic=110523.msg447433#msg447433 date=1160075344] Use the INET_ATOI() function to store it, and INET_ITOA() to get it back.[/quote] Thanks for the tip, but I cannot find any documentation on the INET_ATOI() function...  That function doesn't seem to be documented on mysql.com, php.net... even google only has 2 pages of results when I search for it, and 10% of those link back to this forum to posts of your own ;]  I have no idea how to use it. [code]select IP, INET_ATOI(IP) as encoded from t_ipLogs limit 500[/code] returns [pre]Error Code : 1305 FUNCTION httpRequests.INET_ATOI does not exist (16 ms taken) [/pre] [quote author=fenway link=topic=110523.msg447433#msg447433 date=1160075344] joining on string fields is never ideal -- I don't see such an example in your queries above.  From an application standpoint, why do you need the entire URL (e.g. including the query string)? [/quote] The proxy server actually logs the action of blocking a URL before it logs the fact it was requested.  So what happens is the logfile->database script I wrote will monitor the logfile and see the blocked URL and insert it to the db.  The next line it sees will be the requested URL, so it looks in the Blocked table to see if there is a matching URL and returns the ID of the block.  That ID is then included in the httpLog record.  So the URL's are not necessarily used for JOIN's, but they are used to identify if they were blocked.  I may eventually also use them for statistics later to determine how many time a specific URL was visited.  Typically I only care about Domain's or full hosts, but that's exactly why those are in their own columns.
  2. [quote author=fenway link=topic=110523.msg446931#msg446931 date=1160000661]you don't have appropriate indexes[/quote] Can you be more specific as to what would be appropriate?  I am unfamiliar with how to set them in the first place, let alone what's good. [quote author=fenway link=topic=110523.msg446931#msg446931 date=1160000661]you're storing IPs as TINYTEXT, and not as UNSIGNED INTs[/quote] How would I store an IP as an integer?  There are three decimals... are you suggesting I convert the IP address into something interpretable like 192.168.0.1 would be [i]192[/i][u]168[/u][i]000[/i][u]001[/u] ?  Why not use a CHAR(15) instead and stick with the dotted quad syntax? [quote author=fenway link=topic=110523.msg446931#msg446931 date=1160000661]I doubt the URLs have to more than 255 characters long anyway.[/quote] There are some URL's that are over 2,000 characters long, which I didn't think was possible until I looked.  us.bc.yahoo.com and m.webtrends.com are in the top 30 longest URL's.  Since I do JOIN's on the URL, would you suggest I change the column type to something other than TINYTEXT?
  3. I have created a database that holds every http request that passes through a proxy server, and then made a php front end to parse, navigate and display the data.  Over the course of only a few weeks, the database has balooned up to more that 4 million records.  This makes many of my selections run very slowy, and I'm looking for suggestions on how to handle the situation.  Here's what I have now... This table holds the IP address and Logins of the people using the proxy server. [code]-- -- Table structure for table `t_ipLogs` -- CREATE TABLE `t_ipLogs` (   `ID` int(10) unsigned NOT NULL auto_increment,   `IP` tinytext NOT NULL,   `Domain` tinytext NOT NULL,   `Computer` tinytext NOT NULL,   `Who` tinytext,   `LogIn` datetime NOT NULL,   `LogOut` datetime NOT NULL,   `Modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,   `Removed` datetime NOT NULL,   PRIMARY KEY  (`ID`) ) ENGINE=MyISAM AUTO_INCREMENT=3312 DEFAULT CHARSET=utf8 AUTO_INCREMENT=3312 ;[/code] This table holds the list of URL's that the proxy server has blocked [code]-- -- Table structure for table `t_httpBlocks` -- CREATE TABLE `t_httpBlocks` (   `ID` mediumint(8) unsigned NOT NULL auto_increment,   `Blocked` datetime NOT NULL COMMENT 'When this URL was Blocked',   `URL` text NOT NULL,   PRIMARY KEY  (`ID`) ) ENGINE=MyISAM AUTO_INCREMENT=228866 DEFAULT CHARSET=utf8 COMMENT='Requested URL''s that were Blocked' AUTO_INCREMENT=228866 ; [/code] And this is the table that holds every http request passed through the proxy [code]-- -- Table structure for table `t_httpLogs` -- CREATE TABLE `t_httpLogs` (   `ID` mediumint(8) unsigned NOT NULL auto_increment,   `blockID` mediumint(9) unsigned default NULL COMMENT 'ID of record in the Block table that matches this Request',   `ipID` mediumint(8) unsigned default NULL COMMENT 'ID of record in the ipLogs table',   `IP` tinytext NOT NULL COMMENT 'IP Address if the client making the Request',   `Requested` datetime NOT NULL COMMENT 'When the Request was made',   `Host` text NOT NULL,   `Domain` tinytext NOT NULL,   `URL` text NOT NULL,   `Code` varchar(3) NOT NULL COMMENT 'HTTP Status Code',   `Size` bigint(10) unsigned NOT NULL default '0' COMMENT 'Sice (in bytes) of the data retrieved',   PRIMARY KEY  (`ID`) ) ENGINE=MyISAM AUTO_INCREMENT=4025737 DEFAULT CHARSET=utf8 AUTO_INCREMENT=4025737 ;[/code] I have written a perl script that monitors the proxy server's log file and inserts the records as it finds them, and that seems to run just fine, it's the PHP site which displays the records that is the problem.  Some sample queries listed below takes several minutes to return results. Find the most recent 500 http requests made by the user identified in the IP Logs as "3272" [code]SELECT t_ipLogs.Who, t_httpLogs.IP, t_httpLogs.Requested, t_httpLogs.Domain, t_httpLogs.blockID, t_httpLogs.URL, t_httpLogs.Size FROM t_httpLogs   LEFT OUTER JOIN t_ipLogs   ON t_httpLogs.ipID = t_ipLogs.ID WHERE t_httpLogs.ipID = "3272" ORDER BY t_httpLogs.Requested DESC, t_httpLogs.ID DESC LIMIT 0,500[/code] List all IP Records for user "pseudoname" and the number of requests made in each session [code]SELECT t_ipLogs.ID, t_ipLogs.Who, t_ipLogs.LogIn, Reqs.Requests FROM t_ipLogs LEFT OUTER JOIN (   select     ipID,     count(*) as "Requests"   from     t_httpLogs   group by     ipID   ) Reqs on t_ipLogs.ID = Reqs.ipID WHERE t_ipLogs.Who = "pseudoname" ORDER BY t_ipLogs.LogIn DESC[/code] List the most recent 500 requests made by anyone [code]SELECT t_ipLogs.ID, t_ipLogs.IP, t_ipLogs.Who, t_httpLogs.IP, t_httpLogs.Requested, t_httpLogs.Domain, t_httpLogs.blockID, t_httpLogs.URL, t_httpLogs.Size FROM t_httpLogs LEFT OUTER JOIN t_ipLogs   ON t_httpLogs.ipID = t_ipLogs.ID ORDER BY t_httpLogs.Requested DESC, t_httpLogs.ID DESC LIMIT 500[/code] The server is an AMD Athlon XP 2700+ (2.1gHz) with 512megs of RAM running Debian, MySQL 5.0.24a-4, PHP 5.1.6-1 and (if it matters) Apache 2.0.55-4.1 and phpMyAdmin 2.8.2-Debian-0.2. Can anyone suggest what can be done to work with this amount of data?  How can/should I restructure the tables to make things faster and more efficient?  At what point should I be concerned about the size getting out of hand?
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.