Optimizing aggregate / group by query


Driskell


Hello... hopefully some of you smart people out there will be able to help me as I'm pretty bemused by what MySQL is doing.

 

Here's my table:

 

CREATE TABLE `instances` (
  `error_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `activity_id` int(10) unsigned NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `error` int(11) NOT NULL,
  `side` enum('System','Client') COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`error_id`),
  KEY `error` (`side`,`error`,`time`,`activity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=11072409 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=COMPACT

 

Probably best to let you know: the table contains about 15 million rows. At the moment I've restricted the test table to only around 25 distinct values for "error", but in production it's expected to hit about half a million.

 

OK, so now you know the table layout; here's a query that runs nice and fast and returns every error code in "instances" along with the last time each error occurred:

 

mysql> DESCRIBE SELECT a.error, MAX(a.time) AS last_occurrence FROM instances a WHERE a.side = 'Client' GROUP BY a.side, a.error;
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | a     | range | error         | error | 5       | NULL |   12 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

 

So we get the "Using index for group-by" optimization! Excellent: nice and quick, and it barely touches anything in the table, as you'd expect given the index.
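
For what it's worth, when I want to double-check how little work a plan like that actually does, the session handler counters seem like a reasonable sanity check (run in a session of its own so nothing else pollutes the numbers; the exact figures will obviously depend on the data):

FLUSH STATUS;

SELECT a.error, MAX(a.time) AS last_occurrence
FROM instances a
WHERE a.side = 'Client'
GROUP BY a.side, a.error;

-- With the group-by optimization in play, Handler_read_key should stay
-- around one probe per distinct error rather than climbing into the millions.
SHOW SESSION STATUS LIKE 'Handler_read%';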

 

So hopefully you people might be able to offer advice or an explanation as to why, as soon as I add "a.error = -1" to this query's WHERE clause (to restrict the result to the last occurrence of a single error), it suddenly turns into this:

 

mysql> DESCRIBE SELECT MAX(a.time) AS last_occurrence FROM instances a WHERE a.side = 'Client' AND a.error = -1 GROUP BY a.side, a.error;
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref         | rows    | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
|  1 | SIMPLE      | a     | ref  | error         | error | 5       | const,const | 1814564 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
1 row in set (0.00 sec)

 

Why is it all of a sudden not using the "Using index for group-by" optimization, and pulling out 1.8 million entries to examine?!

Common sense tells me it could do exactly what it did in the first query, but instead of visiting every distinct error, jump straight to "-1" and group only that one. Why does it lose the optimization? Is there any way we can tweak this query to get the "Using index for group-by" optimization back?
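
The only workaround I can think of so far is to drop the GROUP BY entirely when I only want a single error. Since the WHERE pins both side and error, MAX(time) should come straight off the end of that index range, and I'd expect EXPLAIN to report something like "Select tables optimized away" rather than a 1.8-million-row ref scan. Just a sketch of the idea, though, and it only covers the single-error case:

SELECT MAX(a.time) AS last_occurrence
FROM instances a
WHERE a.side = 'Client'
  AND a.error = -1;

If I need several specific errors at once I suppose I could run this once per code, but that feels like a workaround rather than an answer, hence the question.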

 

Many thanks in advance. I can feel my head inflating and ready to explode with the seemingly counter-intuitive weirdness going on here.

