Optimizing aggregate / group by query


Driskell


Hello... hopefully some of you smart people out there will be able to help me as I'm pretty bemused by what MySQL is doing.

 

Here's my table:

 

CREATE TABLE `instances` (
  `error_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `activity_id` int(10) unsigned NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `error` int(11) NOT NULL,
  `side` enum('System','Client') COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`error_id`),
  KEY `error` (`side`,`error`,`time`,`activity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=11072409 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=COMPACT

 

Probably best to let you know: the table contains about 15 million rows. At the moment I've restricted the test table to only around 25 distinct values for "error", but in production it's expected to hit about half a million.

 

OK, so now you know the table layout; here's a query that runs nice and fast and returns every error code in "instances" along with the last time each error occurred:

 

mysql> DESCRIBE SELECT a.error, MAX(a.time) AS last_occurrence FROM instances a WHERE a.side = 'Client' GROUP BY a.side, a.error;
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | a     | range | error         | error | 5       | NULL |   12 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+-------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

 

So we get the "Using index for group-by" optimization! Excellent: nice and quick, and it barely touches anything in the table, as you'd expect given the index.
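
For what it's worth, when I want to double-check how little work a plan like that actually does, the session handler counters seem like a reasonable sanity check (run in a session of its own so nothing else pollutes the numbers; the exact figures will obviously depend on the data):

FLUSH STATUS;

SELECT a.error, MAX(a.time) AS last_occurrence
FROM instances a
WHERE a.side = 'Client'
GROUP BY a.side, a.error;

-- With the group-by optimization in play, Handler_read_key should stay
-- around one probe per distinct error rather than climbing into the millions.
SHOW SESSION STATUS LIKE 'Handler_read%';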

 

So hopefully you people might be able to offer advice or an explanation as to why, as soon as I add "a.error = -1" to this query's WHERE clause (to restrict the result to the last occurrence of a single error), it suddenly turns into this:

 

mysql> DESCRIBE SELECT MAX(a.time) AS last_occurrence FROM instances a WHERE a.side = 'Client' AND a.error = -1 GROUP BY a.side, a.error;
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref         | rows    | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
|  1 | SIMPLE      | a     | ref  | error         | error | 5       | const,const | 1814564 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------------+---------+--------------------------+
1 row in set (0.00 sec)

 

Why is it all of a sudden not using the "Using index for group-by" optimization, and pulling out 1.8 million entries to examine?!

Common sense tells me it could do exactly what it did in the first query, but instead of visiting every distinct error, jump straight to "-1" and group only that one. Why does it lose the optimization? Is there any way we can tweak this query to get the "Using index for group-by" optimization back?
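
The only workaround I can think of so far is to drop the GROUP BY entirely when I only want a single error. Since the WHERE pins both side and error, MAX(time) should come straight off the end of that index range, and I'd expect EXPLAIN to report something like "Select tables optimized away" rather than a 1.8-million-row ref scan. Just a sketch of the idea, though, and it only covers the single-error case:

SELECT MAX(a.time) AS last_occurrence
FROM instances a
WHERE a.side = 'Client'
  AND a.error = -1;

If I need several specific errors at once I suppose I could run this once per code, but that feels like a workaround rather than an answer, hence the question.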

 

Many thanks in advance. I can feel my head inflating and ready to explode with the seemingly counter-intuitive weirdness going on here.

