Jump to content

Archived

This topic is now archived and is closed to further replies.

lostmouse

Need help optimizing ORDER BY

Recommended Posts

I cannot seem to properly optimize this statement used to show topics across all forums that have been recently replied to:

SELECT t.topic_id, t.topic_title, t.topic_sticky, t.topic_views, t.num_posts, f.forum_icon, f.forum_name, p.poster_name, p.post_time
FROM minibb_topics AS t
LEFT JOIN minibb_forums AS f ON t.forum_id = f.forum_id
INNER JOIN minibb_posts AS p ON p.post_id = t.topic_last_post_id
WHERE 5 >= forum_level_to_view && ( forum_site =0 || forum_site =1 )
ORDER BY t.topic_sticky DESC , t.topic_last_post_id DESC
LIMIT 20

I have isolated the issue to the ORDER BY clause.  If I remove the ORDER BY clause, the query takes 1/10th of the time to execute.  Both ORDER BY fields are indexed.  Also, if I limit the query to one forum such as, "WHERE f.forum_id = 1", the statement executes in 1/10th of the time.


This is the explain:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f ALL PRIMARY,forum_level_to_view,forum_site NULL NULL NULL 12 Using where; Using temporary; Using filesort
1 SIMPLE t ref forum_id,topic_last_post_id forum_id 2 aanime_aanimedb.f.forum_id 651
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 aanime_aanimedb.t.topic_last_post_id 1

This is the table structure:

--
-- Table structure for table 'minibb_forums'
--

CREATE TABLE minibb_forums (
  forum_id smallint(6) NOT NULL auto_increment,
  forum_name varchar(150) NOT NULL default '',
  forum_desc varchar(255) NOT NULL default '',
  forum_order tinyint(2) NOT NULL default '0',
  forum_icon varchar(255) NOT NULL default 'default.gif',
  forum_group varchar(30) NOT NULL default '',
  forum_site tinyint(1) NOT NULL default '0',
  forum_level_to_view tinyint(2) unsigned NOT NULL default '0',
  forum_level_to_post tinyint(2) NOT NULL default '0',
  forum_level_to_topic tinyint(2) NOT NULL default '0',
  num_topics mediumint(8) unsigned NOT NULL default '0',
  num_posts mediumint(8) unsigned NOT NULL default '0',
  is_spam tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (forum_id),
  KEY forum_order (forum_order),
  KEY forum_level_to_topic (forum_level_to_topic),
  KEY forum_level_to_view (forum_level_to_view),
  KEY forum_site (forum_site)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

-- --------------------------------------------------------

--
-- Table structure for table 'minibb_posts'
--

CREATE TABLE minibb_posts (
  post_id int(11) NOT NULL auto_increment,
  topic_id int(10) NOT NULL default '1',
  poster_name varchar(16) NOT NULL default 'Anonymous',
  post_text text NOT NULL,
  post_time datetime NOT NULL default '0000-00-00 00:00:00',
  poster_ip varchar(15) NOT NULL default '',
  post_status tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (post_id),
  KEY topic_id (topic_id),
  KEY poster_name (poster_name)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

-- --------------------------------------------------------

--
-- Table structure for table 'minibb_topics'
--

CREATE TABLE minibb_topics (
  topic_id int(10) NOT NULL auto_increment,
  topic_title varchar(100) NOT NULL default '',
  topic_sticky tinyint(1) NOT NULL default '0',
  topic_poster_name varchar(16) NOT NULL default 'Anonymous',
  topic_time datetime NOT NULL default '0000-00-00 00:00:00',
  topic_views mediumint(9) NOT NULL default '0',
  forum_id smallint(6) NOT NULL default '1',
  topic_status tinyint(1) NOT NULL default '0',
  topic_last_post_id int(10) NOT NULL default '1',
  num_posts mediumint(8) unsigned NOT NULL default '0',
  PRIMARY KEY  (topic_id),
  KEY forum_id (forum_id),
  KEY topic_poster_name (topic_poster_name),
  KEY topic_sticky (topic_sticky),
  KEY topic_last_post_id (topic_last_post_id)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;



Does anyone have any ideas about this?

Thanks.

Share this post


Link to post
Share on other sites
Without an ORDER BY, the database can just fetch any 20 results and give them to you.  But once you order the results, the database has to fetch ALL results, sort them, then give you the first 20.  That's probably what's slowing you down.

To fix it, you can try reducing the amount of data it needs to order.  Instead of ordering by sticky, you can do a seperate query or subquery (using an index) to fetch all the sticky topics.  That should be fast.  But the volume of results for the rest could be quite high.  One way to help is to partition your database, such as having one table for each month worth of topics.  Then you query the first month, see if you have enough topics, if not then you query the previous month, and so on until you get 20 topic worth.  Sticky topics would need their own table too, since stickiness is the most important criterion.

Another option is to cache results.  Caching per-forum may be a big win, especially if some forums are not updated often but others are.  Then again, it's the most frequently updated forums that will be the largest.

Maybe someone else has some other ideas?  I use postgresql usually, but a lot of the same principles apply.

Share this post


Link to post
Share on other sites
Thanks for the reply.

It's a creative idea, but if I start restructuring my database through separate tables, I will have to rewrite a lot of my existing sql statements.  Also, it is indeed the most active forums that are shown.

Is the size the reason why when I add "WHERE f.forum_id = 1" the statement executes quickly?

Share this post


Link to post
Share on other sites
I would say the real problem is that no index will be utilized if you use DESC, since the index is stored ASC.  Yeah, it's dump that MySQL doesn't handle this automatically, but it doesn't.

Share this post


Link to post
Share on other sites
Interesting... your primary table isn't using any index... and should be using "forum_level_to_view".

Share this post


Link to post
Share on other sites
That does seem to be the case.

I know there are times when mysql will do a filesort when there are very few rows, but in this case using an index is necessary and I cannot force it to use an index.  I feel like I need to restructure something, but I don't know where to start with this.

Share this post


Link to post
Share on other sites
First, what is the speed difference? 0.01 seconds compared to 0.1 seconds?

[quote author=fenway link=topic=112852.msg459098#msg459098 date=1162051461]
I would say the real problem is that no index will be utilized if you use DESC, since the index is stored ASC.  Yeah, it's dump that MySQL doesn't handle this automatically, but it doesn't.
[/quote]

MYSQL apparently reads the index in reverse when you use DESC. Perhaps a multi-column index on (topic_sticky, topic_last_post_id) may be useful, although I'm not sure if it'll be used in the current query.

[quote=mysql.com]
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 7.2.12, “ORDER BY Optimization”.
[/quote]

http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html


[quote author=fenway link=topic=112852.msg459627#msg459627 date=1162149894]
Interesting... your primary table isn't using any index... and should be using "forum_level_to_view".
[/quote]

I also think it should be using the index. Try running "ANALYZE TABLE minibb_forums" and see if that helps MYSQL use the index. Actually, try running "ANALYZE TABLE" on each table.

http://dev.mysql.com/doc/refman/5.1/en/analyze-table.html

[quote author=lostmouse link=topic=112852.msg459744#msg459744 date=1162169227]
That does seem to be the case.

I know there are times when mysql will do a filesort when there are very few rows, but in this case using an index is necessary and I cannot force it to use an index.  I feel like I need to restructure something, but I don't know where to start with this.
[/quote]

You can force it but I wouldn't do that just yet or did you mean "FORCE INDEX" doesn't help?
[code]
SELECT ... FROM tablename FORCE INDEX(index_name) INNER JOIN ...
[/code]

Btw, to be clear. Is the purpose of the query to pull the 20 most recent "posts" and what version of MYSQL are you using?

EDIT: Also, how is forum_level_view used. Is it possible for a topic be related to a forum_id with forum_level_view greater than 5?

Share this post


Link to post
Share on other sites
My bad about the DESC thing.. I was confusing that when you mix ASC and DESC, and I guess that's not MySQL's fault.  Maybe it's not optimizing "5 >= forum_level_to_view" to "forum_level_to_view <= 5", and therefore not using the index? A force shouldn't be necessary here.  As for the order by, there's no reason why you couldn't use a covering index, but it would be tricky with that OR in there.

Share this post


Link to post
Share on other sites
Mysql version 4.1.10

The purpose of the query is to pull the 20 most recently replied to topics (in other words, the 20 most recently active topics).  Any sticky topics have priority.

forum_level_to_view is used to allow only certain members to view certain forums (for example, the administration forum has a permission "forum_level_to_view" of 4).  It makes no difference in speed whether it's left in or out of the query.  At this point, there is no forum with forum_level_to_view greater than 5 as there are no permission levels greater than 5.

I've analyzed all of the tables, there is no improvement in this query's execution speed.

I made a mistake about the improvement in speed difference when the ORDER BY clause is removed.  With the clause in, the query takes about 0.2 - 0.4 seconds.  Removed it takes 0.001 to 0.002 seconds.

I tried using FORCE INDEX again with a very reasonable improvement in speed to 0.02 to 0.04 seconds.  The query now looks like this:

[code]
SELECT t.topic_id, t.topic_title, t.topic_sticky, t.topic_views, t.num_posts, f.forum_icon, f.forum_name, p.poster_name, p.post_time
FROM minibb_topics AS t FORCE INDEX(topic_last_post_id, topic_sticky)
LEFT JOIN minibb_forums AS f on t.forum_id=f.forum_id
INNER JOIN minibb_posts as p ON p.post_id = t.topic_last_post_id
WHERE 5 >= forum_level_to_view && (forum_site = 0 || forum_site = 1)
ORDER BY t.topic_sticky DESC, t.topic_last_post_id DESC
LIMIT 20
[/code]

And the explain looks like this:
[code]
1  SIMPLE  t  ALL  topic_last_post_id  NULL  NULL  NULL  7819  Using filesort
1 SIMPLE f eq_ref PRIMARY,forum_level_to_view,forum_site PRIMARY 2 aanime_aanimedb.t.forum_id 1 Using where
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 aanime_aanimedb.t.topic_last_post_id 1
[/code]

From the look of the explain, it seems the query could still further be optimized.  Still, a x10 improvement in execution speed is quite good.

Note that if I remove either of the ORDER BY conditions, the speed is improved to about 0.0015 seconds.  It does not matter if topic_sticky or topic_last_post_id is removed.

Share this post


Link to post
Share on other sites
Create the following multi-column index.

[code]
ALTER TABLE topics ADD INDEX(forum_id, topic_sticky, topic_last_post_id)
[/code]

Remove the FORCE INDEX from the query and run the query again. This index should be used if the "forums" table is chosen as the first table in the JOIN.

If MYSQL chooses the topics table as the first table then make the index on (topic_sticky, topic_last_post_id) instead and you can DROP the previous INDEX.

If you're still not getting any better results try the JOINs without the "posts" table and post if there is a significant difference.

Looking back at the previous EXPLAIN, the forum_level_view index may not have been used because as you said all the forums fall under that number and even if they didn't the table is so small that a table scan is estimated as being faster.

EDIT: You may also want to test with "SELECT STRAIGHT_JOIN * FROM ..." and keep both indexes mentioned earlier on the table to see what MYSQL does when you try different queries.

Share this post


Link to post
Share on other sites
Adding the composite index had no effect.  It did not seem to matter how I structured the query or what composite indexes I had on the topics table.

Removing the post table had a significant difference reducing the execution time to 1/10th of the original time (0.020). 

If I added the "STRAIGHT_JOIN" keyword instead, it had a very similar effect on the execution time (about 1/5th to 1/10th).

This is the EXPLAIN for the straight_join:

[code]1  SIMPLE  t  ALL  forum_id,topic_last_post_id  NULL  NULL  NULL  7823  Using filesort
1 SIMPLE f eq_ref PRIMARY,forum_level_to_view,forum_site PRIMARY 2 aanime_aanimedb.t.forum_id 1 Using where
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 aanime_aanimedb.t.topic_last_post_id 1 [/code]

The "forum" table is still being selected first.

Share this post


Link to post
Share on other sites
[quote author=lostmouse link=topic=112852.msg462050#msg462050 date=1162508690]
This is the EXPLAIN for the straight_join:

[code]1  SIMPLE  t  ALL  forum_id,topic_last_post_id  NULL  NULL  NULL  7823  Using filesort
1 SIMPLE f eq_ref PRIMARY,forum_level_to_view,forum_site PRIMARY 2 aanime_aanimedb.t.forum_id 1 Using where
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 aanime_aanimedb.t.topic_last_post_id 1 [/code]

The "forum" table is still being selected first.
[/quote]
The EXPLAIN shows the "topics" table as the first table.

I can't say that you'll be able to see the difference using this query with the current size of the DB/topics table, but this should be faster and hold up as the table size grows. I've used a large test table and made a similar query which gives good results. You may want to consider setting up a large topics table of your own to query.

Requires MYSQL >= 4.1.
[code]
SELECT
t1.*, p.poster_name, p.post_time
FROM
(
    SELECT
    STRAIGHT JOIN
    t.topic_id, t.topic_title, t.topic_sticky, t.topic_views, t.num_posts, t.topic_last_post_id,
    f.forum_icon, f.forum_name
    FROM
    minibb_topics AS t
    INNER JOIN
    minibb_forums AS f
    ON
    t.forum_id=f.forum_id
    WHERE
    5 >= forum_level_to_view && (forum_site = 0 || forum_site = 1)
    ORDER BY
    t.topic_sticky DESC, t.topic_last_post_id DESC
    LIMIT 20
) AS t1
INNER JOIN
minibb_posts AS p
ON
p.post_id = t1.topic_last_post_id
[/code]

You should still have the multi-column index on (topic_sticky, topic_last_post_id). If it's still not being used then the query will most likely be slow.

Assuming you don't think that 0.02 seconds is a slow query but are only concerned about the difference between ordering and not ordering the results, then revisting the query if or when it becomes slow is an option that you should consider.

Share this post


Link to post
Share on other sites
Wow, this query works very well!  The execution speed is actually more around 0.002 for my server which is more than acceptable.

Thank you so much for the help!

Share this post


Link to post
Share on other sites
There are 2 things I should mention. First, after thinking about what you're after note that the query will show sticky topics even if they have not been recently replied to. Meaning that if there are 20 or more sticky topics in total across all forums, then only those will be shown.

If what you'd like is to have the 20 most recently replied to topics shown and then if any sticky topics are a part of that 20 that they show first, then you should change the query to something similar to the following.

[code]
SELECT
t1.*, p.poster_name, p.post_time
FROM
(
    SELECT
    STRAIGHT JOIN
    t.topic_id, t.topic_title, t.topic_sticky, t.topic_views, t.num_posts, t.topic_last_post_id,
    f.forum_icon, f.forum_name
    FROM
    minibb_topics AS t
    INNER JOIN
    minibb_forums AS f
    ON
    t.forum_id=f.forum_id
    WHERE
    5 >= forum_level_to_view && (forum_site = 0 || forum_site = 1)
    ORDER BY
    t.topic_last_post_id DESC
    LIMIT 20
) AS t1
INNER JOIN
minibb_posts AS p
ON
p.post_id = t1.topic_last_post_id
ORDER BY
t1.topic_sticky DESC
[/code]

Note the ordering inside the subquery and then the ordering after the JOIN.

Secondly, I've noticed that the OPTIMIZER in MYSQL 5 won't use the index on the ORDER BY column even after using STRAIGHT_JOIN. You should be able to have MYSQL use the index by changing the structure to the following.

[code]
SELECT
t1.*, f.*, p.*
FROM
(
    SELECT
    *
    FROM
    topics
    WHERE forum_id IN (SELECT forum_id FROM forums WHERE 5 >= ....)
    ORDER BY
    topic_last....
    LIMIT 20
) AS t1
INNER JOIN
forums
ON forum_id = forum_id
INNER JOIN
posts
ON post_id = post_id
ORDER BY
sticky .....
[/code]

The query should still be quick and it can be used in both versions. Note that an individual index should be put on the topic_last... column to accommodate the new ordering and that the multi-column index on (sticky , topic_last) can serve as an index on the "sticky" column. I'd guess that you'd use the multi-column index on sticky and topic_last when selecting topics from individual forums.

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.