aunquarra Posted January 7, 2008 Share Posted January 7, 2008 Okay, so I've got three tables that have data that I want. I've done left joins on three tables before, but I've managed to avoid it for the most part because I wasn't sure if it was a good or bad idea. In fact, I've erred on the side of just having to do two separate queries with simple two-table joins to avoid it. This time, I'm going to have to either do it, or err on the side of just not having the data visible. So, given that joins are done on integers that are indexed, if not primary keyed in all three tables... Is it a bad idea to do three-table joins? Also, specifically in this scenario, I'll be pulling a lot of data from one table, its one-to-one value from a second table, and a COUNT() of one-to-many rows in a third table. Which brings up a second question... Assuming three-table joins are acceptable, what about when working with a GROUP BY as well? If table sizes make a difference, the primary table could have up to a few thousand entries or so, the second table (the one-to-one relationship) could have up to 20,000 entries, and the third table (the one-to-many which would be COUNT()ed) could have 50,000 or more entries. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 7, 2008 Share Posted January 7, 2008 Hard to say... the joining columns should definitely be indexed. Post the EXPLAIN for your current query. Also, unless you really need LEFT JOIN, INNER JOIN is often much, much faster. Quote Link to comment Share on other sites More sharing options...
aunquarra Posted January 7, 2008 Author Share Posted January 7, 2008 +----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+ | 1 | SIMPLE | g_info | ALL | NULL | NULL | NULL | NULL | 3 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | persons | eq_ref | PRIMARY | PRIMARY | 3 | g_info.person_id | 1 | | | 1 | SIMPLE | g_membership | ref | g/person | g/person | 3 | g_info.id | 1 | Using index | +----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+ I'll look into inner joins... All I ever learned was left joins, and they've filled every purpose I've had so far. I'll research and if I don't specifically need left joins, then I'll start fixing all my old code... Quote Link to comment Share on other sites More sharing options...
fenway Posted January 7, 2008 Share Posted January 7, 2008 Could you post the query too...? I'm not liking that temp/filesort at this point... As far as LEFT vs INNER, unless one of the intermediate tables might not match, LEFT is going to be much slower. Quote Link to comment Share on other sites More sharing options...
aunquarra Posted January 7, 2008 Author Share Posted January 7, 2008 Here's the query slimmed down (some of the excessive and verbose retrieved g_info columns have been removed, but everything else is there). As you can tell, I ended up doing both an inner and a left join. Since it's possible for g_membership to have zero rows, it needed to be joined with left join. SELECT `g_info`.`id`, `persons`.`name`, COUNT(`g_membership`.`g_id`) FROM (`g_info` INNER JOIN `persons` ON `persons`.`id`=`g_info`.`person_id`) LEFT JOIN `g_membership` ON `g_info`.`id`=`g_membership`.`g_id` WHERE `category_id`='2' GROUP BY `g_info`.`id` Thanks for the INNER join tip. I'm finding that a lot of my LEFT joins can be accomplished with INNERs. Quote Link to comment Share on other sites More sharing options...
btherl Posted January 8, 2008 Share Posted January 8, 2008 I think left joins are fine as long as you understand how they work. They usually result in SQL fetching rows from the left table, then probing an index for matching rows in the right table. If that index on the right table is missing, you can get awful performance. If you are selecting a subset of data on the left side of the join and then fetching data that may or may not exist on the right hand side with an index, then the left join is usually fine. That looks like what you're doing there. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 8, 2008 Share Posted January 8, 2008 I think left joins are fine as long as you understand how they work. They usually result in SQL fetching rows from the left table, then probing an index for matching rows in the right table. If that index on the right table is missing, you can get awful performance. If you are selecting a subset of data on the left side of the join and then fetching data that may or may not exist on the right hand side with an index, then the left join is usually fine. That looks like what you're doing there. Except that an inner join will stop "joining" as soon as one of the join tables fails, while a left join has to keep joining as doesn't use the where clause until the end -- which can get very expensive if you only want matching rows anyway! Quote Link to comment Share on other sites More sharing options...
btherl Posted January 8, 2008 Share Posted January 8, 2008 It doesn't use the where clause until the end? What do you mean by that? Quote Link to comment Share on other sites More sharing options...
btherl Posted January 9, 2008 Share Posted January 9, 2008 Ok I had a think about this. A condition applying to the left side of the join can be applied first, but a condition applying to the right side will need to applied afterwards. Does that sound right? The other day I was helping someone with a left join in mysql where there was no matching index on the right hand table. The solution we used was to apply the condition on the right table in a subquery, and then left join with the subquery. Before we did that, it was generating the entire result set and filtering, which was so horrible that it timed out. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 9, 2008 Share Posted January 9, 2008 It doesn't use the where clause until the end? What do you mean by that? Ok I had a think about this. A condition applying to the left side of the join can be applied first, but a condition applying to the right side will need to applied afterwards. Does that sound right? The other day I was helping someone with a left join in mysql where there was no matching index on the right hand table. The solution we used was to apply the condition on the right table in a subquery, and then left join with the subquery. Before we did that, it was generating the entire result set and filtering, which was so horrible that it timed out. What I mean is that the join conditions are applied *before* the where clause is applied. If you have a three table join -- all inner joins -- it will try and find matching rows in each of the subsequent tables ("sweeping join plan") and "combine" (a.k.a. join) that record with its matching record(s). However, if any condition fails, it will discard that record before even looking at the subsequent tables and join conditions. That is, non-matching records from the join are eliminated early on in statement execution. On the other hand, If it's all left joins, this is NEVER possible, since you need to keep all the "NULL records", because presumably you need them (e.g. for an is null check, etc.). The WHERE clause is only applied after all of the tables are joined... so you end up (potentially) collecting a lot more records than you need with left join, only to throw them out later on! Quote Link to comment Share on other sites More sharing options...
btherl Posted January 10, 2008 Share Posted January 10, 2008 How about this situation? SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) WHERE t1.name = 'fenway' AND t2.something IS NULL In postgresql at least this will usually be an index scan on t1.name = 'fenway', followed by an index scan on t1.id = t2.id (implementing the left join), and finally a filter on t2.something IS NULL. Quote Link to comment Share on other sites More sharing options...
fenway Posted January 10, 2008 Share Posted January 10, 2008 How about this situation? SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) WHERE t1.name = 'fenway' AND t2.something IS NULL In postgresql at least this will usually be an index scan on t1.name = 'fenway', followed by an index scan on t1.id = t2.id (implementing the left join), and finally a filter on t2.something IS NULL. I guess a clarification is in order... You are, of course, correct -- I should have specified that I was talking about the situation where the where clause contains references to the left join-ed tables. That is, if you query on t2.something = 'something', and you never check IS NULL, then gathering all those extra rows is just more expensive, since the where clause throws them out anyway. It would be more efficient to use an inner join to discard them at the earlier stage, especially if there are subsequent tables to join! Otherwise, what you described for postgresql will occur in mysql too, and you should be able to demonstrate that with EXPLAIN. Quote Link to comment Share on other sites More sharing options...
btherl Posted January 11, 2008 Share Posted January 11, 2008 Thanks, just wanted to make sure I wasn't missing anything Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.