Jump to content

[SOLVED] LEFT JOIN three tables... bad idea?


aunquarra

Recommended Posts

Okay, so I've got three tables that have data that I want. I've done left joins on three tables before, but I've managed to avoid it for the most part because I wasn't sure if it was a good or bad idea. In fact, I've erred on the side of just having to do two separate queries with simple two-table joins to avoid it. This time, I'm going to have to either do it, or err on the side of just not having the data visible.

 

So, given that joins are done on integers that are indexed, if not primary keyed in all three tables... Is it a bad idea to do three-table joins?

 

Also, specifically in this scenario, I'll be pulling a lot of data from one table, its one-to-one value from a second table, and a COUNT() of one-to-many rows in a third table. Which brings up a second question... Assuming three-table joins are acceptable, what about when working with a GROUP BY as well?

 

If table sizes make a difference, the primary table could have up to a few thousand entries or so, the second table (the one-to-one relationship) could have up to 20,000 entries, and the third table (the one-to-many which would be COUNT()ed) could have 50,000 or more entries.

Link to comment
Share on other sites

+----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys | key      | key_len | ref              | rows | Extra                                        |
+----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+
|  1 | SIMPLE      | g_info       | ALL    | NULL          | NULL     | NULL    | NULL             |    3 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | persons      | eq_ref | PRIMARY       | PRIMARY  | 3       | g_info.person_id |    1 |                                              |
|  1 | SIMPLE      | g_membership | ref    | g/person      | g/person | 3       | g_info.id        |    1 | Using index                                  |
+----+-------------+--------------+--------+---------------+----------+---------+------------------+------+----------------------------------------------+

 

I'll look into inner joins... All I ever learned was left joins, and they've filled every purpose I've had so far. I'll research and if I don't specifically need left joins, then I'll start fixing all my old code...

Link to comment
Share on other sites

Here's the query slimmed down (some of the excessive and verbose retrieved g_info columns have been removed, but everything else is there).

 

As you can tell, I ended up doing both an inner and a left join. Since it's possible for g_membership to have zero rows, it needed to be joined with left join.

 

SELECT `g_info`.`id`, `persons`.`name`, COUNT(`g_membership`.`g_id`)
FROM (`g_info` INNER JOIN `persons` ON `persons`.`id`=`g_info`.`person_id`)
LEFT JOIN `g_membership` ON `g_info`.`id`=`g_membership`.`g_id`
WHERE `category_id`='2' GROUP BY `g_info`.`id`

 

Thanks for the INNER join tip. I'm finding that a lot of my LEFT joins can be accomplished with INNERs.

Link to comment
Share on other sites

I think left joins are fine as long as you understand how they work.  They usually result in SQL fetching rows from the left table, then probing an index for matching rows in the right table.  If that index on the right table is missing, you can get awful performance.

 

If you are selecting a subset of data on the left side of the join and then fetching data that may or may not exist on the right hand side with an index, then the left join is usually fine.  That looks like what you're doing there.

Link to comment
Share on other sites

I think left joins are fine as long as you understand how they work.  They usually result in SQL fetching rows from the left table, then probing an index for matching rows in the right table.  If that index on the right table is missing, you can get awful performance.

 

If you are selecting a subset of data on the left side of the join and then fetching data that may or may not exist on the right hand side with an index, then the left join is usually fine.  That looks like what you're doing there.

Except that an inner join will stop "joining" as soon as one of the join tables fails, while a left join has to keep joining as doesn't use the where clause until the end -- which can get very expensive if you only want matching rows anyway!

Link to comment
Share on other sites

Ok I had a think about this.  A condition applying to the left side of the join can be applied first, but a condition applying to the right side will need to applied afterwards.  Does that sound right?

 

The other day I was helping someone with a left join in mysql where there was no matching index on the right hand table.  The solution we used was to apply the condition on the right table in a subquery, and then left join with the subquery.  Before we did that, it was generating the entire result set and filtering, which was so horrible that it timed out.

Link to comment
Share on other sites

It doesn't use the where clause until the end?  What do you mean by that?

Ok I had a think about this.  A condition applying to the left side of the join can be applied first, but a condition applying to the right side will need to applied afterwards.  Does that sound right?

 

The other day I was helping someone with a left join in mysql where there was no matching index on the right hand table.  The solution we used was to apply the condition on the right table in a subquery, and then left join with the subquery.  Before we did that, it was generating the entire result set and filtering, which was so horrible that it timed out.

 

What I mean is that the join conditions are applied *before* the where clause is applied. 

 

If you have a three table join -- all inner joins -- it will try and find matching rows in each of the subsequent tables ("sweeping join plan") and "combine" (a.k.a. join) that record with its matching record(s). 

 

However, if any condition fails, it will discard that record before even looking at the subsequent tables and join conditions.  That is, non-matching records from the join are eliminated early on in statement execution. 

 

On the other hand, If it's all left joins, this is NEVER possible, since you need to keep all the "NULL records", because presumably you need them (e.g. for an is null check, etc.). 

 

The WHERE clause is only applied after all of the tables are joined... so you end up (potentially) collecting a lot more records than you need with left join, only to throw them out later on!

Link to comment
Share on other sites

How about this situation?

 

SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) WHERE t1.name = 'fenway' AND t2.something IS NULL

 

In postgresql at least this will usually be an index scan on t1.name = 'fenway', followed by an index scan on t1.id = t2.id (implementing the left join), and finally a filter on t2.something IS NULL.

Link to comment
Share on other sites

How about this situation?

 

SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) WHERE t1.name = 'fenway' AND t2.something IS NULL

 

In postgresql at least this will usually be an index scan on t1.name = 'fenway', followed by an index scan on t1.id = t2.id (implementing the left join), and finally a filter on t2.something IS NULL.

I guess a clarification is in order...

 

You are, of course, correct -- I should have specified that I was talking about the situation where the where clause contains references to the left join-ed tables.  That is, if you query on t2.something = 'something', and you never check IS NULL, then gathering all those extra rows is just more expensive, since the where clause throws them out anyway.  It would be more efficient to use an inner join to discard them at the earlier stage, especially if there are subsequent tables to join!

 

Otherwise, what you described for postgresql will occur in mysql too, and you should be able to demonstrate that with EXPLAIN.

 

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.