Jump to content

Can this query be improved?


fastsol

Recommended Posts

I have the following query that works totally fine and seems to be fairly fast.  But I feel that it can be improved on to be faster and more efficient.  I've tried several different ways with different joins and such, but nothing besides this seems to give me the results I need.

SELECT `v_make`, `id` 
FROM v_make 
WHERE `active` = TRUE 
AND NOT EXISTS(
   SELECT * 
   FROM vehicle_makes_by_location_inactive 
   WHERE vehicle_makes_by_location_inactive.v_make_id = v_make.`id` 
   AND vehicle_makes_by_location_inactive.location_id = $id) 
ORDER BY `v_make` ASC

It's a pretty straight forward query.  I've not used NOT EXISTS very often and that's the part that feel can be improved by doing some kind of JOIN or something. Any insight would be greatly appreciated.

Link to comment
Share on other sites

When you want to check the performance of a query, you need hard facts like the query plan and/or a benchmark. Going by a gut feeling doesn't work, especially since the query you write isn't the query which actually gets executed. The MySQL optimizer can turn seemingly “bad” queries into efficient ones and sometimes seemingly “good” queries into bad ones.

 

One potential optimization (which must be validated) is to turn your correlated subquery which depends on the outer query into an uncorrelated subquery which may only be executed once.

SELECT
    id,
    v_make
FROM
    v_make
WHERE
    active
    AND v_make.id NOT IN (
        SELECT
            v_make_id
        FROM
            vehicle_makes_by_location_inactive
        WHERE
            location_id = 123
    )
ORDER BY
    v_make ASC
;

If you want to try a join:

SELECT
    v_make.id,
    v_make.v_make
FROM
    v_make
    LEFT JOIN vehicle_makes_by_location_inactive ON vehicle_makes_by_location_inactive.v_make_id = v_make.id
                                                    AND vehicle_makes_by_location_inactive.location_id = 123
WHERE
    v_make.active
    AND vehicle_makes_by_location_inactive.v_make_id IS NULL
;

Again, none of this is guaranteed to be better. You need to actually compare the query plans and execution times.

Link to comment
Share on other sites

Personally I'd use the JOIN method.

 

You can get an understanding of how MySQL will execute a query by EXPLAINing it.

EXPLAIN SELECT...
Of course then you have to know how to read the output of that, which isn't the easiest thing to learn, but if you're doing it to compare two queries then you can probably find enough to understand what it means. Or you can post them here.
Link to comment
Share on other sites

Let's do a reality check with a database system that actually has a decent optimizer like PostgreSQL.

 

Your NOT EXISTS query:

Sort  (cost=15.98..15.98 rows=1 width=
   Sort Key: v_make.v_make
   ->  Hash Anti Join  (cost=14.91..15.97 rows=1 width=
         Hash Cond: (v_make.id = vehicle_makes_by_location_inactive.v_make_id)
         ->  Seq Scan on v_make  (cost=0.00..1.04 rows=2 width=
               Filter: active
         ->  Hash  (cost=14.79..14.79 rows=10 width=4)
               ->  Bitmap Heap Scan on vehicle_makes_by_location_inactive  (cost=4.23..14.79 rows=10 width=4)
                     Recheck Cond: (location_id = 123)
                     ->  Bitmap Index Scan on vehicle_makes_by_location_inactive_location_id_idx  (cost=0.00..4.23 rows=10 width=0)
                           Index Cond: (location_id = 123)

The join:

 Sort  (cost=15.98..15.98 rows=1 width=
   Sort Key: v_make.v_make
   ->  Hash Anti Join  (cost=14.91..15.97 rows=1 width=
         Hash Cond: (v_make.id = vehicle_makes_by_location_inactive.v_make_id)
         ->  Seq Scan on v_make  (cost=0.00..1.04 rows=2 width=
               Filter: active
         ->  Hash  (cost=14.79..14.79 rows=10 width=4)
               ->  Bitmap Heap Scan on vehicle_makes_by_location_inactive  (cost=4.23..14.79 rows=10 width=4)
                     Recheck Cond: (location_id = 123)
                     ->  Bitmap Index Scan on vehicle_makes_by_location_inactive_location_id_idx  (cost=0.00..4.23 rows=10 width=0)
                           Index Cond: (location_id = 123)

Even if you don't understand any of the information, you'll recognize one thing: The query plains are exactly the same.

 

In other words, the “optimization” which looked so great on paper does absolutely nothing. Not a thing. All you've done is turn a perfectly readable query which clearly expresses your intentions into a less readable hack which doesn't even work.

 

I hope it's clear now why intuitions and gut feelings (“a join should be faster than a subquery”) really break down when you're using a highly optimizing database system. Just because a query “feels faster” doesn't mean it actually is.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.