fastsol Posted July 8, 2017 Share Posted July 8, 2017 I have the following query that works totally fine and seems to be fairly fast. But I feel that it can be improved on to be faster and more efficient. I've tried several different ways with different joins and such, but nothing besides this seems to give me the results I need. SELECT `v_make`, `id` FROM v_make WHERE `active` = TRUE AND NOT EXISTS( SELECT * FROM vehicle_makes_by_location_inactive WHERE vehicle_makes_by_location_inactive.v_make_id = v_make.`id` AND vehicle_makes_by_location_inactive.location_id = $id) ORDER BY `v_make` ASC It's a pretty straight forward query. I've not used NOT EXISTS very often and that's the part that feel can be improved by doing some kind of JOIN or something. Any insight would be greatly appreciated. Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/ Share on other sites More sharing options...
Jacques1 Posted July 8, 2017 Share Posted July 8, 2017 When you want to check the performance of a query, you need hard facts like the query plan and/or a benchmark. Going by a gut feeling doesn't work, especially since the query you write isn't the query which actually gets executed. The MySQL optimizer can turn seemingly “bad” queries into efficient ones and sometimes seemingly “good” queries into bad ones. One potential optimization (which must be validated) is to turn your correlated subquery which depends on the outer query into an uncorrelated subquery which may only be executed once. SELECT id, v_make FROM v_make WHERE active AND v_make.id NOT IN ( SELECT v_make_id FROM vehicle_makes_by_location_inactive WHERE location_id = 123 ) ORDER BY v_make ASC ; If you want to try a join: SELECT v_make.id, v_make.v_make FROM v_make LEFT JOIN vehicle_makes_by_location_inactive ON vehicle_makes_by_location_inactive.v_make_id = v_make.id AND vehicle_makes_by_location_inactive.location_id = 123 WHERE v_make.active AND vehicle_makes_by_location_inactive.v_make_id IS NULL ; Again, none of this is guaranteed to be better. You need to actually compare the query plans and execution times. Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/#findComment-1548108 Share on other sites More sharing options...
fastsol Posted July 8, 2017 Author Share Posted July 8, 2017 Thanks, I will take a look at these. Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/#findComment-1548110 Share on other sites More sharing options...
requinix Posted July 9, 2017 Share Posted July 9, 2017 Personally I'd use the JOIN method. You can get an understanding of how MySQL will execute a query by EXPLAINing it. EXPLAIN SELECT...Of course then you have to know how to read the output of that, which isn't the easiest thing to learn, but if you're doing it to compare two queries then you can probably find enough to understand what it means. Or you can post them here. 1 Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/#findComment-1548118 Share on other sites More sharing options...
fastsol Posted July 9, 2017 Author Share Posted July 9, 2017 I think I tried that same exact JOIN example posted, but I missed the IS NULL aspect of it, so that may have been my problem. I will give it a go again. Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/#findComment-1548119 Share on other sites More sharing options...
Jacques1 Posted July 9, 2017 Share Posted July 9, 2017 Let's do a reality check with a database system that actually has a decent optimizer like PostgreSQL. Your NOT EXISTS query: Sort (cost=15.98..15.98 rows=1 width= Sort Key: v_make.v_make -> Hash Anti Join (cost=14.91..15.97 rows=1 width= Hash Cond: (v_make.id = vehicle_makes_by_location_inactive.v_make_id) -> Seq Scan on v_make (cost=0.00..1.04 rows=2 width= Filter: active -> Hash (cost=14.79..14.79 rows=10 width=4) -> Bitmap Heap Scan on vehicle_makes_by_location_inactive (cost=4.23..14.79 rows=10 width=4) Recheck Cond: (location_id = 123) -> Bitmap Index Scan on vehicle_makes_by_location_inactive_location_id_idx (cost=0.00..4.23 rows=10 width=0) Index Cond: (location_id = 123) The join: Sort (cost=15.98..15.98 rows=1 width= Sort Key: v_make.v_make -> Hash Anti Join (cost=14.91..15.97 rows=1 width= Hash Cond: (v_make.id = vehicle_makes_by_location_inactive.v_make_id) -> Seq Scan on v_make (cost=0.00..1.04 rows=2 width= Filter: active -> Hash (cost=14.79..14.79 rows=10 width=4) -> Bitmap Heap Scan on vehicle_makes_by_location_inactive (cost=4.23..14.79 rows=10 width=4) Recheck Cond: (location_id = 123) -> Bitmap Index Scan on vehicle_makes_by_location_inactive_location_id_idx (cost=0.00..4.23 rows=10 width=0) Index Cond: (location_id = 123) Even if you don't understand any of the information, you'll recognize one thing: The query plains are exactly the same. In other words, the “optimization” which looked so great on paper does absolutely nothing. Not a thing. All you've done is turn a perfectly readable query which clearly expresses your intentions into a less readable hack which doesn't even work. I hope it's clear now why intuitions and gut feelings (“a join should be faster than a subquery”) really break down when you're using a highly optimizing database system. Just because a query “feels faster” doesn't mean it actually is. 1 Quote Link to comment https://forums.phpfreaks.com/topic/304276-can-this-query-be-improved/#findComment-1548125 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.