boo_lolly Posted October 28, 2015 Share Posted October 28, 2015 (edited) In short, I have have a table `foods` that has duplicates on occasion. I have another associated table `food_tags` that has tags associated with the foods listed in the `foods` table. **Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value). SQL Fiddle - http://sqlfiddle.com/#!9/ee1c6/2 I have a working example of what I want but I'm almost certain this is not the best way to do it. Here's what I have: SELECT * FROM `food_tags` WHERE `food_id` IN ( SELECT `id` FROM `foods` WHERE `id` NOT IN ( SELECT MIN(`id`) FROM `foods` GROUP BY `name` ) ) Basically I want to exclude the first record of the duplicates in the `foods` table (the one with the lowest `id`), but retrieve all the remaining duplicates and then get the results from the `food_tags` table based on the `food_id` of the duplicates. I'm very curious to see what your ideas are on how to solve this problem and the best way to approach it based on the constraints. **Additional Information:** Ultimately the goal is to delete these records, along with other records in other tables based on the same criteria. I realize that there are ways you can setup the schema or normalize the database to handle duplicate entries and automatic deletions of records from one table based on deletions in another related table. For the sake of the issue, let's assume that these options are not available. Thanks in advance for your time and insight. Edited October 28, 2015 by boo_lolly Quote Link to comment Share on other sites More sharing options...
Psycho Posted October 28, 2015 Share Posted October 28, 2015 (edited) And why wouldn't you want to fix the schema? I know you stated " . . . let's assume that these options are not available". But, many times people come here asking for a complicated solution for something that was built incorrectly because they can't conceptualize the correct solution. Then the people helping spend time providing something that is less than optimal. Or, we spend a lot of time trying to dig to the source of the issue and have to guide the OP to the better solution. Not sure why you only want the tags of the duplicates and not the first record. Are you using an existing query to get the tags of the original and are wanting the tags of the duplicates to append them together? Anyway, what you are asking is quite simple. Here is the query to get ALL the records from food_tags associated with 'duplicate' records (excluding the original record that is duplicated) SELECT * FROM food_tags WHERE food_id NOT IN ( SELECT MIN(id) FROM foods GROUP BY name ) Here's a version with JOINs to show descriptive values SELECT foods.name, foods.id, tags.name, tags.id FROM food_tags JOIN foods ON foods.id = food_tags.food_id JOIN tags ON tags.id = food_tags.tag_id WHERE food_id NOT IN ( SELECT MIN(id) FROM foods GROUP BY name ) ORDER BY foods.name, foods.id Edited October 28, 2015 by Psycho Quote Link to comment Share on other sites More sharing options...
boo_lolly Posted October 28, 2015 Author Share Posted October 28, 2015 (edited) @Psycho, Thanks for your quick reply! Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT * FROM `food_tags` WHERE `food_id` IN (... Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria. However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case. In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this. Edited October 28, 2015 by boo_lolly Quote Link to comment Share on other sites More sharing options...
boo_lolly Posted October 28, 2015 Author Share Posted October 28, 2015 (edited) Unfortunately the second query does not work because MySQL will not delete from a table that you are performing joins on in a SELECT sub-query. For example DELETE FROM food_tags WHERE food_id IN ( SELECT foods.id FROM food_tags JOIN foods ON foods.id = food_tags.food_id JOIN tags ON tags.id = food_tags.tag_id WHERE food_id NOT IN ( SELECT MIN(id) FROM foods GROUP BY name ) ) This does not work. Error message: You can't specify target table 'food_tags' for update in FROM clause Also it doesn't alleviate the fact that it has 2 SELECT sub-queries which I'm trying to avoid if at all possible for performance reasons. Edited October 28, 2015 by boo_lolly Quote Link to comment Share on other sites More sharing options...
Psycho Posted October 28, 2015 Share Posted October 28, 2015 **Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value). You stated the objective was to retrieve the records. I assumed you would have a script that would select the records then perform some actions that included DELETE - e.g. possible apply the labels of the duplicates to the original if needed. Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT * FROM `food_tags` WHERE `food_id` IN (... Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria. However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case. In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this. I'm not following. The first query I provided should return the records from the food_tags table associated with foods that are duplicated (excluding the original food records). Which is exactly what you stated in your objective. Perhaps you can provide a small subset of example data and what you expect to have returned (deleted)? Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.