Get rows from table "B" based on duplicates in related table "A" excluding original record of the duplicates in MySQL

boo_lolly · October 28, 2015

In short, I have have a table `foods` that has duplicates on occasion. I have another associated table `food_tags` that has tags associated with the foods listed in the `foods` table.

**Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value).

SQL Fiddle - http://sqlfiddle.com/#!9/ee1c6/2

I have a working example of what I want but I'm almost certain this is not the best way to do it. Here's what I have:



    SELECT

      *

    FROM

      `food_tags`

    WHERE

      `food_id`

    IN (

      SELECT

        `id`

      FROM

        `foods`

      WHERE

        `id`

      NOT IN (

        SELECT

          MIN(`id`)

        FROM

          `foods`

        GROUP BY

          `name`

      )

    )

Basically I want to exclude the first record of the duplicates in the `foods` table (the one with the lowest `id`), but retrieve all the remaining duplicates and then get the results from the `food_tags` table based on the `food_id` of the duplicates.

I'm very curious to see what your ideas are on how to solve this problem and the best way to approach it based on the constraints.

**Additional Information:** Ultimately the goal is to delete these records, along with other records in other tables based on the same criteria. I realize that there are ways you can setup the schema or normalize the database to handle duplicate entries and automatic deletions of records from one table based on deletions in another related table. For the sake of the issue, let's assume that these options are not available.

Thanks in advance for your time and insight.

Edited October 28, 2015 by boo_lolly

Psycho · October 28, 2015

And why wouldn't you want to fix the schema? I know you stated " . . . let's assume that these options are not available". But, many times people come here asking for a complicated solution for something that was built incorrectly because they can't conceptualize the correct solution. Then the people helping spend time providing something that is less than optimal. Or, we spend a lot of time trying to dig to the source of the issue and have to guide the OP to the better solution.

Not sure why you only want the tags of the duplicates and not the first record. Are you using an existing query to get the tags of the original and are wanting the tags of the duplicates to append them together?

Anyway, what you are asking is quite simple. Here is the query to get ALL the records from food_tags associated with 'duplicate' records (excluding the original record that is duplicated)

SELECT *
FROM food_tags
WHERE food_id NOT IN (
    SELECT MIN(id)
    FROM foods
    GROUP BY name
)

Here's a version with JOINs to show descriptive values

SELECT foods.name, foods.id, tags.name, tags.id
FROM food_tags
JOIN foods ON foods.id = food_tags.food_id
JOIN tags ON tags.id = food_tags.tag_id
WHERE food_id NOT IN (
    SELECT MIN(id)
    FROM foods
    GROUP BY name
)
ORDER BY foods.name, foods.id

Edited October 28, 2015 by Psycho

boo_lolly · October 28, 2015

@Psycho,

Thanks for your quick reply!

Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT * FROM `food_tags` WHERE `food_id` IN (...

Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria.

However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case.

In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this.

Edited October 28, 2015 by boo_lolly

boo_lolly · October 28, 2015

Unfortunately the second query does not work because MySQL will not delete from a table that you are performing joins on in a SELECT sub-query. For example

DELETE FROM
    food_tags
WHERE
    food_id
IN (
    SELECT foods.id
    FROM food_tags
    JOIN foods ON foods.id = food_tags.food_id
    JOIN tags ON tags.id = food_tags.tag_id
    WHERE food_id NOT IN (
        SELECT MIN(id)
        FROM foods
        GROUP BY name
    )
)

This does not work. Error message: You can't specify target table 'food_tags' for update in FROM clause

Also it doesn't alleviate the fact that it has 2 SELECT sub-queries which I'm trying to avoid if at all possible for performance reasons.

Edited October 28, 2015 by boo_lolly

Psycho · October 28, 2015

**Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value).

You stated the objective was to retrieve the records. I assumed you would have a script that would select the records then perform some actions that included DELETE - e.g. possible apply the labels of the duplicates to the original if needed.

Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT * FROM `food_tags` WHERE `food_id` IN (...

Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria.

However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case.

In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this.

I'm not following. The first query I provided should return the records from the food_tags table associated with foods that are duplicated (excluding the original food records). Which is exactly what you stated in your objective. Perhaps you can provide a small subset of example data and what you expect to have returned (deleted)?

Sign In

Get rows from table "B" based on duplicates in related table "A" excluding original record of the duplicates in MySQL

Recommended Posts

boo_lolly

Link to comment

Share on other sites

Psycho

Link to comment

Share on other sites

boo_lolly

Link to comment

Share on other sites

boo_lolly

Link to comment

Share on other sites

Psycho

Link to comment

Share on other sites

Join the conversation

Browse

Activity

Important Information