Jump to content

Get rows from table "B" based on duplicates in related table "A" excluding original record of the duplicates in MySQL


boo_lolly

Recommended Posts

In short, I have have a table `foods` that has duplicates on occasion. I have another associated table `food_tags` that has tags associated with the foods listed in the `foods` table.

 

**Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value).

 


 

I have a working example of what I want but I'm almost certain this is not the best way to do it. Here's what I have:

 



    SELECT
      *
    FROM
      `food_tags`
    WHERE
      `food_id`
    IN (
      SELECT
        `id`
      FROM
        `foods`
      WHERE
        `id`
      NOT IN (
        SELECT
          MIN(`id`)
        FROM
          `foods`
        GROUP BY
          `name`
      )
    )


 

Basically I want to exclude the first record of the duplicates in the `foods` table (the one with the lowest `id`), but retrieve all the remaining duplicates and then get the results from the `food_tags` table based on the `food_id` of the duplicates.

 

I'm very curious to see what your ideas are on how to solve this problem and the best way to approach it based on the constraints.

 

**Additional Information:** Ultimately the goal is to delete these records, along with other records in other tables based on the same criteria. I realize that there are ways you can setup the schema or normalize the database to handle duplicate entries and automatic deletions of records from one table based on deletions in another related table. For the sake of the issue, let's assume that these options are not available.

 

Thanks in advance for your time and insight.

Edited by boo_lolly
Link to comment
Share on other sites

And why wouldn't you want to fix the schema? I know you stated " . . . let's assume that these options are not available". But, many times people come here asking for a complicated solution for something that was built incorrectly because they can't conceptualize the correct solution. Then the people helping spend time providing something that is less than optimal. Or, we spend a lot of time trying to dig to the source of the issue and have to guide the OP to the better solution.

 

Not sure why you only want the tags of the duplicates and not the first record. Are you using an existing query to get the tags of the original and are wanting the tags of the duplicates to append them together?

 

Anyway, what you are asking is quite simple. Here is the query to get ALL the records from food_tags associated with 'duplicate' records (excluding the original record that is duplicated)

SELECT *
FROM food_tags
WHERE food_id NOT IN (
    SELECT MIN(id)
    FROM foods
    GROUP BY name
)

Here's a version with JOINs to show descriptive values

 

SELECT foods.name, foods.id, tags.name, tags.id
FROM food_tags
JOIN foods ON foods.id = food_tags.food_id
JOIN tags ON tags.id = food_tags.tag_id
WHERE food_id NOT IN (
    SELECT MIN(id)
    FROM foods
    GROUP BY name
)
ORDER BY foods.name, foods.id
Edited by Psycho
Link to comment
Share on other sites

@Psycho,

 

Thanks for your quick reply!

 

Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT FROM `food_tags` WHERE `food_id` IN (...
 

Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria.

 

However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case.

 

In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this.

Edited by boo_lolly
Link to comment
Share on other sites

Unfortunately the second query does not work because MySQL will not delete from a table that you are performing joins on in a SELECT sub-query. For example

DELETE FROM
    food_tags
WHERE
    food_id
IN (
    SELECT foods.id
    FROM food_tags
    JOIN foods ON foods.id = food_tags.food_id
    JOIN tags ON tags.id = food_tags.tag_id
    WHERE food_id NOT IN (
        SELECT MIN(id)
        FROM foods
        GROUP BY name
    )
)

This does not work. Error message: You can't specify target table 'food_tags' for update in FROM clause

 

Also it doesn't alleviate the fact that it has 2 SELECT sub-queries which I'm trying to avoid if at all possible for performance reasons.

Edited by boo_lolly
Link to comment
Share on other sites

 

**Objective:** I want to retrieve the records from the `food_tags` table for all duplicates found in the `foods` table except for the *first* records in the `foods` table (the records with the lowest `id` value).
 

 

You stated the objective was to retrieve the records. I assumed you would have a script that would select the records then perform some actions that included DELETE - e.g. possible apply the labels of the duplicates to the original if needed.

 

 

Your first query does what it says, but not what I need. The issue is that we cannot simply grab all that don't match. We must grab records from the food_tags table that have a specific set of food ids. Otherwise we would be deleting several records for several other foods based on additional criteria from the food_tags table which we do not want to do. It must be a specific set of food ids. Hence the initial SELECT FROM `food_tags` WHERE `food_id` IN (...

 

Think of it like narrowing down the food ids list rather than simply grabbing all that don't match a specific criteria.

 

However, I believe your second query does in fact grab a specific list of only matching food ids. I will test that out and see if this is the case.

 

In response to your initial question - the goal is the migrate the database schema incrementally to accommodate this.

 

I'm not following. The first query I provided should return the records from the food_tags table associated with foods that are duplicated (excluding the original food records). Which is exactly what you stated in your objective. Perhaps you can provide a small subset of example data and what you expect to have returned (deleted)?

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.