Jump to content

Finding duplicate columns


Go to solution Solved by Barand,

Recommended Posts

Here is my table structure:

CREATE TABLE IF NOT EXISTS `production_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `job_number` int(8) NOT NULL,
  `enterprise` tinytext NOT NULL,
  `part_number` text NOT NULL,
  `description` text NOT NULL,
  `psm` tinytext NOT NULL,
  `qty` int(11) NOT NULL,
  `line_item` varchar(11) NOT NULL,
  `as400_ship_date` date DEFAULT NULL,
  `hold_reason` text NOT NULL DEFAULT '0',
  `hold_date` date DEFAULT NULL,
  `insert_time` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp() COMMENT 'time order was inserted',
  PRIMARY KEY (`id`),
  KEY `job_line` (`job_number`,`line_item`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
COMMIT;

And here is some sample data that matches what I am looking for:

INSERT INTO `production_data` (`id`, `job_number`, `enterprise`, `part_number`, `description`, `psm`, `qty`, `line_item`, `as400_ship_date`, `hold_reason`, `hold_date`, `insert_time`) VALUES
(22277, 23669401, 'BURGER', 'BK9009FM.20CLDWG', 'BURGER KING 9\'OAH CLEARNCE BAR WOODGRAIN||20/20 DESIGN WOOD GRAIN VINYL BOTH SIDES||DWG||DOES NOT INCLUDE ANCHOR BOLTS||', 'S', 1, 'D', '2021-11-22', '', '0000-00-00', '2021-07-14 11:35:27'),
(22276, 23669401, 'BURGER', 'BK7010BA.OCUWG', 'BURGER KING BOLTS FOR 9\'OAH OCU CANOPY||(4) 3/4\"DIAX48\" BOLTS||HIGH STRENGTH||W/HHN AND WASHERS AND TEMPLATE||', 'S', 1, 'E', '2021-11-22', '', '0000-00-00', '2021-07-14 11:35:27'),
(22278, 23669401, 'BURGER', 'BK9010FM.OCUWG49LG', 'BURGER KING 9\'OAH OCU CANOPY 49\" LG||DISPLAY WOOD GRAIN VINYL BOTH SIDES||DWG BK7067CP_OCU_2019_3DR3  #195336||DWG BK7067CP_OCU_ADPTRKIT2019||', 'S', 1, 'C', '2021-11-22', '', '0000-00-00', '2021-07-14 11:35:27'),
(22272, 23669401, 'BURGER', 'BK1072CS.20', 'BURGER KING 6\' DIA ID SF SGN||20/20 DESIGN||GE TETRA MAX LEDS WITH 3\" RETURN||', 'S', 1, 'A', '2021-11-22', '', '0000-00-00', '2021-07-14 11:35:27'),
(38162, 23669401, 'BURGER', 'BK3023RF.0V', 'BURGER KING 1\'7x3 DIR FACE||(1) DRIVE THRU(ARROW RIGHT)/(ARROW LEFT)||DRIVE THRU||', 'M', 2, 'F', '2021-11-22', '', '0000-00-00', '2021-12-06 12:35:25'),
(38163, 23669401, 'BURGER', 'BK7009BA.20CLWG', 'BURGER KING BOLTS FOR 9\'OAH CLEARANCE||(4) 1\"DIAX42\" BOLTS||HIGH STRENGTH||W/HHN AND WASHERS AND TEMPLATE||', 'S', 1, 'F', '2021-11-22', '', '0000-00-00', '2021-12-06 12:35:25');
COMMIT;

I'm trying to make a query that grabs items like 38162 and 38163 that have the same job_number but duplicate line_items as well as any job_numbers that have empty values for line_items.

Link to comment
https://forums.phpfreaks.com/topic/314281-finding-duplicate-columns/
Share on other sites

  • Solution

Like this?

SELECT id as ids
     , job_number
     , line_item
FROM production_data
WHERE line_item = ''
UNION
SELECT GROUP_CONCAT(id separator ', ') as ids
     , job_number
     , line_item
FROM production_data
GROUP BY job_number, line_item
HAVING COUNT(*) > 1;

[edit] If you want the ids in their own rows, then

SELECT id
     , job_number
     , line_item
FROM production_data
WHERE line_item = ''
UNION
SELECT p.id
     , dupes.job_number
     , dupes.line_item
FROM production_data p 
     JOIN ( 
            SELECT job_number
                 , line_item
            FROM production_data
            GROUP BY job_number, line_item
            HAVING COUNT(*) > 1
        ) dupes USING (job_number, line_item);

 

Edited by Barand
  • Great Answer 1

@BarandSorry for the late response man, I haven't had the time to get back to this project this week until now. Yea man, that works great! 

Can you explain a little more how the "HAVING COUNT(*) > 1" works in the first statement?

I've always had trouble understanding how to properly use the count function in a SQL statement other than simple statements where all I'm doing is something like "SELECT COUNT(*).." to count rows, etc.

I do understand the difference between a UNION and a JOIN...which is interesting because I really thought you were going to use a JOIN on the same table to accomplish this. Shows how much I know!

If you have a GROUP BY to calculate an aggregate (SUM, COUNT etc) you can't use WHERE on the aggregate value as it's value isn't known until the end of the query. So while a WHERE clause filters inputs, a HAVING clause filters outputs.

The COUNT(*) .. GROUP BY  is givings me a count the records for each combination of job_number/line_item. As we are looking for duplicates we are only interested in those with a count of more than one.

As for JOIN vs UNION...

  • SELECT ... FROM A JOIN B gives you rows with columns from both A and B. So a join extends the columns.
  • SELECT ... FROM A UNION SELECT ... FROM B gives you the rows from the first select followed by those from the second. So a union extends the rows.
  • Great Answer 1

Okay awesome man. I've been playing around with it and I am going to continue to. I love learning more about SQL.

I have found there is so much I've done in PHP when it comes to data handling that could be handled all with a proper SQL statement. I just didn't have the knowledge I have now but I still have more to learn. 

I have two books on SQL that are helpful that I reference on a regular basis but sometimes I do get confused with combining certain parts of a SQL statement. 

I greatly appreciate all that you have taught me! Thank you!

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.