July 30, 2012 at 4:59 am
I have a table of customer names and ids, and I want to identify the duplicate names in a new table
The customer table has records like so:
ID -- Name
1275 -- Customer A
3472 -- Customer A
2812 -- Customer A
1245 -- Customer B
1544 -- Customer C
2567 -- Customer D
3446 -- Customer D
So, I have 3 duplicates for Customer A and 2 for Customer D. I want to take the first as the "master". The output table should look like this:
Master ID -- Slave ID
1275 -- 3472
1275 -- 2812
2567 -- 3446
And so on. (Each duplicate will cause one record to be written to the output table).
I'm using Excel at the moment, but I could switch to SQL. What is the easiest way to do this?
July 30, 2012 at 5:14 am
How's this? Not sure how it'll scale, but it runs pretty much instantly for me on a few thousand records.
From a design point of view, an extra column in the Customer table with a self-referencing FK might be another solution, rather than an extra table.
Cheers
WITH CTE_Dups AS
(
SELECT CustomerId, CustomerName, ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CustomerId) AS RN
from Customer
)
INSERT TargetTable (MasterId, SlaveId)
SELECT cd.CustomerId AS MasterId, cd2.CustomerId AS SlaveId
FROM CTE_Dups cd
INNER JOIN CTE_Dups cd2 ON cd2.CustomerName = cd.CustomerName AND cd2.RN > 1
WHERE cd.RN = 1
GO
July 30, 2012 at 5:41 am
That looks good, however I am using SQLite (www.sqlite.org) and it doesn't seem to work with it.
Would I need MS SQL 2008 R2 Express to run it?
July 30, 2012 at 6:42 am
Yes, it's using MS T-SQL specific functions. I'm not familiar with SQLite so not sure how to translate it into that dialect, sorry.
July 30, 2012 at 7:03 am
This could be easier to translate (if even needed)
DECLARE @tTabletable(
IDint,
namevarchar( 20));
INSERT INTO @tTable
SELECT 1275, 'Customer A' UNION ALL
SELECT 3472, 'Customer A' UNION ALL
SELECT 2812, 'Customer A' UNION ALL
SELECT 1245, 'Customer B' UNION ALL
SELECT 1544, 'Customer C' UNION ALL
SELECT 2567, 'Customer D' UNION ALL
SELECT 3446, 'Customer D' ;
SELECT m.Master, t.ID FROM @tTable t
JOIN ( SELECT MIN(ID) AS Master,
name
FROM @tTable
GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;
July 30, 2012 at 8:43 am
Thanks Luis, that works well.
July 30, 2012 at 9:10 am
Luis Cazares (7/30/2012)
SELECT m.Master, t.ID FROM @tTable t
JOIN ( SELECT MIN(ID) AS Master,
name
FROM @tTable
GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;
[/code]
Would you mind explaining what is going on here? It works but I'm not sure why!
July 30, 2012 at 9:14 am
tmccar (7/30/2012)
Luis Cazares (7/30/2012)
SELECT m.Master, t.ID FROM @tTable t
JOIN ( SELECT MIN(ID) AS Master,
name
FROM @tTable
GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;
[/code]
Would you mind explaining what is going on here? It works but I'm not sure why!
It is creating a derived table of minimum id's when grouped by name. So you get your "master" id and the associated name in a "virtual" table. You are then joining that back to the main table to include that value with each record from the main table and excluding those rows where the master matches the actual id.
Jared
CE - Microsoft
July 30, 2012 at 9:22 am
Thanks Jared, that explains it.
July 30, 2012 at 9:22 am
Thanks Jared, I think I couldn't have explained that well.
Of course, I know what I'm doing but I have problems to explain when I see some things that might look simple to me.
July 30, 2012 at 9:25 am
Luis Cazares (7/30/2012)
Thanks Jared, I think I couldn't have explained that well.Of course, I know what I'm doing but I have problems to explain when I see some things that might look simple to me.
I'm sure you could have explained it just as well 🙂
Jared
CE - Microsoft
July 30, 2012 at 10:06 am
Hi Jared, Luis
"...It is creating a derived table of minimum id's when grouped by name."
Instead of creating the derived table sorted by minimum ids, how could I change it so that it takes the ID that corresponds to the first occurence of "Name" in the table?
Thanks
July 30, 2012 at 10:10 am
tmccar (7/30/2012)
Hi Jared, Luis"...It is creating a derived table of minimum id's when grouped by name."
Instead of creating the derived table sorted by minimum ids, how could I change it so that it takes the ID that corresponds to the first occurence of "Name" in the table?
Thanks
How do you define the "first occurrence?" There is no such thing in SQL Server without an ORDER BY or some other grouping. In this case it is grouping by the name and then finding the minimum id. How do you define first occurrence?
Jared
CE - Microsoft
July 30, 2012 at 10:11 am
How would you define "the first ocurrence"?
I'm not sure if this is the best place to search for that answer, as the best solutions would be oriented on MS SQL Server.
July 30, 2012 at 10:14 am
I imported an Excel file, so I was looking at that. It's sorted by name, then by date (newest to oldest) so that the first occurrence of each customer is the newest. (I have a date field in this file, if that would help)
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply