Look up duplicates

Question

Look up duplicates

tmccar

SSC Eights!

Points: 990
More actions
July 30, 2012 at 4:59 am

#259932

I have a table of customer names and ids, and I want to identify the duplicate names in a new table
The customer table has records like so:
ID -- Name
1275 -- Customer A
3472 -- Customer A
2812 -- Customer A
1245 -- Customer B
1544 -- Customer C
2567 -- Customer D
3446 -- Customer D
So, I have 3 duplicates for Customer A and 2 for Customer D. I want to take the first as the "master". The output table should look like this:
Master ID -- Slave ID
1275 -- 3472
1275 -- 2812
2567 -- 3446
And so on. (Each duplicate will cause one record to be written to the output table).
I'm using Excel at the moment, but I could switch to SQL. What is the easiest way to do this?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Gazareth One Orange Chip Points: 27737 More actions · Answer 1

How's this? Not sure how it'll scale, but it runs pretty much instantly for me on a few thousand records.

From a design point of view, an extra column in the Customer table with a self-referencing FK might be another solution, rather than an extra table.

Cheers

WITH CTE_Dups AS

(

SELECT CustomerId, CustomerName, ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CustomerId) AS RN

from Customer

)

INSERT TargetTable (MasterId, SlaveId)

SELECT cd.CustomerId AS MasterId, cd2.CustomerId AS SlaveId

FROM CTE_Dups cd

INNER JOIN CTE_Dups cd2 ON cd2.CustomerName = cd.CustomerName AND cd2.RN > 1

WHERE cd.RN = 1

GO

tmccar SSC Eights! Points: 990 More actions · Answer 2

That looks good, however I am using SQLite (www.sqlite.org) and it doesn't seem to work with it.

Would I need MS SQL 2008 R2 Express to run it?

Gazareth One Orange Chip Points: 27737 More actions · Answer 3

Yes, it's using MS T-SQL specific functions. I'm not familiar with SQLite so not sure how to translate it into that dialect, sorry.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 4

This could be easier to translate (if even needed)

DECLARE @tTabletable(

IDint,

namevarchar( 20));

INSERT INTO @tTable

SELECT 1275, 'Customer A' UNION ALL

SELECT 3472, 'Customer A' UNION ALL

SELECT 2812, 'Customer A' UNION ALL

SELECT 1245, 'Customer B' UNION ALL

SELECT 1544, 'Customer C' UNION ALL

SELECT 2567, 'Customer D' UNION ALL

SELECT 3446, 'Customer D' ;

SELECT m.Master, t.ID FROM @tTable t

JOIN ( SELECT MIN(ID) AS Master,

name

FROM @tTable

GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

tmccar SSC Eights! Points: 990 More actions · Answer 5

tmccar

SSC Eights!

Points: 990

July 30, 2012 at 8:43 am

#1519184

Thanks Luis, that works well.

tmccar SSC Eights! Points: 990 More actions · Answer 6

Luis Cazares (7/30/2012)
SELECT m.Master, t.ID FROM @tTable t
JOIN ( SELECT MIN(ID) AS Master,
name
FROM @tTable
GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;
[/code]

Would you mind explaining what is going on here? It works but I'm not sure why!

Jared SSC Guru Points: 61793 More actions · Answer 7

tmccar (7/30/2012)
Luis Cazares (7/30/2012)
SELECT m.Master, t.ID FROM @tTable t
JOIN ( SELECT MIN(ID) AS Master,
name
FROM @tTable
GROUP BY name) M ON t.name = m.name AND m.Master <> t.ID;
[/code]
Would you mind explaining what is going on here? It works but I'm not sure why!

It is creating a derived table of minimum id's when grouped by name. So you get your "master" id and the associated name in a "virtual" table. You are then joining that back to the main table to include that value with each record from the main table and excluding those rows where the master matches the actual id.

Jared
CE - Microsoft

tmccar SSC Eights! Points: 990 More actions · Answer 8

tmccar

SSC Eights!

Points: 990

July 30, 2012 at 9:22 am

#1519231

Thanks Jared, that explains it.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 9

Thanks Jared, I think I couldn't have explained that well.

Of course, I know what I'm doing but I have problems to explain when I see some things that might look simple to me.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Jared SSC Guru Points: 61793 More actions · Answer 10

Luis Cazares (7/30/2012)
Thanks Jared, I think I couldn't have explained that well.
Of course, I know what I'm doing but I have problems to explain when I see some things that might look simple to me.

I'm sure you could have explained it just as well 🙂

Jared
CE - Microsoft

tmccar SSC Eights! Points: 990 More actions · Answer 11

Hi Jared, Luis

"...It is creating a derived table of minimum id's when grouped by name."

Instead of creating the derived table sorted by minimum ids, how could I change it so that it takes the ID that corresponds to the first occurence of "Name" in the table?

Thanks

Jared SSC Guru Points: 61793 More actions · Answer 12

tmccar (7/30/2012)
Hi Jared, Luis
"...It is creating a derived table of minimum id's when grouped by name."
Instead of creating the derived table sorted by minimum ids, how could I change it so that it takes the ID that corresponds to the first occurence of "Name" in the table?
Thanks

How do you define the "first occurrence?" There is no such thing in SQL Server without an ORDER BY or some other grouping. In this case it is grouping by the name and then finding the minimum id. How do you define first occurrence?

Jared
CE - Microsoft

Luis Cazares SSC Guru Points: 183706 More actions · Answer 13

How would you define "the first ocurrence"?

I'm not sure if this is the best place to search for that answer, as the best solutions would be oriented on MS SQL Server.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

tmccar SSC Eights! Points: 990 More actions · Answer 14

I imported an Excel file, so I was looking at that. It's sorted by name, then by date (newest to oldest) so that the first occurrence of each customer is the newest. (I have a date field in this file, if that would help)