Selecting And Merging Duplicate Records In SQL server 2005

Question

Selecting And Merging Duplicate Records In SQL server 2005

4EverandEver

Default port

Points: 1485
More actions
April 29, 2009 at 4:02 am

#220681

Hi All,
I have a requirement to clean up a DB with more than half a million records with duplicates. The challange is i need to return the duplicate records and then merge them, keeping on recod for each record set. The reason for that is that the users have been updating any of the duplicate record, so no data shouldbe lost after the clean up. Is this possible may using using other tools like SSIS or any script?
Any suggestions welcome.
Thanks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

steveb. SSC-Forever Points: 46733 More actions · Answer 1

You can do this using a script, you will probably need to use SELECT DISTINCT and GROUP BY to remove duplicates. Without knowing exactly what you are doing it is hard to give specific advices

Post your table defs, and some sample data and I am sure someone can help out..

Atif-ullah Sheikh SSChampion Points: 12495 More actions · Answer 2

You can duplicte records by Count(*)/ Group By or Row_Number.

But how will you find out the latest update in those duplicate records. Is there any UpdatedOnDateTime column in your tables?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sometimes, winning is not an issue but trying.
You can check my BLOG [font="Arial Black"]here[/font][/url][/right]

Kruti Kansara Hall of Fame Points: 3118 More actions · Answer 3

@phumlo1

what exactly you want to know?

1) how to select duplicate records?

2) How to cleanup your database without keeping duplicate records?

Please give some more specification.

"Don't limit your challenges, challenge your limits"

GSquared SSC Guru Points: 260824 More actions · Answer 4

There are lots of methods of handling duplicates in SQL. To help you design the best one for your situation, we would need table definitions ("create table..." scripts) and some sample data to put in them (insert... scripts).

When you say you want to return the duplicates and then merge them, do you mean you need to view them first before you merge them? Or do you mean you need to keep a record of some sort of the duplicates that are being removed? Or something else?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 5

I confirm all other posts. More information imply better answers. Maybe have a look to the link in my signature.

For now a very basic way to handle this:

DECLARE @t TABLE (DuplicateId INT)

INSERT INTO @t

SELECT 1

UNION ALL SELECT 1

UNION ALL SELECT 2

UNION ALL SELECT 3

UNION ALL SELECT 2

; WITH

todo (RowNum) AS

(

SELECT

ROW_NUMBER() OVER (PARTITION BY DuplicateId ORDER BY DuplicateId)

FROM @t

)

DELETE FROM todo WHERE RowNum != 1

SELECT * FROM @t

Greets

Flo

4EverandEver Default port Points: 1485 More actions · Answer 6

Hi All,

Thanks for all the responses. Here is a sample of my DB:

Fullname, ID_no, Modifiedon, Membershipno,Adress1, Address2, Postal Code

Joe Soap, 123, 2008-09-04, 001, 1 sloane str, Hampton, ,

Joe Soap, 123, 2008-09-06, 001, 1 sloane str, Hampton, 3009

Marie Smith,456, 2008-10-10, 002, 2 Brite str,,,

Marie Smith,456, 2008-10-11, 002, 2 Brite str,Hampton,,

Marie Smith, 465, 2008-10-12,,Brite str,Hampton,3009

In this sample Joe soap has 2 records, in his case i want to keep the second one because its got more data. In Marie's case she's got

3 records, all with different data, first one has no address2 and postal code, second has no postal code, third one has got no membershipno.The three records have pieces of data that i need for Maries complete record. I need to get all the pieces into one record(merge) and delete all others.

Many Thanks.

4EverandEver Default port Points: 1485 More actions · Answer 7

Hi All,

Apologies for sending the sample data in the format as in my previous post. will send a proper one.

malaytech2008 Valued Member Points: 65 More actions · Answer 8

This is complex case:

To resolve this issue you can try the following way--

1. First identify the duplicate based on your requirement (may not duplicate)

2. Find the number of records associated with the duplicates.

3. As per your requirement data may come from duplicate records, so you need to write a temp table and insert all the duplicate records into the temp table and check each column value with respective column of other records and insert the data into final temp table which you require.

4. Finally delete the duplicate records from main table and insert the records of final table.

You can do this by using dynamic query.

tfifield SSCrazy Eights Points: 9655 More actions · Answer 9

The only way I've found to handle this sort of thing requires several passes at the data. You want the most complete data possible for each duplicate:

-- Assuming that your table name is MemberShips and that ID_no is really unique

-- for each name.

-- This first get the dups

SELECT

Fullname, ID_no, MAX(Modifiedon) AS Modifieldon

, CAST(NULL AS VARCHAR(10)) AS Membershipno

, CAST(NULL AS VARCHAR(50)) AS Address1

, CAST(NULL AS VARCHAR(50)) AS Address2

, CAST(NULL(AS VARCHAR(10)) AS PostalCode

INTO #DUP

FROM MemberShips

GROUP BY

Fullname, ID_no

HAVING COUNT(*) > 1

-- Get tne non-null MembershipNo

SELECT M.ID_no, M.Membershipno

INTO #M

FROM #Dup

INNER JOIN Memberships M ON

#Dup.ID_no = M.ID_no

WHERE

M.Membershipno IS NOT NULL

-- Get the non-null Address1

SELECT

M.ID_no, M.Address1

INTO #A1

FROM #Dup

INNER JOIN MemberShips M ON

#Dup.ID_no = M.ID_no

WHERE

M.Address1 IS NOT NULL

-- Do the same for Address2, Postal Code, etc.

-- Use these tables to update the #Dup table.

UPDATE #Dup

SET #Dup.Membershipno = #M.Membershipno

FROM #Dup

INNER JOIN #M ON

#Dup.ID_no = #M.ID_no

-- Update addresses and so forth.

-- #Dup should have something for each column if there was one in

-- the original table.

-- Delete the dups in the original table.

DELETE M

FROM #Dup

INNER JOIN Memberships M ON

#Dup.ID_no = M.ID_no

-- Re-insert the completed records from #Dup

INSERT INTO Memeberships

( Fullname, ID_no, etc.)

SELECT

Fullname, IDNo, etc

FROM #DUP

Selecting And Merging Duplicate Records In SQL server 2005

Cookies on SQLServerCentral