How to remove duplicate record?

Question

Post reply

How to remove duplicate record?

jenniferhu46

SSCrazy

Points: 2990
More actions
January 18, 2006 at 9:48 am

#166717

Hi all,
I have a staging table need to be imported to the employee table which has a primary key on employee id.
This staging table contains many duplicated employees records which has same employee id, name but have different department names or different pay rate or addresses.
I want to delete all duplicate employee records and just leave the first one. How can I do it?
Thank you.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

>>and just leave the first one. How can I do it?

You start with the business requirements and a definition of "first one".

Oldest ? Newest ? Most complete address ? Highest Pay rate ? Lowest Pay Rate ?

You need the rules which define which records to discard & which to keep before even opening the SQL code editor.

John Rowan SSC Guru Points: 56440 More actions · Answer 2

This should grab the first row it runs accross based on the EmpID:

INSERT INTO EmployeeTable (Columns) SELECT Top 1 FROM StagingDatabase..StagingTable WHERE .......

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Sergiy SSC Guru Points: 110208 More actions · Answer 3

Finish your statement:

ORDER BY <criteria>

_____________
Code for TallyGenerator

Mark Schneeberger Old Hand Points: 303 More actions · Answer 4

I've used the below script to remove duplicates (may even have found it on this site):

ALTER TABLE TableWithDuplicates ADD EliminateDuplicates_RowID int NOT NULL IDENTITY(1, 1)

DELETE

FROM t1

FROM TableWithDuplicates t1

JOIN(

SELECT EliminateDuplicates_RowID=MAX(EliminateDuplicates_RowID),

Field1,

Field2,

Field3,

Field4

FROM TableWithDuplicates

GROUP BY Field1,

Field2,

Field3,

Field4

HAVING COUNT(*) >= 2

) t2

ON( t1.EliminateDuplicates_RowID <> t2.EliminateDuplicates_RowID

AND t1.Field1 = t2.Field1

AND t1.Field2 = t2.Field2

AND t1.Field3 = t2.Field3

AND t1.Field4 = t2.Field4)

ALTER TABLE TableWithDuplicates DROP COLUMN EliminateDuplicates_RowID

Add a primary key to your staging table and then add in the fields which will make up a unique record.

David Korb SSCrazy Points: 2505 More actions · Answer 5

DELETE FROM

y

FROM

tblA AS x

INNER JOIN

tblB AS y

ON

x.id > y.id

AND

x.attribute1 = y.attribute1

AND

x.attribute2 = y.attribute2

jenniferhu46 SSCrazy Points: 2990 More actions · Answer 6

All the employees have more than 1 record, and I only want to have one based on employid and I don't care which one I deleted.

However, the code I write is deleting the whole table. PLEASE HELP!!

DELETE

FROM TestEmp

WHERE TestEmp.EmployID IN

-- List 1 - all rows that have duplicates

(SELECT F.EmployID

FROM TestEmp AS F

WHERE Exists (

SELECT EmployID, Count(TestEmp.EmployID)

FROM TestEmp

WHERE TestEmp.LastName =TestEmp.LastName

AND TestEmp.Frstname = TestEmp.Frstname

GROUP BY TestEmp.Employid

HAVING Count(TestEmp. EmployID) > 1))

PW-201837 SSC-Insane Points: 20805 More actions · Answer 7

>>has same employee id, name but have different department names or different pay rate or addresses.

And that's why I asked my original question. If you can't come up with business rules for which record to keep based on dept name, pay rate or address, then you have no choice but to add an additional IDENTITY column to your staging table because the only way you can arbitrarily delete all but 1 of a group of dupes is to have some way of uniquely determining which of the group you don't want to delete.

David Korb SSCrazy Points: 2505 More actions · Answer 8

There's almost always at least two or three ways to get something accomplished. Here is an example without a unique identifier:

INSERT INTO x

SELECT y.*

FROM

tbl2 AS x

LEFT JOIN

(

SELECT * FROM #tbl1

UNION

SELECT * FROM #tbl1

[close paren] AS y

ON

x.EmployID = y.EmployID

WHERE

x.EmployID IS NULL

The union eliminates the duplicates and the check for isnull inserts records that are not in the base table yet.

homebrew01 SSC Guru Points: 55548 More actions · Answer 9

Here's a completely different way.

1) In Enterprise Manager, create a NEW copy of the OLD table, with unique contraints, and "Ignore Duplicate Key" checked.

2) Copy from OLD table with duplicates, to NEW table. Only 1 rcd of each dup group will get copied in to NEW table.

3a) Drop OLD table, and rename NEW table to OLD name

OR

3b) Truncate OLD table, and copy NEW records to OLD table.

NOTE: This may not be a good way if you have FK relationships.