January 18, 2006 at 9:48 am
Hi all,
I have a staging table need to be imported to the employee table which has a primary key on employee id.
This staging table contains many duplicated employees records which has same employee id, name but have different department names or different pay rate or addresses.
I want to delete all duplicate employee records and just leave the first one. How can I do it?
Thank you.
January 18, 2006 at 9:58 am
>>and just leave the first one. How can I do it?
You start with the business requirements and a definition of "first one".
Oldest ? Newest ? Most complete address ? Highest Pay rate ? Lowest Pay Rate ?
You need the rules which define which records to discard & which to keep before even opening the SQL code editor.
January 18, 2006 at 10:02 am
This should grab the first row it runs accross based on the EmpID:
INSERT INTO EmployeeTable (Columns) SELECT Top 1 FROM StagingDatabase..StagingTable WHERE .......
January 18, 2006 at 6:20 pm
January 19, 2006 at 3:29 am
I've used the below script to remove duplicates (may even have found it on this site):
ALTER TABLE TableWithDuplicates ADD EliminateDuplicates_RowID int NOT NULL IDENTITY(1, 1)
DELETE
FROM t1
FROM TableWithDuplicates t1
JOIN(
SELECT EliminateDuplicates_RowID=MAX(EliminateDuplicates_RowID),
Field1,
Field2,
Field3,
Field4
FROM TableWithDuplicates
GROUP BY Field1,
Field2,
Field3,
Field4
HAVING COUNT(*) >= 2
) t2
ON( t1.EliminateDuplicates_RowID <> t2.EliminateDuplicates_RowID
AND t1.Field1 = t2.Field1
AND t1.Field2 = t2.Field2
AND t1.Field3 = t2.Field3
AND t1.Field4 = t2.Field4)
ALTER TABLE TableWithDuplicates DROP COLUMN EliminateDuplicates_RowID
Add a primary key to your staging table and then add in the fields which will make up a unique record.
January 19, 2006 at 10:49 am
DELETE FROM
y
FROM
tblA AS x
INNER JOIN
tblB AS y
ON
x.id > y.id
AND
x.attribute1 = y.attribute1
AND
x.attribute2 = y.attribute2
January 19, 2006 at 1:27 pm
All the employees have more than 1 record, and I only want to have one based on employid and I don't care which one I deleted.
However, the code I write is deleting the whole table. PLEASE HELP!!
DELETE
FROM TestEmp
WHERE TestEmp.EmployID IN
-- List 1 - all rows that have duplicates
(SELECT F.EmployID
FROM TestEmp AS F
WHERE Exists (
SELECT EmployID, Count(TestEmp.EmployID)
FROM TestEmp
WHERE TestEmp.LastName =TestEmp.LastName
AND TestEmp.Frstname = TestEmp.Frstname
GROUP BY TestEmp.Employid
HAVING Count(TestEmp. EmployID) > 1))
January 19, 2006 at 1:45 pm
>>has same employee id, name but have different department names or different pay rate or addresses.
And that's why I asked my original question. If you can't come up with business rules for which record to keep based on dept name, pay rate or address, then you have no choice but to add an additional IDENTITY column to your staging table because the only way you can arbitrarily delete all but 1 of a group of dupes is to have some way of uniquely determining which of the group you don't want to delete.
January 19, 2006 at 2:10 pm
There's almost always at least two or three ways to get something accomplished. Here is an example without a unique identifier:
INSERT INTO x
SELECT y.*
FROM
tbl2 AS x
LEFT JOIN
(
SELECT * FROM #tbl1
UNION
SELECT * FROM #tbl1
[close paren] AS y
ON
x.EmployID = y.EmployID
WHERE
x.EmployID IS NULL
The union eliminates the duplicates and the check for isnull inserts records that are not in the base table yet.
January 19, 2006 at 2:53 pm
Here's a completely different way.
1) In Enterprise Manager, create a NEW copy of the OLD table, with unique contraints, and "Ignore Duplicate Key" checked.
2) Copy from OLD table with duplicates, to NEW table. Only 1 rcd of each dup group will get copied in to NEW table.
3a) Drop OLD table, and rename NEW table to OLD name
OR
3b) Truncate OLD table, and copy NEW records to OLD table.
NOTE: This may not be a good way if you have FK relationships.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply