How to Delete Duplicate records in following senario???

Question

Post reply

How to Delete Duplicate records in following senario???

prathamesh

Grasshopper

Points: 14
More actions
November 20, 2006 at 11:07 am

#173505

Dear All,
I have one table
Respondent
-------------
RespondentID -- primary key
LastName
OfficePhoneNumber
EmailAddress
DateOfBirth
LastUpdated
And another one table
RespondentLov
----------------
LOVID
RespondentId -- foreign key references Respondentid of Respondent table
TypeId
LastUpdated
UpdatedBy
In my Respondent table there are duplicate values based on
LastName,OfficePhoneNumber,EmailAddress,DateOfBirth colums.
I want to delete these duplicate records. But need to retain the record
that was recently updated (based on LastUpdated field from the Respondent Table).
Also I have to update the RespondentID in RespondentLOV table with
the record that was recently updated.
wrote following procedure but I received the error.
CREATE PROCEDURE pratty_del
AS
DECLARE
@intErrorCode INT
BEGIN TRAN

UPDATE RespondentLOV
SET respondentid = r1.respondentid
FROM RespondentLOV rl
INNER JOIN respondent r1 ON (rl.respondentid = r1.respondentid)
INNER JOIN respondent r2 ON (r1.lastname = r2.lastname
     AND r2.OfficePhoneNumber = r1.OfficePhoneNumber
     AND r2.EmailAddress = r1.EmailAddress
     AND r2.DateOfBirth = r1.DateOfBirth
     AND r1.lastupdated > r2.lastupdated)
SELECT @intErrorCode = @@ERROR
IF (@intErrorCode <> 0) GOTO PROBLEM
DELETE Respondent FROM Respondent
INNER JOIN Respondent r1 ON lastname = r1.lastname
     AND OfficePhoneNumber = r1.OfficePhoneNumber
     AND EmailAddress = r1.EmailAddress
     AND DateOfBirth = r1.DateOfBirth
     AND lastupdated > r1.lastupdated

SELECT @intErrorCode = @@ERROR
IF (@intErrorCode <> 0) GOTO PROBLEM
COMMIT TRAN
PROBLEM:
IF (@intErrorCode <> 0) BEGIN
PRINT 'Unexpected error occurred!'
    ROLLBACK TRAN
END
Here I received Foreign key error in RespondentLOV table.
Please some one help me.
Thanks In advance.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

SQL ORACLE One Orange Chip Points: 27807 More actions · Answer 1

One possible approach, probably the only way, is:

Create a new column with a primary key (or with unique value);

Remove the duplicate rows (not include the column with the primary key) but one;

Remove the added column;

PW-201837 SSC-Insane Points: 20805 More actions · Answer 2

I would create a temp table which identifies the records to be

deleted and at the same time maps them to their replacements

Select r1.RespondentID As DeleteThisID,

r2.RespondentID As ReplaceWithThisID

Into #remap

From Respondent As r1

Inner Join Respondent As r2

On (r1.LastUpdated < r2.MostRecentUpdate And

r1.LastName = r2.LastName And

r1.OfficePhoneNumber = r2.OfficePhoneNumber And

r1.EmailAddress = r2.EmailAddress And

r1.DateOfBirth = r2.DateOfBirth

)

Inner Join

-- Join a derived table to get most recent update per group. This locates

-- the record to keep

(

Select LastName, OfficePhoneNumber, EmailAddress, DateOfBirth,

Max(LastUpdated) As MostRecentUpdateDate

From Respondent

Group By LastName, OfficePhoneNumber, EmailAddress, DateOfBirth

) r3

On (r3.MostRecentUpdate = r2.LastUpdated And

r3.LastName = r2.LastName And

r3.OfficePhoneNumber = r2.OfficePhoneNumber And

r3.EmailAddress = r2.EmailAddress And

r3.DateOfBirth = r2.DateOfBirth

)

Once the temp table is built, use it to update the dependant table,

then do the delete:

UPDATE rl

SET Respondentid = rm.ReplaceWithThisID

FROM RespondentLOV As rl

INNER JOIN #remap as rm

ON (rl.RespondentID = rm.DeleteThisID)

DELETE R

FROM Respondent As R

INNER JOIN #remap as rm

ON (r.RespondentID = rm.DeleteThisID)

prathamesh Grasshopper Points: 14 More actions · Answer 3

Hi,

Here I have to add another one condition RespondentType in (2,4).

Can u tell me where I have to add this condition.

Dan Murray Right there with Babe Points: 779 More actions · Answer 4

I agree with 'Sql Oracle' on this one except for one thing. You would probably want to maintain your unique ID field. It's a good idea and can come in so handy, (i.e. if you've ever partially appended X number of records to a table in error and need to find and remove them, it's a life save to have a unique ID field).

Either way here's some sample code for removing duplicates while retaining the last record. The key is the 'Group By' clause.

Delete

Table

where

RecID

in

(

select

max(Recid)

from

Table

where Condition in (X,Y,Z)

group by

Value1,Value2,Value3

having

count

(*)> 1

)

------ In Case of Multiple Duplicates

while(@@RowCount > 0)

begin

Delete

Table

where

RecID

in

(

select

max(Recid)

from

Table

where Condition in (X,Y,Z)

group

by

Value1,Value2,Value3

having

count

(*)> 1

)

end

Robert Davis One Orange Chip Points: 28027 More actions · Answer 5

Wow, such complex solutions!!

It is a simple process. The 4 steps are:

Disable foreign key
Delete duplicates in Respondent
Delete entries in RespondentLOV that do not exist in Respondent
Enable foreign key

Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>

Delete copy1

From Respondent As copy1

Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

And copy1.EmailAddress = copy2.EmailAddress

And copy1.DateOfBirth = copy2.DateOfBirth

Where copy1.LastUpdated < copy2.LastUpdated

Or (copy1.LastUpdated = copy2.LastUpdated

And copy1.RespondentID < copy2.RespondentID)

Delete LOV

From RespondentLOV As LOV

Where Not Exists (Select 1 From Respondent

Where RespondentID = LOV.RespondentID)

Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>

My blog: SQL Soldier[/url]
SQL Server Best Practices: SQL Server Best Practices
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring[/url]
Microsoft Certified Master: SQL Server, Data Platform MVP
Database Engineer at BlueMountain Capital Management[/url]

PW-201837 SSC-Insane Points: 20805 More actions · Answer 6

>>3. Delete entries in RespondentLOV that do not exist in Respondent

That wasn't the requirement. Hence the complexity. RespondentLOV records were not to be deleted, they were to be remapped to the ID of the 'survivor' of each grouping.

i.e. remove 1 of the duplicate respondents, but re-link any children of the deleted respondent by updating their foreign keys to point to the survivor of each dupe pair.

Robert Davis One Orange Chip Points: 28027 More actions · Answer 7

That doesn't complicate it much at all. I could actually do it in 2 steps without disabling the foreign key, but the update query would be overly complex, and I want it to be easily understandable.

Declare @RespondentIDs Table (RespondentID1 int not null,

RespondentID2 int not null)

Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>

Insert Into @RespondentIDs (RespondentID1, RespondentID2)

Select copy1.RespondentID, Copy2.RespondentID

From Respondent As copy1

Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

And copy1.EmailAddress = copy2.EmailAddress

And copy1.DateOfBirth = copy2.DateOfBirth

Where copy1.LastUpdated < copy2.LastUpdated

Or (copy1.LastUpdated = copy2.LastUpdated

And copy1.RespondentID < copy2.RespondentID)

Delete copy1

From Respondent As copy1

Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

And copy1.EmailAddress = copy2.EmailAddress

And copy1.DateOfBirth = copy2.DateOfBirth

Where copy1.LastUpdated < copy2.LastUpdated

Or (copy1.LastUpdated = copy2.LastUpdated

And copy1.RespondentID < copy2.RespondentID)

Update LOV

Set LOV.RespondentID = Rs.RespondentID2

From RespondentLOV As LOV

Inner Join @RespondentIDs As Rs On Rs.RespondentID1 = LOV.RespondentID

Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>

Another simple option would be to put a trigger on the Respondent table that takes the RespondentID of the deleted record, finds any matching records in RespondentLOV, finds any "duplicate" recrods in Respondent, and updates RespondentID in RespondentLOV accordingly. You could either remove the trigger after teh delete is finished or leave it on there so that future deleted duplicates would be handled automatically.

My blog: SQL Soldier[/url]
SQL Server Best Practices: SQL Server Best Practices
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring[/url]
Microsoft Certified Master: SQL Server, Data Platform MVP
Database Engineer at BlueMountain Capital Management[/url]