November 20, 2006 at 11:07 am
Dear All,
I have one table
Respondent
-------------
RespondentID -- primary key
LastName
OfficePhoneNumber
EmailAddress
DateOfBirth
LastUpdated
And another one table
RespondentLov
----------------
LOVID
RespondentId -- foreign key references Respondentid of Respondent table
TypeId
LastUpdated
UpdatedBy
In my Respondent table there are duplicate values based on
LastName,OfficePhoneNumber,EmailAddress,DateOfBirth colums.
I want to delete these duplicate records. But need to retain the record
that was recently updated (based on LastUpdated field from the Respondent Table).
Also I have to update the RespondentID in RespondentLOV table with
the record that was recently updated.
wrote following procedure but I received the error.
CREATE PROCEDURE pratty_del
AS
DECLARE
@intErrorCode INT
BEGIN TRAN
UPDATE RespondentLOV
SET respondentid = r1.respondentid
FROM RespondentLOV rl
INNER JOIN respondent r1 ON (rl.respondentid = r1.respondentid)
INNER JOIN respondent r2 ON (r1.lastname = r2.lastname
AND r2.OfficePhoneNumber = r1.OfficePhoneNumber
AND r2.EmailAddress = r1.EmailAddress
AND r2.DateOfBirth = r1.DateOfBirth
AND r1.lastupdated > r2.lastupdated)
SELECT @intErrorCode = @@ERROR
IF (@intErrorCode <> 0) GOTO PROBLEM
DELETE Respondent FROM Respondent
INNER JOIN Respondent r1 ON lastname = r1.lastname
AND OfficePhoneNumber = r1.OfficePhoneNumber
AND EmailAddress = r1.EmailAddress
AND DateOfBirth = r1.DateOfBirth
AND lastupdated > r1.lastupdated
SELECT @intErrorCode = @@ERROR
IF (@intErrorCode <> 0) GOTO PROBLEM
COMMIT TRAN
PROBLEM:
IF (@intErrorCode <> 0) BEGIN
PRINT 'Unexpected error occurred!'
ROLLBACK TRAN
END
Here I received Foreign key error in RespondentLOV table.
Please some one help me.
Thanks In advance.
November 20, 2006 at 11:18 am
One possible approach, probably the only way, is:
Create a new column with a primary key (or with unique value);
Remove the duplicate rows (not include the column with the primary key) but one;
Remove the added column;
November 20, 2006 at 2:43 pm
I would create a temp table which identifies the records to be
deleted and at the same time maps them to their replacements
Select r1.RespondentID As DeleteThisID,
r2.RespondentID As ReplaceWithThisID
Into #remap
From Respondent As r1
Inner Join Respondent As r2
On (r1.LastUpdated < r2.MostRecentUpdate And
r1.LastName = r2.LastName And
r1.OfficePhoneNumber = r2.OfficePhoneNumber And
r1.EmailAddress = r2.EmailAddress And
r1.DateOfBirth = r2.DateOfBirth
)
Inner Join
-- Join a derived table to get most recent update per group. This locates
-- the record to keep
(
Select LastName, OfficePhoneNumber, EmailAddress, DateOfBirth,
Max(LastUpdated) As MostRecentUpdateDate
From Respondent
Group By LastName, OfficePhoneNumber, EmailAddress, DateOfBirth
) r3
On (r3.MostRecentUpdate = r2.LastUpdated And
r3.LastName = r2.LastName And
r3.OfficePhoneNumber = r2.OfficePhoneNumber And
r3.EmailAddress = r2.EmailAddress And
r3.DateOfBirth = r2.DateOfBirth
)
Once the temp table is built, use it to update the dependant table,
then do the delete:
UPDATE rl
SET Respondentid = rm.ReplaceWithThisID
FROM RespondentLOV As rl
INNER JOIN #remap as rm
ON (rl.RespondentID = rm.DeleteThisID)
DELETE R
FROM Respondent As R
INNER JOIN #remap as rm
ON (r.RespondentID = rm.DeleteThisID)
November 21, 2006 at 6:51 am
Hi,
Here I have to add another one condition RespondentType in (2,4).
Can u tell me where I have to add this condition.
November 21, 2006 at 8:11 am
I agree with 'Sql Oracle' on this one except for one thing. You would probably want to maintain your unique ID field. It's a good idea and can come in so handy, (i.e. if you've ever partially appended X number of records to a table in error and need to find and remove them, it's a life save to have a unique ID field).
Either way here's some sample code for removing duplicates while retaining the last record. The key is the 'Group By' clause.
Delete
Table
where
RecID
in
(
select
max(Recid)
from
Table
where Condition in (X,Y,Z)
group by
Value1,Value2,Value3
having
count
(*)> 1
)
------ In Case of Multiple Duplicates
while(@@RowCount > 0)
begin
Delete
Table
where
RecID
in
(
select
max(Recid)
from
Table
where Condition in (X,Y,Z)
group
by
Value1,Value2,Value3
having
count
(*)> 1
)
end
November 21, 2006 at 1:23 pm
Wow, such complex solutions!!
It is a simple process. The 4 steps are:
Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>
Delete copy1
From Respondent As copy1
Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName
And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber
And copy1.EmailAddress = copy2.EmailAddress
And copy1.DateOfBirth = copy2.DateOfBirth
Where copy1.LastUpdated < copy2.LastUpdated
Or (copy1.LastUpdated = copy2.LastUpdated
And copy1.RespondentID < copy2.RespondentID)
Delete LOV
From RespondentLOV As LOV
Where Not Exists (Select 1 From Respondent
Where RespondentID = LOV.RespondentID)
Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>
November 21, 2006 at 1:32 pm
>>3. Delete entries in RespondentLOV that do not exist in Respondent
That wasn't the requirement. Hence the complexity. RespondentLOV records were not to be deleted, they were to be remapped to the ID of the 'survivor' of each grouping.
i.e. remove 1 of the duplicate respondents, but re-link any children of the deleted respondent by updating their foreign keys to point to the survivor of each dupe pair.
November 21, 2006 at 1:50 pm
That doesn't complicate it much at all. I could actually do it in 2 steps without disabling the foreign key, but the update query would be overly complex, and I want it to be easily understandable.
Declare @RespondentIDs Table (RespondentID1 int not null,
RespondentID2 int not null)
Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>
Insert Into @RespondentIDs (RespondentID1, RespondentID2)
Select copy1.RespondentID, Copy2.RespondentID
From Respondent As copy1
Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName
And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber
And copy1.EmailAddress = copy2.EmailAddress
And copy1.DateOfBirth = copy2.DateOfBirth
Where copy1.LastUpdated < copy2.LastUpdated
Or (copy1.LastUpdated = copy2.LastUpdated
And copy1.RespondentID < copy2.RespondentID)
Delete copy1
From Respondent As copy1
Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName
And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber
And copy1.EmailAddress = copy2.EmailAddress
And copy1.DateOfBirth = copy2.DateOfBirth
Where copy1.LastUpdated < copy2.LastUpdated
Or (copy1.LastUpdated = copy2.LastUpdated
And copy1.RespondentID < copy2.RespondentID)
Update LOV
Set LOV.RespondentID = Rs.RespondentID2
From RespondentLOV As LOV
Inner Join @RespondentIDs As Rs On Rs.RespondentID1 = LOV.RespondentID
Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>
Another simple option would be to put a trigger on the Respondent table that takes the RespondentID of the deleted record, finds any matching records in RespondentLOV, finds any "duplicate" recrods in Respondent, and updates RespondentID in RespondentLOV accordingly. You could either remove the trigger after teh delete is finished or leave it on there so that future deleted duplicates would be handled automatically.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply