How to Delete Duplicate records in following senario???

  • Dear All,

    I have one table

    Respondent

    -------------

    RespondentID -- primary key

    LastName

    OfficePhoneNumber

    EmailAddress

    DateOfBirth

    LastUpdated

    And another one table

    RespondentLov

    ----------------

    LOVID 

    RespondentId -- foreign key references Respondentid of Respondent table

    TypeId

    LastUpdated

    UpdatedBy

    In my Respondent table there are duplicate values based on

    LastName,OfficePhoneNumber,EmailAddress,DateOfBirth colums.

    I want to delete these duplicate records. But need to retain the record

    that was recently updated (based on LastUpdated field from the Respondent Table).

    Also I have to update the RespondentID in RespondentLOV table with

    the record that was recently updated.

     wrote following procedure but I received the error.

    CREATE PROCEDURE pratty_del

    AS

    DECLARE

     @intErrorCode INT

    BEGIN TRAN

         

     UPDATE RespondentLOV

     SET respondentid = r1.respondentid

     FROM RespondentLOV rl

     INNER JOIN respondent r1 ON (rl.respondentid = r1.respondentid)

     INNER JOIN respondent r2 ON (r1.lastname = r2.lastname

         AND r2.OfficePhoneNumber = r1.OfficePhoneNumber

         AND r2.EmailAddress = r1.EmailAddress

         AND r2.DateOfBirth = r1.DateOfBirth

         AND r1.lastupdated > r2.lastupdated)

     SELECT @intErrorCode = @@ERROR

     IF (@intErrorCode <> 0) GOTO PROBLEM

     DELETE Respondent FROM Respondent

     INNER JOIN Respondent r1 ON  lastname = r1.lastname

         AND OfficePhoneNumber = r1.OfficePhoneNumber

         AND EmailAddress = r1.EmailAddress

         AND DateOfBirth = r1.DateOfBirth

         AND lastupdated > r1.lastupdated

        

     SELECT @intErrorCode = @@ERROR

     IF (@intErrorCode <> 0) GOTO PROBLEM

    COMMIT TRAN

    PROBLEM:

    IF (@intErrorCode <> 0) BEGIN

    PRINT 'Unexpected error occurred!'

        ROLLBACK TRAN

    END

    Here I received Foreign key error in RespondentLOV table.

    Please some one help me.

    Thanks In advance.

     

  • One possible approach, probably the only way, is:

    Create a new column with a primary key (or with unique value);

    Remove the duplicate rows (not include the column with the primary key) but one;

    Remove the added column;

  • I would create a temp table which identifies the records to be

    deleted and at the same time maps them to their replacements

    Select r1.RespondentID As DeleteThisID,

           r2.RespondentID As ReplaceWithThisID

    Into #remap

    From Respondent As r1

    Inner Join Respondent As r2

      On (r1.LastUpdated < r2.MostRecentUpdate And

          r1.LastName = r2.LastName And

          r1.OfficePhoneNumber = r2.OfficePhoneNumber And

          r1.EmailAddress = r2.EmailAddress And

          r1.DateOfBirth = r2.DateOfBirth

      )

    Inner Join

    -- Join a derived table to get most recent update per group. This locates

    -- the record to keep

    (

      Select LastName, OfficePhoneNumber, EmailAddress, DateOfBirth,

             Max(LastUpdated) As MostRecentUpdateDate

      From Respondent

      Group By LastName, OfficePhoneNumber, EmailAddress, DateOfBirth

    ) r3

      On (r3.MostRecentUpdate = r2.LastUpdated And

          r3.LastName = r2.LastName And

          r3.OfficePhoneNumber = r2.OfficePhoneNumber And

          r3.EmailAddress = r2.EmailAddress And

          r3.DateOfBirth = r2.DateOfBirth

      )

    Once the temp table is built, use it to update the dependant table,

    then do the delete:

    UPDATE rl

    SET Respondentid = rm.ReplaceWithThisID

    FROM RespondentLOV As rl

    INNER JOIN #remap as rm

      ON (rl.RespondentID = rm.DeleteThisID)

    DELETE R

    FROM Respondent As R

    INNER JOIN #remap as rm

      ON (r.RespondentID = rm.DeleteThisID)

  • Hi,

    Here I have to add another one condition RespondentType in (2,4).

    Can u tell me where I have to add this condition.

     

  • I agree with 'Sql Oracle' on this one except for one thing. You would probably want to maintain your unique ID field. It's a good idea and can come in so handy, (i.e. if you've ever partially appended X number of records to a table in error and need to find and remove them, it's a life save to have a unique ID field).

    Either way here's some sample code for removing duplicates while retaining the last record. The key is the 'Group By' clause.

     

    Delete

    Table

    where

    RecID

    in

    (

    select

    max(Recid)

    from

    Table

    where Condition in (X,Y,Z)

    group by

    Value1,Value2,Value3

    having

    count

    (*)> 1

    )

    ------ In Case of Multiple Duplicates

    while(@@RowCount > 0)

    begin

    Delete

    Table

    where

    RecID

    in

    (

    select

    max(Recid)

    from

    Table

    where Condition in (X,Y,Z)

    group

    by

    Value1,Value2,Value3

    having

    count

    (*)> 1

    )

    end

  • Wow, such complex solutions!!

    It is a simple process. The 4 steps are:

    1. Disable foreign key
    2. Delete duplicates in Respondent
    3. Delete entries in RespondentLOV that do not exist in Respondent
    4. Enable foreign key

    Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>

    Delete copy1

    From Respondent As copy1

    Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

      And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

      And copy1.EmailAddress = copy2.EmailAddress

      And copy1.DateOfBirth = copy2.DateOfBirth

    Where copy1.LastUpdated < copy2.LastUpdated

    Or (copy1.LastUpdated = copy2.LastUpdated

     And copy1.RespondentID < copy2.RespondentID)

    Delete LOV

    From RespondentLOV As LOV

    Where Not Exists (Select 1 From Respondent

      Where RespondentID = LOV.RespondentID)

    Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • >>3. Delete entries in RespondentLOV that do not exist in Respondent

    That wasn't the requirement. Hence the complexity. RespondentLOV records were not to be deleted, they were to be remapped to the ID of the 'survivor' of each grouping.

    i.e. remove 1 of the duplicate respondents, but re-link any children of the deleted respondent by updating their foreign keys to point to the survivor of each dupe pair.

     

  • That doesn't complicate it much at all. I could actually do it in 2 steps without disabling the foreign key, but the update query would be overly complex, and I want it to be easily understandable.

    Declare @RespondentIDs Table (RespondentID1 int not null,

        RespondentID2 int not null)

    Alter Table RespondentLOV NOCHECK CONSTRAINT <Foreign key constraint name>

    Insert Into @RespondentIDs (RespondentID1, RespondentID2)

    Select copy1.RespondentID, Copy2.RespondentID

    From Respondent As copy1

    Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

      And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

      And copy1.EmailAddress = copy2.EmailAddress

      And copy1.DateOfBirth = copy2.DateOfBirth

    Where copy1.LastUpdated < copy2.LastUpdated

    Or (copy1.LastUpdated = copy2.LastUpdated

     And copy1.RespondentID < copy2.RespondentID)

    Delete copy1

    From Respondent As copy1

    Inner Join Respondent As copy2 On copy1.LastName = copy2.LastName

      And copy1.OfficePhoneNumber = copy2.OfficePhoneNumber

      And copy1.EmailAddress = copy2.EmailAddress

      And copy1.DateOfBirth = copy2.DateOfBirth

    Where copy1.LastUpdated < copy2.LastUpdated

    Or (copy1.LastUpdated = copy2.LastUpdated

     And copy1.RespondentID < copy2.RespondentID)

    Update LOV

    Set LOV.RespondentID = Rs.RespondentID2

    From RespondentLOV As LOV

    Inner Join @RespondentIDs As Rs On Rs.RespondentID1 = LOV.RespondentID

    Alter Table RespondentLOV CHECK CONSTRAINT <Foreign key constraint name>

    Another simple option would be to put a trigger on the Respondent table that takes the RespondentID of the deleted record, finds any matching records in RespondentLOV, finds any "duplicate" recrods in Respondent, and updates RespondentID in RespondentLOV accordingly. You could either remove the trigger after teh delete is finished or leave it on there so that future deleted duplicates would be handled automatically.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply