Update rows to resolve issues about duplicate keys on Create Unique Index

  • Hi there ...here comes a tricky one.

    I have a database table which needs to make the Index "ParentREF, UniqueName" unique - but this fails because duplicate keys are found. Thus I now need to cleanup these duplicate rows - but I cannot just delete the duplicates, because they might have rows in detail tables.

    This means that all duplicate rows needs an update on the "UniqueName" value - but not the first (valid) one!

    I can find those rows by

    SELECT OID, UniqueName, ParentREF, CreatedUTC, ModifiedUTC FROM dbo.CmsContent AS table0

    WHERE EXISTS (

    SELECT OID, UniqueName, ParentREF FROM dbo.CmsContent AS table1

    WHERE table0.ParentREF = table1.ParentREF

    AND table0.UniqueName = table1.UniqueName

    AND table0.OID != table1.OID

    )

    ORDER BY ParentREF, UniqueName, ModifiedUTC desc

    ...but I struggle to make the required SQL (SP?) to update the "invalid" rows.

    Note: the "valid" row is the one with the newest ModifiedUTC value - this row must kept unchanged!

    ATM the preferred (cause easiest) way is to rename the invalid rows with

    UniqueName = OID

    because if I use any other name I risk to create another double entry.

    Thanks in advance to whoever can help me

    Heiko

  • I think this code will do what you need:

    [font="Courier New"];WITH cteInvalidRecords AS

       (  

       SELECT

           UniqueName,

           ParentRef,

           MIN(ModifiedUTC) AS ModfiedUTC

       FROM

           dbo.CmsContent

       GROUP BY

           UniqueName,

           ParentREF

       HAVING

           COUNT(OID) > 1

       )

    UPDATE CC

       SET UniqueName = CC.OID

    FROM

       dbo.CmcContent CC JOIN

       cteInvalidRecords IR ON

           CC.UniqueName = IR.UniqueName AND

           CC.ParentREF = IR.ParentREF AND

           CC.ModifiedUTC = IR.ModifiedUTC[/font]

    The CTE in the header gets all the records with dupes and the earliest ModifiedUTC for each record then I join the CTE with the table to update the Invalid records. The only issue is if you have NULLS in ModifiedUTC. You may want to put your dupes in a table variable or temp table with a default value for Null ModifiedUTC records.

  • Hi Jack

    thanks for the fast response 🙂

    It works ... only 1 question ... it does not do all matches in 1 run?!

    I had 1936 matches and your method required a couple of runs until it reached 0

    :: affected rows ::

    - 563

    - 43x

    - 3xx

    - 1xx

    ..

    - 3

    - 1

    - 0

    Do you know why??

    If it has to be that way, then I would need that method in a 'while loop' until it reaches 0

    **EDIT: I assume this is because there are a certain number of rows with 3 duplicates .and 4 duplicates etc. ?! thus need additional runs to find those?!

    thanks in advance

    Heiko

  • The only reason this would take more than 1 pass is that there are more than 2 records for some of the UniqueName and ParentREF combinations.

    You could reverse the logic in the CTE and get the valid rows and delete the non-valid rows using a join where

    name <> name

    parentRef <> parentRef

    date <> date

  • Thanks again, Jack 🙂

    Hm, I dont follow :satisfied:

    Can you show it as SQL what you mean?!

    *remind that I cannot delete

    best regards

    Heiko

  • Hi Jack

    One more problem ... in some rare cases ther can exist such rows with identical ModifiedUTC.

    That would result in renaming all (and therefore not left 1 "valid" we wanted)

    Do you have another idea how to solve that?!

    Here is what we do now (first we run the "turnedaround" method to find and rename all "invalid" rows) . then for the case that there were some with identical ModifiedUTC (and thus being left in the Table as double entries) we run it with MIN (your suggest) again to elmininate those as well.

    The only issue is as described above, that we "loose" all rows for these and dont keep 1 valid.

    BEGIN

    ;WITH cteInvalidRecords AS

    (

    SELECT

    UniqueName,

    ParentRef,

    MAX(ModifiedUTC) AS ModifiedUTC

    FROM

    dbo.CmsContent

    GROUP BY

    UniqueName,

    ParentREF

    HAVING

    COUNT(OID) > 1

    )

    UPDATE CC

    SET UniqueName = CC.OID

    FROM

    dbo.CmsContent CC JOIN

    cteInvalidRecords IR ON

    CC.UniqueName = IR.UniqueName AND

    CC.ParentREF = IR.ParentREF AND

    CC.ModifiedUTC <> IR.ModifiedUTC

    END

    BEGIN

    ;WITH cteInvalidRecords AS

    (

    SELECT

    UniqueName,

    ParentRef,

    MIN(ModifiedUTC) AS ModifiedUTC

    FROM

    dbo.CmsContent

    GROUP BY

    UniqueName,

    ParentREF

    HAVING

    COUNT(OID) > 1

    )

    UPDATE CC

    SET UniqueName = CC.OID

    FROM

    dbo.CmsContent CC JOIN

    cteInvalidRecords IR ON

    CC.UniqueName = IR.UniqueName AND

    CC.ParentREF = IR.ParentREF AND

    CC.ModifiedUTC = IR.ModifiedUTC

    END

    kind regards

    Heiko

  • All right. Does it matter which one of the records with duplicate ModifiedUTC, you keep? Here's an idea:

    [font="Courier New"];WITH cteDuplicateValidRecords AS -- get all the records with more than 1 match and keep the first 1?(Min(OID)

       (                              --assuming OID is an uincrementing numeric    

        SELECT

            MIN(OID) AS OID,

            UniqueName,

            ParentRef

            MAX(ModifiedUTC)

        FROM

            dbo.CmsContent CC ON

           VR.UniqueName = CC.UniqueName AND

           VR.ParentRef = CC.ParentRef AND

           VR.ModifiedUTC = CC.ModifiedUTC

       GROUP BY

           UniqueName,

           ParentRef

       HAVING

           COUNT(OID > 1

       ),

    cteInvalidRecords AS -- Find all the non-matching OID's

       (  

       SELECT -- this will get all the duplicates that do not have matching ModifiedUTC

           CC.UniqueName,

           CC.ParentRef,

           MAX(CC.ModifiedUTC) AS ModfiedUTC

       FROM

           dbo.CmsContent CC JOIN

           cteDuplicateValidRecords DVR ON

           CC.OID <> DVR.OID

       GROUP BY

           CC.UniqueName,

           CC.ParentREF

       HAVING

           COUNT(CC.OID) > 1

       UNION -- will remove duplicates since not UNION ALL

       SELECT -- this gets the records with matching ModifiedUTC

           CC.UniqueName,

           CC.ParentRef,

           MAX(CC.ModifiedUTC) AS ModfiedUTC

       FROM

           dbo.CmsContent CC JOIN

           cteDuplicateValidRecords DVR ON

           CC.OID <> DVR.OID

       GROUP BY

           CC.UniqueName,

           CC.ParentREF  

       )

    UPDATE CC

       SET UniqueName = CC.OID

    FROM

       dbo.CmcContent CC JOIN

       cteInvalidRecords IR ON

           CC.UniqueName = IR.UniqueName AND

           CC.ParentREF = IR.ParentREF AND

           CC.ModifiedUTC = IR.ModifiedUTC

    [/font]

  • Thanks Jack

    Yes which one of the Multiple rows with same ModifiedUtc doesnt matter then.

    But OID is an Guid field (i saw your assumption in comment)

    ...and it doesnt validate:

    First it complains about the ( at first MAX

    ParentRef

    MAX(ModifiedUTC)

    obviously because it misses the , behind the ParentRef.

    But then it says "Wrong syntax near ON in line 9 ... but I dont see why... the only thing I see is that the "VR" seems to be nowhere "declared"?! Hmm ....

    thanks in advance

    Heiko

  • My bad, remove from the ON to the GROUP BY in the first CTE and correct the other issues you have found. Just bad typing and deleting by me.:D

  • Hi Jack.

    Okay 🙂 ....

    unfortunately it doesnt work because OID is an Guid, and not an integer.

    Any alternate idea?!

    Thanks in advance

    Heiko

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply