Find Duplicate Records in Table and Update one of them.

  • Trying to figure a way to update the 2nd of 2 Duplicate Records

    REQUESTID OTHERSYSTEMID

    951 454215

    952 454215

    I would like to first search for all duplicate OTHERSYSTEMID'S in the table and then append '_D' to the end of the 2nd Duplicate Record for All duplicates found.

    ie. REQUESTID OTHERSYSTEMID

    951 454215

    952 454215_D

  • have you considered using ROW_NUMBER() ?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • a really nice feature is you an update on a cte affects the base table, so if you generate an identifier for dupes, you update everything that is a dupe.

    WITH MyCTE

    AS

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY REQUESTID,OTHERSYSTEMID ORDER BY REQUESTID,OTHERSYSTEMID) AS RW,*

    FROM MyTable

    )

    --UPDATE MyTarget SET OTHERSYSTEMID = OTHERSYSTEMID + '_D' --what you asked

    --UPDATE MyTarget SET OTHERSYSTEMID = OTHERSYSTEMID + '_D' + CONVERT(varchar,RW) --D2,D3,D4 etc

    SELECT OTHERSYSTEMID , OTHERSYSTEMID + '_D' + CONVERT(varchar,RW) As NewID,* FROM MyTable AS MyTarget

    WHERE RW >1 --is it ok that dupes 2 thru a million all have _D? maybe it shoudl count the dupes?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Depending on number of rows, and indexes etc... but I would usually do a "GROUP BY" to identify dupes together with MIN to identify original record, then joining back onto main table (ignoring the original record)

    e.g.

    DECLARE @Tbl TABLE ( REQUESTID int, OTHERSYSTEMID varchar(50) )

    INSERT INTO @Tbl ( REQUESTID, OTHERSYSTEMID ) VALUES ( 951, '454215'), (952, '454215' ), ( 953, '454215' )

    ;

    UPDATE Dat SET Dat.OTHERSYSTEMID = Dat.OTHERSYSTEMID + '_D'

    FROM ( SELECT MIN( REQUESTID ) AS MIN_REQUESTID, OTHERSYSTEMID FROM @Tbl GROUP BY OTHERSYSTEMID HAVING COUNT(*) > 1 ) Dupes

    INNER JOIN @Tbl Dat ON Dat.OTHERSYSTEMID = Dupes.OTHERSYSTEMID AND Dat.REQUESTID != Dupes.MIN_REQUESTID

    SELECT * FROM @Tbl

  • Con Alexis (5/8/2016)


    Depending on number of rows, and indexes etc... but I would usually do a "GROUP BY" to identify dupes together with MIN to identify original record, then joining back onto main table (ignoring the original record)

    e.g.

    DECLARE @Tbl TABLE ( REQUESTID int, OTHERSYSTEMID varchar(50) )

    INSERT INTO @Tbl ( REQUESTID, OTHERSYSTEMID ) VALUES ( 951, '454215'), (952, '454215' ), ( 953, '454215' )

    ;

    UPDATE Dat SET Dat.OTHERSYSTEMID = Dat.OTHERSYSTEMID + '_D'

    FROM ( SELECT MIN( REQUESTID ) AS MIN_REQUESTID, OTHERSYSTEMID FROM @Tbl GROUP BY OTHERSYSTEMID HAVING COUNT(*) > 1 ) Dupes

    INNER JOIN @Tbl Dat ON Dat.OTHERSYSTEMID = Dupes.OTHERSYSTEMID AND Dat.REQUESTID != Dupes.MIN_REQUESTID

    SELECT * FROM @Tbl

    It is usually much more efficient to use a CTE with a ROW_NUMBER(), because you only need to read the records once instead of twice (once for the GROUP BY and once for the actual update). ROW_NUMBER() also has the advantage that it's more flexible in that you can specify any order within the partition.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • update t

    set t.id1=999

    from t b join

    (

    select id1,max(id) as id from t

    group by id1 ) a on a.id=b.id

    select * from t

  • prafuljemail (5/9/2016)


    update t

    set t.id1=999

    from t b join

    (

    select id1,max(id) as id from t

    group by id1 ) a on a.id=b.id

    select * from t

    The group by is fine if you KNOW there is only one duplicate record per OtherSystemID.

    if there is more than one duplicate per ID, group by logic will only fix one of them, you'd have to rerun it repetatively for N-1 times.

    the row number version does not suffer from that issue.

    try any of the above solutions in the situation where the data is like this:

    REQUESTID OTHERSYSTEMID

    951 454215

    952 454215

    953 454215

    954 454215

    955 454215

    961 454216

    962 454216

    963 454217

    964 454217

    965 454217

    ;WITH MyCTE([REQUESTID],[OTHERSYSTEMID],[])

    AS

    (

    SELECT '951','454215' UNION ALL

    SELECT '952','454215' UNION ALL

    SELECT '953','454215' UNION ALL

    SELECT '954','454215' UNION ALL

    SELECT '955','454215' UNION ALL

    SELECT '961','454216' UNION ALL

    SELECT '962','454216' UNION ALL

    SELECT '963','454217' UNION ALL

    SELECT '964','454217' UNION ALL

    SELECT '965','454217'

    )

    SELECT * FROM MyCTE;

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (5/9/2016)


    prafuljemail (5/9/2016)


    update t

    set t.id1=999

    from t b join

    (

    select id1,max(id) as id from t

    group by id1 ) a on a.id=b.id

    select * from t

    The group by is fine if you KNOW there is only one duplicate record per OtherSystemID.

    if there is more than one duplicate per ID, group by logic will only fix one of them, you'd have to rerun it repetatively for N-1 times.

    the row number version does not suffer from that issue.

    try any of the above solutions in the situation where the data is like this:

    REQUESTID OTHERSYSTEMID

    951 454215

    952 454215

    953 454215

    954 454215

    955 454215

    961 454216

    962 454216

    963 454217

    964 454217

    965 454217

    ;WITH MyCTE([REQUESTID],[OTHERSYSTEMID],[])

    AS

    (

    SELECT '951','454215' UNION ALL

    SELECT '952','454215' UNION ALL

    SELECT '953','454215' UNION ALL

    SELECT '954','454215' UNION ALL

    SELECT '955','454215' UNION ALL

    SELECT '961','454216' UNION ALL

    SELECT '962','454216' UNION ALL

    SELECT '963','454217' UNION ALL

    SELECT '964','454217' UNION ALL

    SELECT '965','454217'

    )

    SELECT * FROM MyCTE;

    I disagree. The GROUP BY causes approximately twice as many reads as the ROW_NUMBER in the general case, and in this specific case, he is creating duplicates on id 999 that he is not accounting for, so he will run into problems as he finds more duplicates.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • hey, do you get solution or not.

    i have updated my query with example and sample data on winmilestone com forum database queries

  • @drew,Lowel,prafuljemail,Con

    Thanks guys,

    The Group By Clause will definitely work in this case because I know there will only be 1 Duplicate OTHERSYSTEMID Per RequestID at any give time.

    I'll test the solution(s) given above on a test DB to verify.

  • Id use rownumber.

    its so flexible and powerful for when you have duplicate records

    try the following.

    CREATE TABLE ##X

    (COL1 NVARCHAR(10),COL2 NVARCHAR(10))

    INSERT INTO ##X(COL1,COL2)

    SELECT

    '951' AS COL1,'454215' AS COL2

    UNION ALL

    SELECT '952','454215' UNION ALL

    SELECT '953','454215' UNION ALL

    SELECT '954','454215' UNION ALL

    SELECT '955','454215' UNION ALL

    SELECT '961','454216' UNION ALL

    SELECT '962','454216' UNION ALL

    SELECT '963','454217' UNION ALL

    SELECT '964','454217' UNION ALL

    SELECT '965','454217'

    UPDATE A

    SET COL2=COL2+'_D'

    FROM (

    SELECT *,ROW_NUMBER () OVER (PARTITION BY COL2 ORDER BY COL2) NUM FROM ##X) AS a

    WHERE NUM>1

    SELECT * FROM ##X

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply