Removing Duplicates DueTo Unique Rows

  • Good Afternoon,

    I am working with a dataset that has duplicate ID's due to unique records. The challenge is any number of column(s) can cause the uniqueness therefore there is no way to issolate a single column. The goal is to drop all dups but one and it doesnt matter which records is left.

    Below is a sample dataset.

    Thanks,

    CREATE TABLE [dbo].[DupsDueToUniqueRows](

    [ID] [int] NULL,

    [Name] [nvarchar](30) NULL,

    [Contact] [nvarchar] (30) NULL,

    [Status] [char] (1) NULL

    ) ON [PRIMARY]

    INSERT INTO [DupsDueToUniqueRows] ([ID], [Name], [Contact], [Status])

    SELECT 1, 'Jon' ,'Mobile', 'A'

    UNION ALL

    SELECT 1, 'Jon' ,'Mobile', 'B'

    UNION ALL

    SELECT 1, 'Jon' ,'Pager', 'A'

    UNION ALL

    SELECT 2, 'Paul' ,'Mobile', 'A'

    UNION ALL

    SELECT 2, 'Paul' ,'Mobile', 'B'

    UNION ALL

    SELECT 3, 'Art' ,'Mobile', 'B'

    UNION ALL

    SELECT 3, 'Art' ,'Mobile', 'A'

    UNION ALL

    SELECT 4, 'Mike' ,'Mobile', 'A'

    UNION ALL

    SELECT 4, 'Mike' ,'Pager', 'A'

    UNION ALL

    SELECT 4, 'Mike' ,'Mobile', 'B'

    UNION ALL

    SELECT 4, 'Mike' ,'Cell', 'A'

    UNION ALL

    SELECT 5, 'Art' ,'Mobile', 'A'

    GO

    MCTS: BI 2008, MCITP: BI 2008
    Stay Thirsty My Friends

  • Here is a bare bones skeleton script. If you modify it, you should be able to accomplish your goals.

    WITH dedupeCTE (COl1,Col2, DuplicateCount)

    AS

    (

    SELECT COl1,Col2,

    ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount

    FROM DuplicateRcordTable

    )

    DELETE

    FROM dedupeCTE

    WHERE DuplicateCount > 1

    GO

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Jason,

    Thanks for your assistance. I get an error message as below:

    WITH dedupeCTE (ID, COUNT(*) AS DuplicateCount)

    AS

    (

    SELECT ID, DuplicateCount,

    ROW_NUMBER() OVER(PARTITION BY ID, DuplicateCount ORDER BY ID) AS DuplicateCount

    FROM DupsDueToUniqueRows

    )

    DELETE

    FROM dedupeCTE

    WHERE DuplicateCount > 1

    GO

    =================================

    Msg 207, Level 16, State 1, Line 4

    Invalid column name 'DuplicateCount'.

    Msg 207, Level 16, State 1, Line 5

    Invalid column name 'DuplicateCount'.

    Msg 8158, Level 16, State 1, Line 5

    'dedupeCTE' has more columns than were specified in the column list

    MCTS: BI 2008, MCITP: BI 2008
    Stay Thirsty My Friends

  • cos_ta393

    Other than creating and using a temp table (solely to make it easy for me to test). Jason's code ran fine - now I did use the column names as you specified in your post. Notice neither your post or Jasons showed that the

    T-SQL WITH is preceded by a ; be careful it is a requirement.

    ;WITH dedupeCTE (Id,Name, DuplicateCount)

    AS (SELECT Id,Name,

    ROW_NUMBER() OVER(PARTITION BY Id,Name ORDER BY Id) AS DuplicateCount

    FROM #DupsDueToUniqueRows)

    SELECT * --test before deleting

    FROM dedupeCTE -- use after testing WHERE DuplicateCount > 1

    Results from above are:

    IdNameDuplicateCount

    1Jon1

    1Jon2

    1Jon3

    2Paul1

    2Paul2

    3Art1

    3Art2

    4Mike1

    4Mike2

    4Mike3

    4Mike4

    5Art1

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • bitbucket-25253 (12/4/2009)


    cos_ta393

    ... Notice neither your post or Jasons showed that the

    T-SQL WITH is preceded by a ; be careful it is a requirement.

    Not quite correct, actually. It isn't required that the CTE start with a ; as it is that the previous statement must be treminated by a ;. Since the use of ; is not manditory to end all statements yet, many SQL programmers have simply gotten into the habit of prefacing CTE's with a ; to eliminate one source of errors when using them.

  • Lynn Pettis

    Hate to be picky but if the ; is at the end of the last T-SQL code line (not a comment line) before the WITH, then doesn't that semicolon precede (Come before) the WITH statement?

    Ah the naunces (fine distinctions) of the american language

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • bitbucket-25253 (12/4/2009)


    Lynn Pettis

    Hate to be picky but if the ; is at the end of the last T-SQL code line (not a comment line) before the WITH, then doesn't that semicolon precede (Come before) the WITH statement?

    Ah the naunces (fine distinctions) of the american language

    If there is a statement preceding the CTE, then yes that is accurate.

    If there is no statement preceding the CTE, then it may be executed without the ";"

    Thanks for bringing up the semicolon.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • cos_ta393 (12/4/2009)


    Jason,

    Thanks for your assistance. I get an error message as below:

    WITH dedupeCTE (ID, COUNT(*) AS DuplicateCount)

    AS

    (

    ...

    )

    ...

    Change your cte declaration just a little

    WITH dedupeCTE (ID, DuplicateCount)

    AS

    (

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • For semantics, code in previous post would be:

    ;

    WITH dedupeCTE (ID, DuplicateCount)

    AS

    (

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Semantics can be a pain, but just going by what is in BOL:

    When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon.

    😉

  • Lynn Pettis (12/4/2009)


    Semantics can be a pain, but just going by what is in BOL:

    When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon.

    😉

    True, and I always omit the preceding semi-colon if the CTE is a stand-alone. It might be a good practice for forum posts to just automatically include it though.

    Then again, it would not provide learning points...hmmm.:cool:

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Works perfectly. Jason, bitbucket, Lynn thanks for all of your inputs - really appreciated.

    costa

    MCTS: BI 2008, MCITP: BI 2008
    Stay Thirsty My Friends

  • Cos_ta393

    Your welcome.

    It makes me feel good that in some small way I may have assisted someone solve a problem for in so doing I find myself learning something new, and I am sure that Jason and Lynn Pettis feel the same way.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • You're welcome.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Yes, you are welcome. It is always good to hear when we have been able to help.

    😀

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply