Duplicate in multiple columns

  • Hello,

    I have a table with 5 columns

    A B C D E

    1 test test0 test2 test3

    1 test1 test1 test3 test4

    1 test1 test2 test3 test5

    1 test1 test3 test3 test4

    I need to find the duplicates from the above excluding column C

    so I would need output as :

    A B C D E

    1 test1 test1 test3 test4

    1 test1 test3 test3 test4

    Thanks.

  • --Removed

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • CREATE TABLE #SampleDuplicates(

    A int,

    B varchar(10),

    C varchar(10),

    D varchar(10),

    E varchar(10)

    );

    INSERT INTO #SampleDuplicates

    VALUES

    (1, 'test ', 'test0', 'test2', 'test3'),

    (1, 'test1', 'test1', 'test3', 'test4'),

    (1, 'test1', 'test2', 'test3', 'test5'),

    (1, 'test1', 'test3', 'test3', 'test4');

    GO

    WITH cteSampleDuplicates AS(

    SELECT *, COUNT(*) OVER(PARTITION BY A,B,D,E) rowcnt

    FROM #SampleDuplicates o

    )

    SELECT A,B,C,D,E

    FROM cteSampleDuplicates

    WHERE rowcnt > 1

    GO

    DROP TABLE #SampleDuplicates

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's an alternative – not as elegant, but may perform better:

    IF OBJECT_ID('tempdb..#tab1', 'U') IS NOT NULL

    DROP TABLE #tab1;

    CREATE TABLE #tab1

    (

    A INT

    ,B VARCHAR(10)

    ,C VARCHAR(10)

    ,D VARCHAR(10)

    ,E VARCHAR(10)

    );

    INSERT #tab1

    (A, B, C, D, E)

    VALUES (1, 'test', 'test0', 'test2', 'test3'),

    (1, 'test1', 'test1', 'test3', 'test4'),

    (1, 'test1', 'test2', 'test3', 'test5'),

    (1, 'test1', 'test3', 'test3', 'test4');

    SELECT *

    FROM #tab1 t

    WHERE EXISTS ( SELECT 1

    FROM #tab1 tx

    WHERE tx.A = t.A

    AND tx.B = t.B

    AND tx.D = t.D

    AND tx.E = t.E

    GROUP BY tx.A

    , tx.B

    , tx.D

    , tx.E

    HAVING COUNT(1) > 1 );

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you this worked.

    Other question would be how to find the count of unique values

  • That should be a lot simpler.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Got it. I can just do distinct count.

    Thank you.

  • Piling on, one of the fastest ways of doing this is to use Window Function with the appropriate POC index

    😎

    IF OBJECT_ID('tempdb..#tab1', 'U') IS NOT NULL

    DROP TABLE #tab1;

    CREATE TABLE #tab1

    (

    A INT

    ,B VARCHAR(10)

    ,C VARCHAR(10)

    ,D VARCHAR(10)

    ,E VARCHAR(10)

    );

    INSERT #tab1

    (A, B, C, D, E)

    VALUES (1, 'test', 'test0', 'test2', 'test3'),

    (1, 'test1', 'test1', 'test3', 'test4'),

    (1, 'test1', 'test2', 'test3', 'test5'),

    (1, 'test1', 'test3', 'test3', 'test4');

    CREATE NONCLUSTERED INDEX NCLIDX#tab1_A_B_D_E_INCL_C ON #tab1(A ASC, B ASC, D ASC, E ASC) INCLUDE (C);

    SELECT

    SD.A

    ,SD.B

    ,SD.C

    ,SD.D

    ,SD.E

    ,ROW_NUMBER() OVER

    (

    PARTITION BY SD.A

    ,SD.B

    ,SD.D

    ,SD.E

    ORDER BY SD.A

    )

    FROM #tab1 SD;

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply