Duplicate in multiple columns

Question

Post reply

Duplicate in multiple columns

PJ_SQL

SSCertifiable

Points: 6707
More actions
July 25, 2016 at 10:37 am

#309433

Hello,
I have a table with 5 columns
A B C D E
1 test test0 test2 test3
1 test1 test1 test3 test4
1 test1 test2 test3 test5
1 test1 test3 test3 test4
I need to find the duplicates from the above excluding column C
so I would need output as :
A B C D E
1 test1 test1 test3 test4
1 test1 test3 test3 test4
Thanks.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247216 More actions · Answer 1

Phil Parkin

SSC Guru

Points: 247216

July 25, 2016 at 10:54 am

#1891853

--Removed

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

CREATE TABLE #SampleDuplicates(

A int,

B varchar(10),

C varchar(10),

D varchar(10),

E varchar(10)

);

INSERT INTO #SampleDuplicates

VALUES

(1, 'test ', 'test0', 'test2', 'test3'),

(1, 'test1', 'test1', 'test3', 'test4'),

(1, 'test1', 'test2', 'test3', 'test5'),

(1, 'test1', 'test3', 'test3', 'test4');

GO

WITH cteSampleDuplicates AS(

SELECT *, COUNT(*) OVER(PARTITION BY A,B,D,E) rowcnt

FROM #SampleDuplicates o

)

SELECT A,B,C,D,E

FROM cteSampleDuplicates

WHERE rowcnt > 1

GO

DROP TABLE #SampleDuplicates

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Phil Parkin SSC Guru Points: 247216 More actions · Answer 3

Here's an alternative – not as elegant, but may perform better:

IF OBJECT_ID('tempdb..#tab1', 'U') IS NOT NULL

DROP TABLE #tab1;

CREATE TABLE #tab1

(

A INT

,B VARCHAR(10)

,C VARCHAR(10)

,D VARCHAR(10)

,E VARCHAR(10)

);

INSERT #tab1

(A, B, C, D, E)

VALUES (1, 'test', 'test0', 'test2', 'test3'),

(1, 'test1', 'test1', 'test3', 'test4'),

(1, 'test1', 'test2', 'test3', 'test5'),

(1, 'test1', 'test3', 'test3', 'test4');

SELECT *

FROM #tab1 t

WHERE EXISTS ( SELECT 1

FROM #tab1 tx

WHERE tx.A = t.A

AND tx.B = t.B

AND tx.D = t.D

AND tx.E = t.E

GROUP BY tx.A

, tx.B

, tx.D

, tx.E

HAVING COUNT(1) > 1 );

PJ_SQL SSCertifiable Points: 6707 More actions · Answer 4

Thank you this worked.

Other question would be how to find the count of unique values

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

That should be a lot simpler.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

PJ_SQL SSCertifiable Points: 6707 More actions · Answer 6

PJ_SQL

SSCertifiable

Points: 6707

July 25, 2016 at 1:15 pm

#1891906

Got it. I can just do distinct count.

Thank you.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 7

Piling on, one of the fastest ways of doing this is to use Window Function with the appropriate POC index

😎

IF OBJECT_ID('tempdb..#tab1', 'U') IS NOT NULL

DROP TABLE #tab1;

CREATE TABLE #tab1

(

A INT

,B VARCHAR(10)

,C VARCHAR(10)

,D VARCHAR(10)

,E VARCHAR(10)

);

INSERT #tab1

(A, B, C, D, E)

VALUES (1, 'test', 'test0', 'test2', 'test3'),

(1, 'test1', 'test1', 'test3', 'test4'),

(1, 'test1', 'test2', 'test3', 'test5'),

(1, 'test1', 'test3', 'test3', 'test4');

CREATE NONCLUSTERED INDEX NCLIDX#tab1_A_B_D_E_INCL_C ON #tab1(A ASC, B ASC, D ASC, E ASC) INCLUDE (C);

SELECT

SD.A

,SD.B

,SD.C

,SD.D

,SD.E

,ROW_NUMBER() OVER

(

PARTITION BY SD.A

,SD.B

,SD.D

,SD.E

ORDER BY SD.A

)

FROM #tab1 SD;