Finding duplicated values in multiple columns (Not the one we all know of)

Question

Finding duplicated values in multiple columns (Not the one we all know of)

derrickmeister

SSC Enthusiast

Points: 125
More actions
April 22, 2019 at 11:07 pm

#3519217
This isn't specifically related to SQL Server 2017 but more a general T-SQL question. I couldn't find the right place for this question.
You may though this is another "duplicated" question that can be solved with Group By and Count > 1 but it's not.
I have been googling for days and pulling my hair for past a couple of days. I hope I can get some guidance from you guys.
Let's say there are 5 columns in a table. (ID, V1 to V4)
I would like to find rows with duplicated values with any other rows in any of the columns (V1 to 4).
Here's a temporary table I would create for this.
```
Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1))
Insert into @TheTable values (1, 'A', NULL, NULL, NULL)
Insert into @TheTable values (2, 'B', NULL, 'A', NULL)
Insert into @TheTable values (3, 'C', 'B', NULL, NULL)
Insert into @TheTable values (4, 'D', NULL, NULL, 'F')
Insert into @TheTable values (5, NULL, NULL, NULL, NULL)
Insert into @TheTable values (6, 'E', NULL, 'B', NULL)

/* Expected Results (All the rows containing dupcliated values in any of the columns)
IDV1V2V3V4
1ANULLNULLNULL
2BNULLANULL
3CBNULLNULL
6ENULLBNULL

Only row 4, 5 would be valid rows.
*/
```
Any help would be very appreciated!! Thanks 🙂

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63503 More actions · Answer 1

Why do you have 4 columns that, in theory, contain potential duplicates and lots of nulls? Either I'm missing something, or this is a nonsensical question.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

This is one way, not sure if it is the best way:

Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1)); -- End all your statements with a semicolon
Insert into @TheTable values (1, 'A', NULL, NULL, NULL);
Insert into @TheTable values (2, 'B', NULL, 'A', NULL);
Insert into @TheTable values (3, 'C', 'B', NULL, NULL);
Insert into @TheTable values (4, 'D', NULL, NULL, 'F');
Insert into @TheTable values (5, NULL, NULL, NULL, NULL);
Insert into @TheTable values (6, 'E', NULL, 'B', NULL);
/* Expected Results (All the rows containing dupcliated values in any of the columns)
IDV1V2V3V4
1ANULLNULLNULL
2BNULLANULL
3CBNULLNULL
6ENULLBNULL
Only row 4, 5 would be valid rows.
*/

with BaseData as (
select
tt.ID
, oa1.DataCol
, cnt = case when DataCol is null then 1 else count(*) over (partition by DataCol)end
from
@TheTable as tt
outer apply (select DataCol from (values (tt.V1),(tt.V2),(tt.V3),(tt.V4))dt(DataCol))oa1(DataCol)
)
select
tt.*
from
@TheTable as tt
where
exists(select 1 from Basedata as bd where tt.ID = bd.ID group by bd.ID having max(bd.cnt) > 1);

Jonathan AC Roberts SSCoach Points: 18354 More actions · Answer 3

Declare @TheTable table (ID int, V1 varchar(1), V2 varchar(2), V3 varchar(1), V4 varchar(1))
Insert into @TheTable values (1, 'A', NULL, NULL, NULL)
Insert into @TheTable values (2, 'B', NULL, 'A', NULL)
Insert into @TheTable values (3, 'C', 'B', NULL, NULL)
Insert into @TheTable values (4, 'D', NULL, NULL, 'F')
Insert into @TheTable values (5, NULL, NULL, NULL, NULL)
Insert into @TheTable values (6, 'E', NULL, 'B', NULL)

;with cte as
(
    select *
      from @TheTable t
     cross apply(values (t.V1),(t.V2),(t.V3),(t.V4)) u(col)
)
select DISTINCT a.ID, a.V1, a.V2, a.V3, a.V4
  from cte a
 where exists(select * 
                from cte b
               where b.ID <> a.ID
                 and b.col = a.col)

derrickmeister SSC Enthusiast Points: 125 More actions · Answer 4

pietlinden wrote:

Why do you have 4 columns that, in theory, contain potential duplicates and lots of nulls? Either I'm missing something, or this is a nonsensical question.

I understand this might be kind of unusual. However, there could a case where a user can put any values into 4 extra custom columns whatever they want to put. And at some point, they want to clean up duplicated values or pull some sort of report based on the duplicated values among those extra columns.

Thanks!

This reply was modified 5 years, 7 months ago by derrickmeister.

derrickmeister SSC Enthusiast Points: 125 More actions · Answer 5

derrickmeister

SSC Enthusiast

Points: 125

April 23, 2019 at 3:17 pm

#3524897

Thanks! Lynn and Jonathan.

autoexcrement SSCertifiable Points: 5956 More actions · Answer 6

Lynn-

I'm really fascinated by your solution. I don't think I've come across that concept of outer joining to the FROM table using the VALUES constructor. Is that functionally similar to doing an UNPIVOT in this case?

"If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

autoexcrement wrote:

Lynn- I'm really fascinated by your solution. I don't think I've come across that concept of outer joining to the FROM table using the VALUES constructor. Is that functionally similar to doing an UNPIVOT in this case?

Thinking about it, I think it is. It is making multiple rows from multiple columns.