How to list all duplicated rows which may include NULL columns?

Question

Post reply

How to list all duplicated rows which may include NULL columns?

ogrishman

SSC Enthusiast

Points: 145
More actions
June 9, 2010 at 11:52 pm

#387121

Hi guys,
I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first.
USE [tempdb];
GO
IF OBJECT_ID(N'dbo.t') IS NOT NULL
BEGIN
DROP TABLE dbo.t
END
GO
CREATE TABLE dbo.t
(
a NVARCHAR(8),
b NVARCHAR(8)
);
GO
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('e', NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
GO
Now I want to show all rows that have other rows duplicated with them, I use the following query.
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
which will give us the result:
a b
-------- --------
NULL NULL
a b
c d
Now if I want to list all rows that make contribution to duplication, I use this query:
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
Which will give me the result:
a b
-------- --------
a b
a b
a b
c d
c d
c d
c d
As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b = duplicate.b)
OR
(dbo.t.b IS NULL
AND duplicate.b IS NULL
AND dbo.t.a = duplicate.a)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b IS NULL
AND duplicate.b IS NULL)
And this query will give me the answer as I wanted:
a b
-------- --------
NULL NULL
NULL NULL
NULL NULL
NULL NULL
a b
a b
a b
c d
c d
c d
c d
Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem?
Great thanks.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 1

Hey there, try this:

;WITH

duplicate (a, b,c) AS

(

SELECT isnull(a, '*') a , isnull(b, '*') b , count(*) c

FROM dbo.t

GROUP

BY a, b

)

SELECT t.a, t.b

FROM dbo.t t

left join duplicate d

on isnull(t.a, '*') = d.a and isnull(t.b, '*') = d.b

where d.c > 1

Is this what you wanted?

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 2

WITH CTE AS (

SELECT a, b, COUNT(*) OVER(PARTITION BY a,b) AS cn

FROM dbo.t)

SELECT a,b

FROM CTE

WHERE cn>1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

ogrishman SSC Enthusiast Points: 145 More actions · Answer 3

You guys are all genius!

Thanks. Your solution both works like a charm.:-D