June 9, 2010 at 11:52 pm
Hi guys,
I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first.
USE [tempdb];
GO
IF OBJECT_ID(N'dbo.t') IS NOT NULL
BEGIN
DROP TABLE dbo.t
END
GO
CREATE TABLE dbo.t
(
a NVARCHAR(8),
b NVARCHAR(8)
);
GO
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('a', 'b');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('c', 'd');
INSERT t VALUES ('e', NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
INSERT t VALUES (NULL, NULL);
GO
Now I want to show all rows that have other rows duplicated with them, I use the following query.
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
which will give us the result:
a b
-------- --------
NULL NULL
a b
c d
Now if I want to list all rows that make contribution to duplication, I use this query:
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
Which will give me the result:
a b
-------- --------
a b
a b
a b
c d
c d
c d
c d
As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to
WITH
duplicate (a, b) AS
(
SELECT a, b
FROM dbo.t
GROUP
BY a, b
HAVING count(*) > 1
)
SELECT dbo.t.a, dbo.t.b
FROM dbo.t
INNER JOIN duplicate
ON (dbo.t.a = duplicate.a
AND dbo.t.b = duplicate.b)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b = duplicate.b)
OR
(dbo.t.b IS NULL
AND duplicate.b IS NULL
AND dbo.t.a = duplicate.a)
OR
(dbo.t.a IS NULL
AND duplicate.a IS NULL
AND dbo.t.b IS NULL
AND duplicate.b IS NULL)
And this query will give me the answer as I wanted:
a b
-------- --------
NULL NULL
NULL NULL
NULL NULL
NULL NULL
a b
a b
a b
c d
c d
c d
c d
Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem?
Great thanks.
June 10, 2010 at 12:43 am
Hey there, try this:
;WITH
duplicate (a, b,c) AS
(
SELECT isnull(a, '*') a , isnull(b, '*') b , count(*) c
FROM dbo.t
GROUP
BY a, b
)
SELECT t.a, t.b
FROM dbo.t t
left join duplicate d
on isnull(t.a, '*') = d.a and isnull(t.b, '*') = d.b
where d.c > 1
Is this what you wanted?
June 10, 2010 at 2:01 am
WITH CTE AS (
SELECT a, b, COUNT(*) OVER(PARTITION BY a,b) AS cn
FROM dbo.t)
SELECT a,b
FROM CTE
WHERE cn>1;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 10, 2010 at 6:56 am
You guys are all genius!
Thanks. Your solution both works like a charm.:-D
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply