February 23, 2010 at 12:15 pm
I know that by default, the MIN function ignores any NULL values. Is there any way to cause it to not ignore NULLs? Basically I'm trying to find a MIN date from a rowset. However, if any of the dates are NULL I want the MIN value to be NULL. Only if all dates are not NULL do I want an actual date value.
Possible?
February 23, 2010 at 12:50 pm
DECLARE @table TABLE (d date);
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL);

SELECT DATEADD(dd, 0, NULLIF(MIN(COALESCE(DATEDIFF(dd, 0, d), -2147483648)), -2147483648))
FROM @table;
February 23, 2010 at 1:00 pm
Nest NullIf and IsNull.
If your datetime column is called MyDate and it's in table MyTable:
select IsNull(MyDate, 0)
from dbo.MyTable;
That will replace the nulls with 0s (datetime 0 = 1 Jan 1900).
Then:
select min(IsNull(MyDate, 0))
from dbo.MyTable;
Will get 0 if there's a null in the list.
And:
select NullIf(Min(IsNull(MyDate, 0)), 0)
from dbo.MyTable;
Will null out the zero.
If you aren't familiar with IsNull or NullIf (most people know IsNull, it seems very few know NullIf), you can look those up in Books Online.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 23, 2010 at 1:18 pm
GSquared (2/23/2010)
Nest NullIf and IsNull. If your datetime column is called MyDate and it's in table MyTable:
select IsNull(MyDate, 0)
from dbo.MyTable;
That will replace the nulls with 0s (datetime 0 = 1 Jan 1900).
Then:
select min(IsNull(MyDate, 0))
from dbo.MyTable;
Will get 0 if there's a null in the list.
And:
select NullIf(Min(IsNull(MyDate, 0)), 0)
from dbo.MyTable;
Will null out the zero.
If you aren't familiar with IsNull or NullIf (most people know IsNull, it seems very few know NullIf), you can look those up in Books Online.
I considered this simpler solution too. However, the OP didn't mention the data type to be used (remember this is a SQL2K8 forum):
DECLARE @table TABLE (d date)
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)
select NullIf(Min(IsNull(d, 0)), 0)
from @table;
results in 'Operand type clash: int is incompatible with date'
DECLARE @table TABLE (d datetime)
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)
select NullIf(Min(IsNull(d, 0)), 0)
from @table;
results in '1873-04-02 21:14:13.823'. Remember that 0 is the numeric representation of 1/1/1900. It all depends on the range of dates being processed. I used -2147483648 as the numeric representation of a NULL date to be on the safe side.
Peter
February 23, 2010 at 2:05 pm
Makes sense.
You posted while I was writing my answer, so I missed your post completely.
As a second thought, it might be easier to select top 1 and order by the date column.
I just tested that on DateTime, SmallDateTime, DateTime2, and Date, and it worked in all cases.
Would depend on what else you need it for.
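For reference, the TOP 1 idea can be sketched against the sample table used earlier in the thread. In SQL Server, NULLs sort before non-NULL values in ascending order, which is exactly what makes this work:

```sql
-- Sketch of the TOP 1 / ORDER BY idea against the earlier sample table.
-- SQL Server sorts NULLs first in ascending order, so if any row is NULL,
-- the NULL surfaces as the "minimum"; otherwise the smallest date is returned.
DECLARE @table TABLE (d datetime);
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL);

SELECT TOP (1) d
FROM @table
ORDER BY d;  -- NULL first, then ascending dates
```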
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 23, 2010 at 2:32 pm
GSquared (2/23/2010)
Makes sense....
As a second thought, it might be easier to select top 1 and order by the date column.
Makes sense too. But looking at the query plans for the simple examples/data in this thread, the TOP 1/ORDER BY date approach requires a sort, which makes it more expensive. An index on the date column would change that, I suppose.
Peter
February 23, 2010 at 3:09 pm
Thanks guys. This is what I finally got to work (I'm using DATE datatypes)
NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')
I was getting "int" conversion errors using values other than '1/1/1900'. I actually declared a variable for '1/1/1900' and used that versus multiple literals.
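That approach can be sketched as follows (the variable name @sentinel is my choice; MyTable and MyDateField are the placeholder names used earlier in the thread). Note, as Paul demonstrates later in the thread, this returns NULL incorrectly if a genuine 1900-01-01 value is present in the data:

```sql
-- Sketch of the accepted approach: a single sentinel variable replaces
-- the repeated '1/1/1900' literals, avoiding implicit int conversions.
DECLARE @sentinel date = '19000101';

SELECT NULLIF(MIN(ISNULL(MyDateField, @sentinel)), @sentinel)
FROM dbo.MyTable;
```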
February 25, 2010 at 6:24 am
milzs (2/23/2010)
Thanks guys. This is what I finally got to work (I'm using DATE datatypes)
NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')
I was getting "int" conversion errors using values other than '1/1/1900'. I actually declared a variable for '1/1/1900' and used that versus multiple literals.
Makes sense.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2010 at 6:52 am
Most of the methods shown are problematic - some because they're not guaranteed to work, others because performing an aggregate over a function is always bad news for performance. TOP with an ORDER BY is probably reliable - if it is documented somewhere that NULLs always sort first. I can't say I'd be entirely happy even with that though - it doesn't seem very nice at all to rely on the sort order in that way.
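If you want the TOP variant without relying on the default sort position of NULL, the ordering can be made explicit (table and column names are placeholders):

```sql
-- Making the NULLs-first ordering explicit rather than depending on
-- the default NULL sort position.
SELECT TOP (1) MyDate
FROM dbo.MyTable
ORDER BY CASE WHEN MyDate IS NULL THEN 0 ELSE 1 END, MyDate;
```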
The fact is that the built-in aggregates ignore NULLs, so you have at least three choices:
1. Avoid NULLs
2. Write the code to handle NULLs explicitly and obviously, an example is below
3. Write a dozen or so lines in your favourite .NET language to create a custom aggregate named something like NULL_MIN()
SELECT date_value =
CASE
WHEN EXISTS (SELECT * FROM @Table T WHERE d IS NULL) THEN NULL
ELSE MIN(d)
END
FROM @Table;
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 26, 2010 at 6:55 am
This seems a bit more efficient:
SELECT
date_value = CASE WHEN (COUNT(*) > COUNT(date)) THEN NULL ELSE MIN(date) END
FROM test
🙂
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 26, 2010 at 8:17 am
I just performed this test:
SET NOCOUNT ON;
CREATE TABLE #T (
ID INT IDENTITY PRIMARY KEY,
Section INT NOT NULL,
DT DATE);
INSERT INTO #T (Section, DT)
SELECT N1.Number, DATEADD(day, CHECKSUM(NEWID())%36525, '1/1/2000')
FROM Common.dbo.Numbers N1
CROSS JOIN Common.dbo.Numbers N2
WHERE N1.Number BETWEEN 1 AND 1000
AND N2.Number BETWEEN 1 AND 1000;
;WITH CTE AS
(SELECT DT, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Row
FROM #T)
UPDATE CTE
SET DT = NULL
WHERE Row <= 100;
SET STATISTICS TIME ON;
SELECT NULLIF(MIN(ISNULL(DT,'1/1/1900')), '1/1/1900')
FROM #T AS T
GROUP BY Section;
SELECT CASE
WHEN EXISTS (SELECT * FROM #T AS T2 WHERE DT IS NULL AND T2.Section = T1.Section) THEN NULL
ELSE MIN(T1.DT)
END
FROM #T T1
GROUP BY T1.Section;
SELECT CASE
WHEN (COUNT(*) > COUNT(DT)) THEN NULL
ELSE MIN(DT)
END
FROM #T AS T1
GROUP BY Section;
The results are:
SQL Server Execution Times:
CPU time = 374 ms, elapsed time = 196 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 3761 ms, elapsed time = 3626 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 248 ms.
The first, simplest solution, is also the fastest.
As an afterthought, I also tested:
;WITH CTE AS
(SELECT DT, ROW_NUMBER() OVER (PARTITION BY Section ORDER BY DT) AS Row
FROM #T T1)
SELECT DT
FROM CTE
WHERE Row = 1;
SQL Server Execution Times:
CPU time = 1904 ms, elapsed time = 1462 ms.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 26, 2010 at 10:03 am
I performed the test too, with similar results. All solutions benefit from an index on columns Section and DT. Paul's solution benefits the most from the index and becomes comparable to GSquared's solution in terms of CPU time (though the number of logical reads is more than twice as high).
Peter
February 26, 2010 at 7:51 pm
Huh, that's odd. When I tested it on my system this morning, with a million rows, no GROUP BY (did I miss that requirement?), and an index on the date column, mine was the fastest, though not by much.
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 26, 2010 at 8:21 pm
Before moving on to performance, I'd just like to get correctness out of the way 😛
The NULLIF solution bothers me (not Peter's refined version):
DECLARE @T TABLE (section INTEGER NOT NULL, date_time DATETIME NULL);

INSERT @T (section, date_time) VALUES (1, '19000101');
INSERT @T (section, date_time) VALUES (1, '20010101');
INSERT @T (section, date_time) VALUES (2, '19000101');
INSERT @T (section, date_time) VALUES (2, NULL);
INSERT @T (section, date_time) VALUES (2, '20010101');

SELECT section,
       date_time = NULLIF(MIN(ISNULL(date_time, '19000101')), '19000101')
FROM @T AS T1
GROUP BY T1.section;
Results:
section date_time
1 NULL
2 NULL
Not quite right 😀
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 26, 2010 at 8:35 pm
On to Performance then.
Result: It depends :w00t:
Well it might. The original test rig had one row per section, which is probably not the most common scenario in the real world.
Most tables like this would have many more than one row per group, and many fewer groups.
All the solutions that scan the index once should have broadly similar performance.
Other solutions might be an order of magnitude faster (or slower) depending on how many groups there are, how many rows per group, and how common a NULL date_time value is.
For example, on a test rig of one million rows, with fifty sections, and ten thousand randomly-placed NULL values:
Test 1: NULLIF
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 421 ms, elapsed time = 439 ms.
Test 2: EXISTS with MIN
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 469 ms, elapsed time = 930 ms.
Test 3: EXISTS with TOP
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 281 ms, elapsed time = 619 ms.
Test 4: DISTINCT with TOP
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 266 ms, elapsed time = 276 ms.
Test 5: COUNTs with MIN
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 484 ms, elapsed time = 484 ms.
Test 6: ROW_NUMBER
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 453 ms, elapsed time = 446 ms.
Test 7: Recursive CTE with APPLY
Table 'Worktable'. Scan count 2, logical reads 301
Table 'Test'. Scan count 101, logical reads 303
CPU time = 16 ms, elapsed time = 2 ms.
Check out the result for test 7.
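Paul's actual test 7 query isn't shown, but the technique can be reconstructed hypothetically in the spirit of the recursive-CTE "skip scan" trick, assuming an index on (Section, DT) and the column names from GSquared's test rig. The recursive CTE hops from one distinct Section value to the next via index seeks instead of scanning every row, and the APPLY then answers each group with two more seeks:

```sql
-- Hypothetical reconstruction of "Recursive CTE with APPLY".
-- Assumes an index on (Section, DT); #T, Section, DT follow the test rig.
-- TOP is not allowed in a recursive member, so ROW_NUMBER finds the
-- next-higher Section value instead.
WITH Sections AS
(
    -- Anchor: the lowest Section value (one index seek)
    SELECT Section = MIN(Section) FROM #T
    UNION ALL
    -- Recursive step: the next Section value above the current one
    SELECT R.Section
    FROM
    (
        SELECT T.Section,
               rn = ROW_NUMBER() OVER (ORDER BY T.Section)
        FROM #T AS T
        JOIN Sections AS S ON S.Section < T.Section
    ) AS R
    WHERE R.rn = 1
)
SELECT S.Section, A.date_value
FROM Sections AS S
CROSS APPLY
(
    -- Per section: NULL if any NULL DT exists, else the minimum DT
    SELECT date_value =
        CASE WHEN EXISTS (SELECT * FROM #T AS T
                          WHERE T.Section = S.Section AND T.DT IS NULL)
             THEN NULL
             ELSE (SELECT MIN(T.DT) FROM #T AS T
                   WHERE T.Section = S.Section)
        END
) AS A
OPTION (MAXRECURSION 0);
```

The seek-per-group shape explains the result: cost scales with the number of groups rather than the number of rows, which is why it wins so dramatically with only 100 sections.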
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi