February 23, 2010 at 12:15 pm
I know that by default, the MIN function ignores any NULL values. Is there any way to cause it to not ignore NULLs? Basically I'm trying to find a MIN date from a rowset. However, if any of the dates are NULL I want the MIN value to be NULL. Only if all dates are not NULL do I want an actual date value.
Possible?
February 23, 2010 at 12:50 pm
DECLARE @table TABLE (d date);
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL);

SELECT DATEADD(dd, 0, NULLIF(MIN(COALESCE(DATEDIFF(dd, 0, d), -2147483648)), -2147483648))
FROM @table;
February 23, 2010 at 1:00 pm
Nest NullIf and IsNull.
If your datetime column is called MyDate and it's in table MyTable:
select IsNull(MyDate, 0)
from dbo.MyTable;
That will replace the nulls with 0s (datetime 0 = 1 Jan 1900).
Then:
select min(IsNull(MyDate, 0))
from dbo.MyTable;
Will get 0 if there's a null in the list.
And:
select NullIf(Min(IsNull(MyDate, 0)), 0)
from dbo.MyTable;
Will null out the zero.
If you aren't familiar with IsNull or NullIf (most people know IsNull, it seems very few know NullIf), you can look those up in Books Online.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 23, 2010 at 1:18 pm
GSquared (2/23/2010)
Nest NullIf and IsNull. If your datetime column is called MyDate and it's in table MyTable:
select IsNull(MyDate, 0)
from dbo.MyTable;
That will replace the nulls with 0s (datetime 0 = 1 Jan 1900).
Then:
select min(IsNull(MyDate, 0))
from dbo.MyTable;
Will get 0 if there's a null in the list.
And:
select NullIf(Min(IsNull(MyDate, 0)), 0)
from dbo.MyTable;
Will null out the zero.
If you aren't familiar with IsNull or NullIf (most people know IsNull, it seems very few know NullIf), you can look those up in Books Online.
I considered this simpler solution too. However, the OP didn't mention the data type to be used (remember this is a SQL2K8 forum):
DECLARE @table TABLE (d date)
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)
select NullIf(Min(IsNull(d, 0)), 0)
from @table;
results in 'Operand type clash: int is incompatible with date'
DECLARE @table TABLE (d datetime)
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)
select NullIf(Min(IsNull(d, 0)), 0)
from @table;
results in '1873-04-02 21:14:13.823'. Remember that 0 is the numeric representation of 1/1/1900. It all depends on the range of dates being processed. I used -2147483648 as the numeric representation of a NULL date to be on the safe side.
Peter
February 23, 2010 at 2:05 pm
Makes sense.
You posted while I was writing my answer, so I missed your post completely.
As a second thought, it might be easier to select top 1 and order by the date column.
I just tested that on DateTime, SmallDateTime, DateTime2, and Date, and it worked in all cases.
Would depend on what else you need it for.
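For reference, the TOP 1 idea can be sketched against the sample table used earlier in the thread. In SQL Server, NULLs sort before non-NULL values in ascending order, which is exactly what makes this work:

```sql
-- Sketch of the TOP 1 / ORDER BY idea against the earlier sample table.
-- SQL Server sorts NULLs first in ascending order, so if any row is NULL,
-- the NULL surfaces as the "minimum"; otherwise the smallest date is returned.
DECLARE @table TABLE (d datetime);
INSERT INTO @table
VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL);

SELECT TOP (1) d
FROM @table
ORDER BY d;  -- NULL first, then ascending dates
```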
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 23, 2010 at 2:32 pm
GSquared (2/23/2010)
Makes sense....
As a second thought, it might be easier to select top 1 and order by the date column.
Makes sense too. But looking at the query plans for the simple examples/data in this thread, the TOP 1/ORDER BY date approach requires a sort, which makes it more expensive. An index on the date column would change that, I suppose.
Peter
February 23, 2010 at 3:09 pm
Thanks guys. This is what I finally got to work (I'm using DATE datatypes)
NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')
I was getting "int" conversion errors using values other than '1/1/1900'. I actually declared a variable for '1/1/1900' and used that versus multiple literals.
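That approach can be sketched as follows (the variable name @sentinel is my choice; MyTable and MyDateField are the placeholder names used earlier in the thread). Note, as Paul demonstrates later in the thread, this returns NULL incorrectly if a genuine 1900-01-01 value is present in the data:

```sql
-- Sketch of the accepted approach: a single sentinel variable replaces
-- the repeated '1/1/1900' literals, avoiding implicit int conversions.
DECLARE @sentinel date = '19000101';

SELECT NULLIF(MIN(ISNULL(MyDateField, @sentinel)), @sentinel)
FROM dbo.MyTable;
```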
February 25, 2010 at 6:24 am
milzs (2/23/2010)
Thanks guys. This is what I finally got to work (I'm using DATE datatypes)
NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')
I was getting "int" conversion errors using values other than '1/1/1900'. I actually declared a variable for '1/1/1900' and used that versus multiple literals.
Makes sense.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2010 at 6:52 am
Most of the methods shown are problematic - some because they're not guaranteed to work, others because performing an aggregate over a function is always bad news for performance. TOP with an ORDER BY is probably reliable - if it is documented somewhere that NULLs always sort first. I can't say I'd be entirely happy even with that though - it doesn't seem very nice at all to rely on the sort order in that way.
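If you want the TOP variant without relying on the default sort position of NULL, the ordering can be made explicit (table and column names are placeholders):

```sql
-- Making the NULLs-first ordering explicit rather than depending on
-- the default NULL sort position.
SELECT TOP (1) MyDate
FROM dbo.MyTable
ORDER BY CASE WHEN MyDate IS NULL THEN 0 ELSE 1 END, MyDate;
```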
The fact is that the built-in aggregates ignore NULLs, so you have at least three choices:
1. Avoid NULLs
2. Write the code to handle NULLs explicitly and obviously, an example is below
3. Write a dozen or so lines in your favourite .NET language to create a custom aggregate named something like NULL_MIN()
SELECT date_value =
CASE
WHEN EXISTS (SELECT * FROM @Table T WHERE d IS NULL) THEN NULL
ELSE MIN(d)
END
FROM @Table;
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 26, 2010 at 6:55 am
This seems a bit more efficient:
SELECT
date_value = CASE WHEN (COUNT(*) > COUNT(date)) THEN NULL ELSE MIN(date) END
FROM test
🙂
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 26, 2010 at 8:17 am
I just performed this test:
SET NOCOUNT ON;
CREATE TABLE #T (
ID INT IDENTITY PRIMARY KEY,
Section INT NOT NULL,
DT DATE);
INSERT INTO #T (Section, DT)
SELECT N1.Number, DATEADD(day, CHECKSUM(NEWID())%36525, '1/1/2000')
FROM Common.dbo.Numbers N1
CROSS JOIN Common.dbo.Numbers N2
WHERE N1.Number BETWEEN 1 AND 1000
AND N2.Number BETWEEN 1 AND 1000;
;WITH CTE AS
(SELECT DT, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Row
FROM #T)
UPDATE CTE
SET DT = NULL
WHERE Row <= 100;
SET STATISTICS TIME ON;
SELECT NULLIF(MIN(ISNULL(DT,'1/1/1900')), '1/1/1900')
FROM #T AS T
GROUP BY Section;
SELECT CASE
WHEN EXISTS (SELECT * FROM #T AS T2 WHERE DT IS NULL AND T2.Section = T1.Section) THEN NULL
ELSE MIN(T1.DT)
END
FROM #T T1
GROUP BY T1.Section;
SELECT CASE
WHEN (COUNT(*) > COUNT(DT)) THEN NULL
ELSE MIN(DT)
END
FROM #T AS T1
GROUP BY Section;
The results are:
SQL Server Execution Times:
CPU time = 374 ms, elapsed time = 196 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 3761 ms, elapsed time = 3626 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 453 ms, elapsed time = 248 ms.
The first, simplest solution, is also the fastest.
As an afterthought, I also tested:
;WITH CTE AS
(SELECT DT, ROW_NUMBER() OVER (PARTITION BY Section ORDER BY DT) AS Row
FROM #T T1)
SELECT DT
FROM CTE
WHERE Row = 1;
SQL Server Execution Times:
CPU time = 1904 ms, elapsed time = 1462 ms.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 26, 2010 at 10:03 am
I performed the test too, with similar results. All solutions benefit from an index on columns Section and DT. Paul's solution benefits the most from the index and becomes comparable to GSquared's solution in terms of CPU time (though the number of logical reads is more than twice as high).
Peter
February 26, 2010 at 7:51 pm
Huh, that's odd. When I tested it on my system this morning, with a million rows, no GROUP BY (did I miss that requirement?), and an index on the date column, mine was the fastest, though not by much.
-- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 26, 2010 at 8:21 pm
Before moving on to performance, I'd just like to get correctness out of the way 😛
The NULLIF solution bothers me (not Peter's refined version):
DECLARE @T TABLE (section INTEGER NOT NULL, date_time DATETIME NULL);

INSERT @T (section, date_time) VALUES (1, '19000101');
INSERT @T (section, date_time) VALUES (1, '20010101');
INSERT @T (section, date_time) VALUES (2, '19000101');
INSERT @T (section, date_time) VALUES (2, NULL);
INSERT @T (section, date_time) VALUES (2, '20010101');

SELECT section,
       date_time = NULLIF(MIN(ISNULL(date_time, '19000101')), '19000101')
FROM @T AS T1
GROUP BY T1.section;
Results:
section date_time
1 NULL
2 NULL
Not quite right 😀
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 26, 2010 at 8:35 pm
On to Performance then.
Result: It depends :w00t:
Well it might. The original test rig had one row per section, which is probably not the most common scenario in the real world.
Most tables like this would have many more than one row per group, and many fewer groups.
All the solutions that scan the index once should have broadly similar performance.
Other solutions might be an order of magnitude faster (or slower) depending on how many groups there are, how many rows per group, and how common a NULL date_time value is.
For example, on a test rig of one million rows, with fifty sections, and ten thousand randomly-placed NULL values:
Test 1: NULLIF
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 421 ms, elapsed time = 439 ms.
Test 2: EXISTS with MIN
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 469 ms, elapsed time = 930 ms.
Test 3: EXISTS with TOP
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 281 ms, elapsed time = 619 ms.
Test 4: DISTINCT with TOP
Table 'Test'. Scan count 51, logical reads 3417
CPU time = 266 ms, elapsed time = 276 ms.
Test 5: COUNTs with MIN
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 484 ms, elapsed time = 484 ms.
Test 6: ROW_NUMBER
Table 'Test'. Scan count 1, logical reads 3231
CPU time = 453 ms, elapsed time = 446 ms.
Test 7: Recursive CTE with APPLY
Table 'Worktable'. Scan count 2, logical reads 301
Table 'Test'. Scan count 101, logical reads 303
CPU time = 16 ms, elapsed time = 2 ms.
Check out the result for test 7.
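Paul's actual test 7 query isn't shown, but the technique can be reconstructed hypothetically in the spirit of the recursive-CTE "skip scan" trick, assuming an index on (Section, DT) and the column names from GSquared's test rig. The recursive CTE hops from one distinct Section value to the next via index seeks instead of scanning every row, and the APPLY then answers each group with two more seeks:

```sql
-- Hypothetical reconstruction of "Recursive CTE with APPLY".
-- Assumes an index on (Section, DT); #T, Section, DT follow the test rig.
-- TOP is not allowed in a recursive member, so ROW_NUMBER finds the
-- next-higher Section value instead.
WITH Sections AS
(
    -- Anchor: the lowest Section value (one index seek)
    SELECT Section = MIN(Section) FROM #T
    UNION ALL
    -- Recursive step: the next Section value above the current one
    SELECT R.Section
    FROM
    (
        SELECT T.Section,
               rn = ROW_NUMBER() OVER (ORDER BY T.Section)
        FROM #T AS T
        JOIN Sections AS S ON S.Section < T.Section
    ) AS R
    WHERE R.rn = 1
)
SELECT S.Section, A.date_value
FROM Sections AS S
CROSS APPLY
(
    -- Per section: NULL if any NULL DT exists, else the minimum DT
    SELECT date_value =
        CASE WHEN EXISTS (SELECT * FROM #T AS T
                          WHERE T.Section = S.Section AND T.DT IS NULL)
             THEN NULL
             ELSE (SELECT MIN(T.DT) FROM #T AS T
                   WHERE T.Section = S.Section)
        END
) AS A
OPTION (MAXRECURSION 0);
```

The seek-per-group shape explains the result: cost scales with the number of groups rather than the number of rows, which is why it wins so dramatically with only 100 sections.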
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi