NULL Versus NULL
In one of the first articles I wrote for SQL Server Central, I talked about SQL NULLs
and three-valued logic (Four Rules For NULL).
In this article I take it all back...
No, not really, but stay tuned as we talk about the darker side of ANSI NULLs.
The Original Four Rules
The original four rules I proposed for NULL-handling are all reproduced here in
Figure 1.
Figure 1. The original "Four Rules"
The rules are handy guides for handling NULLs in T-SQL, but NULL-handling isn't always
so cut-and-dried. In this article we'll take a look at where Rule #2 - the basis of the ANSI
SQL three-valued logic we discussed in the first article - breaks down.
No Two NULLs Are Created Equal...
If you recall from the original Four Rules
article, the basis of ANSI SQL three-valued logic (3VL) is that NULL is not equal to
anything else. It is not less than, greater than, or even unequal to anything else either. Because NULL
is not an actual value, but rather a placeholder for an unknown value, all comparisons with NULL result in UNKNOWN.
Even comparing a NULL to another NULL is just comparing two placeholders for unknown values,
so the result again is UNKNOWN.
Tip: In reference to NULL comparisons, be sure to keep Rule #3 in mind. Microsoft has deprecated SET ANSI_NULLS, and according to Books Online it will be removed in a future version of SQL Server. If you currently have code that relies on SET ANSI_NULLS OFF, it might be a good time to start considering what it will take to make that code ANSI SQL-92 NULL-compliant. |
We even generated some samples to demonstrate this. One of these samples is reproduced
here in Listing 1.
Listing 1. Demonstrating that NULL is not equal to NULL
SET ANSI_NULLS ON DECLARE @val CHAR(4) SET @val = NULL SET ANSI_NULLS ON IF @val = NULL PRINT 'TRUE' ELSE IF NOT(@val = NULL) PRINT 'FALSE' ELSE PRINT 'UNKNOWN' |
NULL: Confusing the Smartest People in the World Since (at least) 1986
If all this doesn't hit home immediately, don't take it too hard. Even Microsoft seems
to have difficulty sorting through it. Point in fact: SQL Server 2005 Books Online
(BOL) still has bad information concerning NULL comparisons. In fact,
as of the time of this writing I counted no less than ten pages in BOL that stated the result of a comparison
with NULL is either FALSE or NULL. Only two pages that I found
(the pages describing "IS [NOT] NULL" and "SET
ANSI_NULLS") actually got it right. Fortunately we know better: the result of comparing
NULL with anything is UNKNOWN.
So why all the confusion? Most likely it's because in queries only rows for which the
WHERE clause condition evaluates to TRUE are returned. Rows that evaluate
to FALSE or UNKNOWN are not returned. For some folks this might seem to indicate FALSE and UNKNOWN are
equivalent. They're not, as we'll see in Listings 2 and 3.
Listing 2. Sample SELECT with NULL comparison in the WHERE clause
SELECT TOP 100 * FROM sys.syscomments WHERE id = NULL |
The result above of course returns no rows. According to Books Online this is because
"id = NULL" evaluates to FALSE. If this is true, however, Listing 3 below should
return all rows.
Listing 3. The "opposite" of Listing 2
SELECT TOP 100 * FROM sys.syscomments WHERE NOT(id = NULL) |
If "id = NULL" really evaluates to FALSE for every row, then
"NOT(id = NULL)" should evaluate to TRUE for every row. Of course it doesn't,
and again no rows are returned. And we already know the reason: it's because
"id = NULL" evaluates to UNKNOWN, and "NOT(id = NULL)"
also evaluates to UNKNOWN.
Microsoft has already been notified of this problem in BOL and hopefully it will be
fixed soon.
...All NULLs Are Created Not Distinct
So now we've firmly established that comparisons with NULL never evaluate to TRUE
or FALSE, that NULL is never equal to NULL, and that NULL comparisons always result in
UNKNOWN... Now it's time to list the exceptions. (You didn't think it would be that simple did you?)
I really wanted to call this section All NULLs Are Created Equal, but that just happens to be wrong.
In order to simulate NULL equality, and to keep from contradicting themselves
in the process, the ANSI SQL-92 standard decreed that two NULL values
should be considered "not distinct". The definition of not distinct in the
ANSI standard includes any two values that return TRUE for an equality test (e.g., 3 = 3, 4 = 4, etc.), or any
two NULLs.
This simulated NULL equality is probably most used in the
GROUP BY clause, which groups all NULL values into
a single partition. SQL-92 defines a partition as a grouping of not distinct values.
Listing 4 below shows GROUP BY handling of NULL.
Listing 4. GROUP BY and NULL
CREATE TABLE #test (val INT); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (1); INSERT INTO #test (val) VALUES (2); INSERT INTO #test (val) VALUES (3); INSERT INTO #test (val) VALUES (3); SELECT COUNT(*) AS num, val FROM #test GROUP BY val; DROP TABLE #test; |
Figure 2 shows the result.
Figure 2. Result of GROUP BY with NULLs
Notice the NULL values are all treated as not distinct by
GROUP BY, and are all grouped together. Unique constraints also
use the ANSI definition of not distinct as opposed to equal since
you can only insert one NULL in a column with a unique constraint. Consider
Listing 5 which shows this.
Listing 5. Unique Constraint and NULL
CREATE TABLE #test (val INT CONSTRAINT unq_val UNIQUE); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); DROP TABLE #test; |
This example throws an exception when it tries to insert the second NULL
in the val column:
(1 row(s) affected) Msg 2627, Level 14, State 1, Line 4 Violation of UNIQUE KEY constraint 'unq_val'. Cannot insert duplicate key in object 'dbo.#test'. The statement has been terminated. |
Other statements and operators that use the concept of not distinct to simulate
NULL equality include:
- PARTITION BY clause of OVER()
- UNION operator
- DISTINCT keyword
- INTERSECT operator
- EXCEPT operator
NULLs Flock Together
The ORDER BY clause in SELECT
queries places all NULL values together when it orders
your results. SQL Server treats NULLs as the "lowest possible
values" in your results. What this means is NULL will always come before
your non-NULL results when you sort in ascending order, and after your
non-NULL results when you sort in descending order. Listing 6
shows ORDER BY and NULL in action.
Listing 6. ORDER BY and NULL
CREATE TABLE #test (val INT); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (1); INSERT INTO #test (val) VALUES (2); SELECT val FROM #test ORDER BY val; DROP TABLE #test; |
The results are shown in Figure 3.
Figure 3. Result of ORDER BY with NULL
The same holds true for the ORDER BY clause of OVER,
which is used to order your results when used with ranking functions like ROW_NUMBER
and aggregate functions like SUM.
And Now For Something Entirely Different
Now that we've established the "exceptions" for NULL comparisons,
let's look at something entirely different. When a NULL value is inserted into a
nullable column with a check constraint that doesn't check for IS NOT NULL, something
strange seems to happen. Consider Listing 7.
Listing 7. Check constraints and NULL
CREATE TABLE #test (val INT CONSTRAINT ck_val CHECK(val AND val = 0 AND val > 0)); INSERT INTO #test (val) VALUES (NULL); INSERT INTO #test (val) VALUES (NULL); SELECT val FROM #test ORDER BY val; DROP TABLE #test; |
In this example we've added a check constraint to the sample table that
enforces the following rule:
- The value inserted must be less than zero
- *and* the value inserted must be equal to zero
- *and* the value inserted must be greater than zero
You and I know from 4th grade math (remember number lines?) that there is no value
that can ever fulfill these requirements. No value can be less than zero, equal to
zero, and greater than zero all at the same time. Also based on what we've already
talked about, any comparisons with NULL result in UNKNOWN.
You might expect an attempt to insert any value into the table would fail.
However, check constraints operate under a different set of rules from the SELECT,
INSERT, UPDATE, and DELETE
DML statements. The DML statements, when combined with a WHERE clause, perform
their action only on rows for which the WHERE clause condition evaluates to
TRUE. The DML statements will exclude rows that evaluate to FALSE or UNKNOWN.
Check constraints, on the other hand, cause INSERT and
UPDATE statements to fail only if the check constraint condition evaluates
to FALSE. This means that the checks will succeed if the condition evaluates to
either UNKNOWN or TRUE.
Of course you'd probably never create a check constraint as restrictive as the one in the example, and
if you want to prevent NULLs from being inserted into a column, either declare
the column NOT NULL or add "val IS NOT NULL"
as a check constraint condition. Don't expect a check constraint that evaluates to UNKNOWN to
cause an INSERT or UPDATE to fail.
Conclusion
NULL handling hasn't gotten any easier since the Four Rules
article, but it helps to know the exceptions as well as the rules. This article was written to demonstrate
those common exceptions.
Michael Coles is a regular contributor to SQL Server Central, and author of the upcoming book Pro T-SQL 2005 Programmer's Guide from Apress (in bookstores everywhere April 2007). |