Null values in SQL Server have long been a cause of grief for many query designers, as they realize along the way that NULL values require special handling. The main cause of confusion, I would say, is thinking that NULL means blank or empty. In this article we summarize the semantics of NULL and provide some examples of its use and its consequences for query designers.
The nullability of a column determines whether the rows in the table can contain a null value for that column. For example, in the following table-creation script, column FirstName does not allow NULLs, but LastName does:
CREATE TABLE [dbo].[Person]( [FirstName] [varchar](50) NOT NULL, [LastName] [varchar](50) NULL ) ON [PRIMARY]
A NULL value in SQL Server is (by design) meant to indicate an unknown or undefined value, and as such, is different from any other value, including another NULL. Let's look at some examples to see what this all means.
Any comparisons involving NULLs will have a NULL result, like these ones:
NULL <> NULL NULL > 5
Also, any non-logical expressions involving NULLs have an unkown, or NULL, result. For example:
NULL + 4
returns a result of NULL.
AND and OR tables with NULL/UNKNOWN values
This next table summarizes the effect of NULLs in AND expressions:
AND | True | False | NULL |
True | True | False | NULL |
False | False | False | False |
NULL | NULL | False | NULL |
For example, the result of the expression
NULL and True
is NULL. But note that
NULL and False
is False, since for an entire AND expression to result in True all of its operands must be true, and in this case we are certain that at least of them is False.
This second table summarizes the effect of NULLs in OR expressions:
OR | True | False | NULL |
True | True | True | True |
False | True | False | NULL |
NULL | True | NULL | NULL |
For example, the result of the expression
NULL or False
is NULL. But note that
NULL or True
is True, since for an entire OR expression to result in True at least one of its operands must be true, and in this case we are certain that at least of them is True.
How to account for NULL values in expressions
If you write expressions using columns that may have NULL values, you must explicitly program for the possibility of NULLs. The way to do this is with the IS NULL and IS NOT NULL operators and with the ISNULL and COLASCE functions. For example, if you want to explicitly test whether some value is NULL, the correct way of doing it is
MyValue IS NULL
The incorrect way would be
MyValue = NULL
because, as we mentioned above, any logical expression involving NULLs results in NULL, which is not True. So, if MyValue were in fact NULL, this second expression would not be true!
The ISNULL function takes two arguments. It returns the value of the first one if it is not NULL, and the value of the second one if the first one is NULL. For example,
ISNULL('jane', 'john')
returns 'jane', and
ISNULL(NULL, 'john')
returns 'john'.
Finally, the COALESCE function takes one or more parameters and returns the first one that is not null. For instance:
COALESCE('john', 'jane', NULL)
evaluates to 'john', and
COALESCE(NULL, 'john', 'jane')
also evaluates to 'john'. To see the relationship between ISNULL and COALESCE, note that these two expressions are equivalent:
COALESCE(@MyVariable, 0) ISNULL(@MyVariable, 0)
because if @MyVariable is NULL, both will evaluate to 0, and if it's not, both will evaluate to the value of @MyVariable.
A more concrete example
Let's do a concrete example, to see the effects of all of these rules. Consider this table:
CREATE TABLE [dbo].[tbl]( [number] [int] NULL, [string] [varchar](50) NULL, [boolean] [bit] NULL ) ON [PRIMARY]
Note that all of the columns allow NULLs. Now we this data in the table:
number | string | boolean |
5 | jane | 1 |
3 | NULL | |
NULL | NULL | NULL |
Let's say we wanted to know all of the rows in which the number column is null. Consider this query to accomplish this goal:
select * from tbl where number = null
Let's think: any logical expression involving NULL results in NULL, which is not True. Thus, the WHERE clause fails to be True for every row, and our result is 0 rows. A correct query would be:
select * from tbl where number is null
which results in:
number | string | boolean |
NULL | NULL | NULL |
as expected.
As a second example, let's say we wanted to know all of the people whose name is not 'jane'. One would think to do it like this:
select * from tbl wherestring <> 'jane'
but it returns
number | string | boolean |
3 | NULL |
This result is missing the row where string is NULL, because the expression
NULL <> 'jane'
evaluates to NULL . A correct query would be:
select * from tbl where (string is NULL) or(string <> 'jane')
but which could be shortened using the ISNULL function:
select * from tbl where ISNULL(string,'') <>'jane'
Now one last example, using the boolean column. If we wanted to know all of the rows with a True value in this column, this would be correct:
select * from tbl whereboolean = 1
but don’t be tricked into thinking that this query gives you all of the non-true rows:
select * from tbl whereboolean = 0
Since the column allows NULLs, this would be a way of accomplishing this task:
select * from tbl where isnull(boolean,1)= 0
In summary: if a column allows NULL values, your queries need to account for them.
Keys and indexing
When defining table schemas and deciding whether or not to have columns that allow NULL values, you need to take into consideration the keys and indexes you will need. This list summarizes how NULL values impact key and index definitions in SQL Server 2008:
- Nulls cannot be used in PK columns.
- Nulls can be used in FK columns. (We have quite often encountered designs where we must allow NULLs in FK columns to indicate an unknown or not-yet-established relationship.)
- Nulls can be used in unique indexes and they count as a unique value. That is, one row, but not two, may have NULL as its value for the indexed column.
- Nulls can be used in non-unique indexes.
Design considerations
One way to avoid worrying about NULLs is never to use them, always declaring columns as not allowing NULLs and designating default values for "empty" or "unknown". This will save you keystrokes, especially when you want to check whether a column does not have a certain value. However, you will anyway probably have to write logic for when columns have the designated default values.
One last note
The handling of NULLs as we have summarized follows the ANSI standard. However, Transact-SQL offers an extension for null processing: If the option ANSI_NULLS is set to OFF, comparisons between nulls, such as NULL = NULL, evaluate to TRUE.
References
NULL Values (http://msdn.microsoft.com/en-us/library/ms191504.aspx), SQL Server 2008 Online Help.
About the author
Adolfo J. Socorro, Ph.D., leads eSolutions, a firm specialized in developing custom-made software.