One of the fun facts about SQL Server and the relation model is the
whole concept of three valued logic. Now I’m not breaking any new
ground here I am just touching on something that trips people up when
they expect a result and don’t get it due to the magic of NULL’s. To be
honest, I’m no exception to falling into the unknown from time to time.
Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.
Rule 3: Systematic treatment of null values:
The
DBMS must allow each field to remain null (or empty). Specifically, it
must support a representation of "missing information and inapplicable
information" that is systematic,
distinct from all regular values (for example, "distinct from zero or
any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
This rule, above all others has probably caused me the most heartburn over the years.
I’ve
spent more time that I like to admit reading papers and books from
pioneers of the relational model, E.F. Codd, C.J. Date. One of the
great debates that carried on until the passing of Codd was over NULL
and three valued logic in general. To give you an idea of how big and
problematic NULL’s are, It’s been put forth that maybe we need to get
rid of them all together or move deeper into the rabbit hole and make
it 4 valued logic breaking NULL into different kinds of place holders
for unknown.
Understand that every piece of data falls into a
domain of some sort. 0 is a integer type. ‘’ is a empty string. NULL
isn’t in any domain, or value type. It doesn’t represent something at
all it is a place older period.
I’ve heard 3 valued logic described as yes/no/maybe but that isn’t accurate, it is true/false/UNKNOWN.
So
the only logical thing that can happen to UNKNOWN is unknown. What’s
even worse is UNKNOWN technically can’t be equal to UNKNOWN or NULL =
NULL.
How do you know they are equal if they are both unknown?
For example:
select 1 where NULL = NULL
returns nothing since NULL can’t be equal to anything including NULL we don’t get a NULL back or the 1 back we tried to select.
select 6227 * 453 / 238 + NULL
returns NULL
which makes since on the surface to almost everyone I work with.
select NULL / 0
returns NULL
To some folks this is confusing in a traditional programming since
anything divided by zero gives us an error of cannot divide by zero.
Since NULL is the place holder for UNKNOWN there is no way to evaluate the statement other than UNKNOWN or NULL!
This must also carry through for string manipulation as well.
For example:
select 'here' + 'is ' + NULL
returns NULL.
Again it is the old how can you concatenate something to the unknown problem.
Now with all this in our little busy heads we finally think we
understand the problem in it’s fullness, but we don’t (or I always
don’t).
Where things can get really sticky is in any kind of aggregate
situation SUM(), AVG(). Generally, all aggregations have a NULL
elimination step built into them.
So lets say we have a table that looks like this:
Col001 | Col002 |
100 | 100 |
200 | 200 |
300 | 300 |
NULL | 0 |
(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)
create table myNumbers
(
Col001 int,
Col002 int
)
go
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers
select avg(Col002) from myNumbers
We get:
-----------
200
Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
-----------
225
(1 row(s) affected)
What happens when the evaluation of the aggregation occurs there is
no way to evaluate the NULL so that row is dropped and all the sudden
you numbers look horribly wrong.
If I hadn’t put this to output to text we may have merrily trucked
along and accepted that 200 was what we were expecting and not the 255
we really wanted due to the treatment of the NULL.
Since it is a warning and not an error our program won’t complain
ether it will be more than happy to accept the result since it is valid
one.
The only exception to this in general is using COUNT() since you can
count a place holder just like you would a value type it isn’t
evaluating the data held just that a row exists whether we know what is
is in it or not.
I’ve only just scratched the surface of the unknown with this little
post. I haven’t covered grouping or JOIN’s and I may in a later post.
My goal is simply to remind myself that dealing with NULL is never a trivial matter.
-Wes