Database Design Follies: NULL vs. NOT NULL

  • JediSQL (10/31/2014)

    In most conditional statements, like WHERE clauses and IF statements, the conditional expression will return TRUE if and only if SQL Server considers the assertion provably true. When one of the players in a conditional expression is a NULL, SQL Server says it can not prove anything, and [SQL] returns false.

    That is incorrect, but a common misunderstanding also. SQL does not return "false" from a null comparison, it returns null ("unknown"). SQL does not know if it is false, just like it doesn't know if it is true. Also, NOT false yields true, but NOT NULL does not yield true.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I'm in the camp of "Nullable columns are a dumb idea" with one glaring exception. If you have a "notes" field that is NORMALLY left empty AND defined as [n]varchar(max) AND null and empty strings are semantically identical for that column, then you might save disk space using it. Certainly there's no harm and perhaps even a shred of documentary benefit.

    For everything else Null is bad because A) it takes a LOT of extra code everywhere to handle the tri-value logic Null brings with it and B) Null actually has AT LEAST three different meanings.

    Null can mean "to be determined" (TBD), "not applicable" (N/A) or "truly unknown (we checked!)". All three meanings are valuable, much more so than the ambiguous null.

    Of course when you decide to use TBD et al you have to carefully consider data domains and the values you use for flagging them (the magic values). Obviously some domains can't support this, bit fields come to mind. Others don't have an "empty" space in the domain for them, such as currency. But in currency's case 0 can at least be semantically identical to null if you squint hard enough.

    Everything else (including dates) generally do have the domain space to spare. Personally, I use 12/30/1899, 12/31/1899, and 1/1/1900 as my flag dates since I have the luxury of knowing any valid date must be > 1/1/1900. (YMMV) As someone pointed out it can make math a bit tricky, but careful definition of the semantic meaning of date ranges eliminates the problem.

    The tinyint domain has its own issues, but you can (usually) define it such that 0, 1, and 2 are reserved since tinyints are mainly used for the same purpose foreign keys are, ie translation of a code into a human-readable label.

    I used to believe Null had a useful purpose (like a date that hasn't been decided yet) but bitter experience has shown me the error of my ways. 😛

  • there is another very negative consequence of allowing nulls. It is a different kind of performance problem, but a very real one in my experience. You may call it "programming performance".

    It is the need to type isnull(mynullablecolumn,somesubstitutionvalue) .... a million times in my code, to avoid problems performing all kinds of selects, joins, arithmetic, etc. etc. etc. in sql

    I am so sick of typing that stupid thing a million times.

  • kenambrose (2/26/2016)


    there is another very negative consequence of allowing nulls. It is a different kind of performance problem, but a very real one in my experience. You may call it "programming performance".

    It is the need to type isnull(mynullablecolumn,somesubstitutionvalue) .... a million times in my code, to avoid problems performing all kinds of selects, joins, arithmetic, etc. etc. etc. in sql

    I am so sick of typing that stupid thing a million times.

    You should never use ISNULL() in a JOIN (or WHERE). That could really hurt your performance.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (5/21/2015)


    JediSQL (10/31/2014)

    In most conditional statements, like WHERE clauses and IF statements, the conditional expression will return TRUE if and only if SQL Server considers the assertion provably true. When one of the players in a conditional expression is a NULL, SQL Server says it can not prove anything, and [SQL] returns false.

    That is incorrect, but a common misunderstanding also. SQL does not return "false" from a null comparison, it returns null ("unknown"). SQL does not know if it is false, just like it doesn't know if it is true. Also, NOT false yields true, but NOT NULL does not yield true.

    <Slight objection>

    I'm not really into this "SQL does not know" business. If SQL doesn't know if a conditional value is true or not, HOW CAN IT CLEARLY DETERMINE THAT IT IS NOT TRUE?

    Null doesn't mean "unknown".

    There is no value, and the operation that acts upon values therefore cannot subsequently return a value. So likewise, SQL DOES NOT RETURN NULL FROM AN OPERATION THAT WASN'T SUPPLIED OPERANDS. Null is just the representation that SQL DID NOT RETURN A VALUE.

    SQL "knows" its not true, SQL also "knows" its not false, SQL "knows" there is no value because the null bit is set. SQL determines that the column has never been set to a value because the null bit says so. If we didn't have a bit that represents a lack of value, we would have to misappropriate an existing value to represent a lack of value and the operands would both have to be tested for this value before the operation could be attempted to correctly process situations that include the possibility of values not being available.

    Some programmers actually do appropriate a value to represent the lack of value. Say for instance, '1000-01-01' has been picked to represent the information that this value does not exist. Before doing date differences on a pair of date variables, you need to check to make sure both variables are unequal to 1000-01-01 or else your date diff will return nonsense. If either of your dates are '1000-01-01', ie., the chosen representation for "no value has been stored", then your expression has to likewise return something that represents "no date diff has been calculated".

    Its just that in SQL, the null bit is appropriate across all types, otherwise you'd have to pick a value from each type to represent that "the value doesn't actually exist."

    Say someone is still employed. His end employment date doesn't exist because the end of employment doesn't exist. There is no "unknown" here.

    Look, I have no banana. Now, is that banana I don't have ripe or not? Well I'm not going to say "I don't know" right?

    There is no uncertainty. Sure, maybe the absence of value RESULTED from lack of knowledge about an existing object we're trying to represent, but it could have JUST AS EASILY resulted from the certainty that the object doesn't even exist.

    </slight objection>

  • As both a programmer and DBA I got tired of having to deal with NULL in the code side and avoid NULLABLE columns at every turn.

    I do however use a future date to deal with 'unknown' or NULL dates of '9/9/9999'. That for sure is one date I'll never see.

    Thanks for the article.

  • JediSQL (10/31/2014)


    ...

    The one big exception is a CHECK CONSTRAINT. With a CHECK CONSTRAINT the assertion just has to be not provably FALSE. Say we have [font="Courier New"]... MyEvenInteger Int NULL CHECK MyEvenInteger % 2 = 0[/font]. This will allow NULL, because SQL Server will say that the content of the black box *could* be even, so it cannot prove the assertion false.

    This would allow rows with NULL, but if we have a query with [font="Courier New"]...WHERE MyEvenInteger % 2 = 0[/font], the query will not return the rows with NULL.

    Thanks for pointing out that one - extends down to check constraint - you're still on your own.

  • patrickmcginnis59 10839 (2/26/2016)


    ScottPletcher (5/21/2015)


    JediSQL (10/31/2014)

    In most conditional statements, like WHERE clauses and IF statements, the conditional expression will return TRUE if and only if SQL Server considers the assertion provably true. When one of the players in a conditional expression is a NULL, SQL Server says it can not prove anything, and [SQL] returns false.

    That is incorrect, but a common misunderstanding also. SQL does not return "false" from a null comparison, it returns null ("unknown"). SQL does not know if it is false, just like it doesn't know if it is true. Also, NOT false yields true, but NOT NULL does not yield true.

    <Slight objection>

    I'm not really into this "SQL does not know" business. If SQL doesn't know if a conditional value is true or not, HOW CAN IT CLEARLY DETERMINE THAT IT IS NOT TRUE?

    It can't. But it doesn't need to. Again, in a WHERE or JOIN, the condition must be true, not just "possibly true". I know that NULL is not technically "unknown", that's why I put it in quotes. But it's a reasonable way to read when testing nullable values in a WHERE clause.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • An absolute ban on NULL can impose excessive constraints in real life. Events cannot all be reserved for after-the-fact data entry.

    For instance, when drawing up a tentative schedule for court hearings - the name of the judge might not even be known at that point and might even be totally unnecessary when planning the allocation of the courtrooms. The actual end date and the decision date can certainly not be determined before the hearing has completed. Neither can the courtroom chosen for each and every hearing be known until the allocation process has been completed.

    Hearings being the key item - while the judge, courtroom, etc. are essentially constituting parts of the hearing, it would be logical to set up hearing as the master table. With NULL entries. Way simpler than having to deal with hearing_juge, hearing_courtroom linking tables with T-SQL interpreting the absence of a link as "undefined".

    Otherwise how do you publish the information if no field can be left NULL ? Real life cannot be constrained by database design requirements. Life is full of uncertainties and ultimately a database is created to address real-life needs, purists notwithstanding. The idea of storing an unknown decision date as 9999-01-01 just moves the problem to the presentation layer which must then translate the absurd-looking date into "to be announced" - thus removing part of the information value from the database which then becomes dependent on the presentation layer. The latter should handle the formatting but not the meaning.

    The general unease over NULL is probably what gave rise to the lack of consistency in the T-SQL handling of NULL:

    ...WHERE colA IS NULL

    as opposed to

    ...SET colA = NULL

    further compounded with the harebrained

    ...SET ANSI_NULLS OFF

    ...WHERE colA = NULL

    The whole part was sloppy from the beginning. As if unknowns in real life were not complicated enough, SQL Server did not really address them properly.

  • j-1064772 (2/26/2016)


    Otherwise how do you publish the information if no field can be left NULL ? Real life cannot be constrained by database design requirements. Life is full of uncertainties and ultimately a database is created to address real-life needs, purists notwithstanding. The idea of storing an unknown decision date as 9999-01-01 just moves the problem to the presentation layer which must then translate the absurd-looking date into "to be announced" - thus removing part of the information value from the database which then becomes dependent on the presentation layer. The latter should handle the formatting but not the meaning.

    For reporting purposes I use functions that understand TBD, N/A, and UNK. Works a treat!

  • roger.plowman (2/26/2016)


    j-1064772 (2/26/2016)


    Otherwise how do you publish the information if no field can be left NULL ? Real life cannot be constrained by database design requirements. Life is full of uncertainties and ultimately a database is created to address real-life needs, purists notwithstanding. The idea of storing an unknown decision date as 9999-01-01 just moves the problem to the presentation layer which must then translate the absurd-looking date into "to be announced" - thus removing part of the information value from the database which then becomes dependent on the presentation layer. The latter should handle the formatting but not the meaning.

    For reporting purposes I use functions that understand TBD, N/A, and UNK. Works a treat!

    Yes, I had occasions where I had to do that in SSRS but I prefer to handle this in the T-SQL statement like.

    SELECT ...

    CASE

    WHEN ActualStartDate IS NULL THEN 'TDB'

    ELSE CONVERT(varchar(10), ActualStartDate, XX)

    END AS ActualStartDate

    So I can keep the NULL handling with the rest of it such as

    ...WHERE ActualStartDate >= '2012-01-01' OR ActualStartDate IS NULL

    Deal with all the "special care" in one place instead of splitting it in the dataset T-SQL and in a cell formula.

  • j-1064772 (2/26/2016)


    roger.plowman (2/26/2016)


    j-1064772 (2/26/2016)


    Otherwise how do you publish the information if no field can be left NULL ? Real life cannot be constrained by database design requirements. Life is full of uncertainties and ultimately a database is created to address real-life needs, purists notwithstanding. The idea of storing an unknown decision date as 9999-01-01 just moves the problem to the presentation layer which must then translate the absurd-looking date into "to be announced" - thus removing part of the information value from the database which then becomes dependent on the presentation layer. The latter should handle the formatting but not the meaning.

    For reporting purposes I use functions that understand TBD, N/A, and UNK. Works a treat!

    Yes, I had occasions where I had to do that in SSRS but I prefer to handle this in the T-SQL statement like.

    SELECT ...

    CASE

    WHEN ActualStartDate IS NULL THEN 'TDB'

    ELSE CONVERT(varchar(10), ActualStartDate, XX)

    END AS ActualStartDate

    So I can keep the NULL handling with the rest of it such as

    ...WHERE ActualStartDate >= '2012-01-01' OR ActualStartDate IS NULL

    Deal with all the "special care" in one place instead of splitting it in the dataset T-SQL and in a cell formula.

    The problem with that is you assume Null has only one meaning, it can actually have 3 (or more). How do you handle the case of "not applicable" and "truely unknown" in addition to "to be determined?"

  • roger.plowman (2/26/2016)


    j-1064772 (2/26/2016)


    roger.plowman (2/26/2016)


    j-1064772 (2/26/2016)


    Otherwise how do you publish the information if no field can be left NULL ? Real life cannot be constrained by database design requirements. Life is full of uncertainties and ultimately a database is created to address real-life needs, purists notwithstanding. The idea of storing an unknown decision date as 9999-01-01 just moves the problem to the presentation layer which must then translate the absurd-looking date into "to be announced" - thus removing part of the information value from the database which then becomes dependent on the presentation layer. The latter should handle the formatting but not the meaning.

    For reporting purposes I use functions that understand TBD, N/A, and UNK. Works a treat!

    Yes, I had occasions where I had to do that in SSRS but I prefer to handle this in the T-SQL statement like.

    SELECT ...

    CASE

    WHEN ActualStartDate IS NULL THEN 'TDB'

    ELSE CONVERT(varchar(10), ActualStartDate, XX)

    END AS ActualStartDate

    So I can keep the NULL handling with the rest of it such as

    ...WHERE ActualStartDate >= '2012-01-01' OR ActualStartDate IS NULL

    Deal with all the "special care" in one place instead of splitting it in the dataset T-SQL and in a cell formula.

    The problem with that is you assume Null has only one meaning, it can actually have 3 (or more). How do you handle the case of "not applicable" and "truely unknown" in addition to "to be determined?"

    Either have bits to designate each of those statuses or a single "column_name_status" tinyint column.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • it's actually pretty easy to work with an absolute ban.

    You just take any column where you would otherwise need nulls, and make a related table for it and voila'.

    No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.

    The cardinality will of course be one-to-one but that is perfectly correct to do.

    For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.

  • kenambrose (2/26/2016)


    it's actually pretty easy to work with an absolute ban.

    You just take any column where you would otherwise need nulls, and make a related table for it and voila'.

    No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.

    The cardinality will of course be one-to-one but that is perfectly correct to do.

    For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.

    That sounds like a lot of overhead for no benefit...

Viewing 15 posts - 31 through 45 (of 137 total)

You must be logged in to reply to this topic. Login to reply