Database Design Follies: NULL vs. NOT NULL

  • kenambrose (2/26/2016)


    it's actually pretty easy to work with an absolute ban.

    You just take any column where you would otherwise need nulls, and make a related table for it and voila'.

    No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.

    The cardinality will of course be one-to-one but that is perfectly correct to do.

    For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.

    I have a little difficulty with such an approach in the case of a handful of independent columns with the possibility of each having a known or unknown value. If one column does have a value yes a record could be created but what happens for the other columns whose values are in the "unknown" category ? I would have to create specific 1:1 tables for each such column or worse yet using a 1:1 common table with multiple records, one for each column.

    Isn't this way of addressing the NULL annoyance getting more complicated than dragging NULLs with ISNULL(col, ) or ...OR col IS NULL extra statements?

  • j-1064772 (2/26/2016)


    kenambrose (2/26/2016)


    it's actually pretty easy to work with an absolute ban.

    You just take any column where you would otherwise need nulls, and make a related table for it and voila'.

    No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.

    The cardinality will of course be one-to-one but that is perfectly correct to do.

    For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.

    I have a little difficulty with such an approach in the case of a handful of independent columns with the possibility of each having a known or unknown value. If one column does have a value yes a record could be created but what happens for the other columns whose values are in the "unknown" category ? I would have to create specific 1:1 tables for each such column or worse yet using a 1:1 common table with multiple records, one for each column.

    Isn't this way of addressing the NULL annoyance getting more complicated than dragging NULLs with ISNULL(col, ) or ...OR col IS NULL extra statements?

    That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!

    Well you can always "fix" it this way:

    SELECT

    h.HearingNo,

    ISNULL(l.CourtroomName, 'TBD') AS CourtroomName

    ISNULL(n.JudgeName, 'TBD') AS JudgeName

    CASE

    WHEN s.StartDate IS NULL THEN 'TBD'

    ELSE CONVERT(varchar(10), s.StartDate, 120)

    END AS StartDate

    FROM Hearing h

    LEFT OUTER JOIN Hearing_Courtroom c ON c.HearingID = h.HearingID

    LEFT OUTER JOIN Hearing_Judge j ON j.HearingID = h.HearingID

    LEFT OUTER JOIN Judge n ON n.JudgeID = j.JudgeID

    LEFT OUTER JOIN Courtroom l ON l.CourtroomID = c.CourtroomID

    LEFT OUTER JOIN Hearing_StartDate s ON s.HearingID = h.HearingID

    Plus the data entry will have to throw the courtroom, judge and date data in the distinct 1:1 tables with the hearingID reference and make sure that is only one entry for each hearingID.

    Aww man ! This is going downhill so fast ...

  • well actually it does get rid of nulls. Because you are no longer persisting nulls in the database.

    Sure sometimes information will not be known. And output from DB can show that as null.

    But that is not the same thing at all as persisting a null "value" (even that term is nonsensical when it comes to discussing nulls) in a column.

  • j-1064772 (2/26/2016)


    That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!

    Well you can always "fix" it this way:

    SELECT

    h.HearingNo,

    ISNULL(l.CourtroomName, 'TBD') AS CourtroomName

    ISNULL(n.JudgeName, 'TBD') AS JudgeName

    CASE

    WHEN s.StartDate IS NULL THEN 'TBD'

    ELSE CONVERT(varchar(10), s.StartDate, 120)

    END AS StartDate

    FROM Hearing h

    LEFT OUTER JOIN Hearing_Courtroom c ON c.HearingID = h.HearingID

    LEFT OUTER JOIN Hearing_Judge j ON j.HearingID = h.HearingID

    LEFT OUTER JOIN Judge n ON n.JudgeID = j.JudgeID

    LEFT OUTER JOIN Courtroom l ON l.CourtroomID = c.CourtroomID

    LEFT OUTER JOIN Hearing_StartDate s ON s.HearingID = h.HearingID

    Plus the data entry will have to throw the courtroom, judge and date data in the distinct 1:1 tables with the hearingID reference and make sure that is only one entry for each hearingID.

    Aww man ! This is going downhill so fast ...

    Edited:

    But aren't you also assuming then that a NULL can have only one meaning, i.e., that a missing row must mean "TBD"? Would it be possible to never have a "Judge" if you have a "Hearing_Judge" and vice versa, etc.?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Yes, my fix did imply one specific meaning for a null - if you cannot name the judge for instance it is because it has not been done yet. Cannot think of any other reason why - can't hold a hearing without a judge.

    The main point I was making is how fast the scheme of entering "possibly" not defined fields each in its own separate side table snowballs downhill real fast.

    I did not mention other fields in the case of the hearing: a prosecutor appointed by the district attorney, a possible substitute prosecutor, a clerk, a bailiff, the lawyer designated as lead for the defense, a conference room set aside for lawyers to discuss the case, etc.

    A separate side table for each and every possible field ? "Simplifying" the database design to ban persisted nulls resulting in much larger number of 1:1 tables and heavy-handed T-SQL views - like the CRM with tons of LEFT OUTER JOIN for createdbyyominameId, createdonbehalforyominameId, modifiedbyyominameId, modifiedonbehalofyominameId, ... And these are nullable fields left in the main tables.

  • yes agreed. if you want to persist the _reason_ why you have no data, well sure, that is fine. but that is not a null, that is valid information. so in that case, same idea. you can store the reason in a related table, along with other metadata if you like, such as when the reason was entered, and as the reason or the data changes, add more related records. so you have a full history of the knowledge or lack thereof of the data.

  • Indeed, I forgot that I should have also added the metadata on top of each side table for each nullable field. An extra layer of complication yet.

  • everything should be made as simple as possible, BUT no simpler.

  • If we make sure that input data is uniquely identifiable on each table, i.e. non identity as that is generated after the input data, then we will be forced to create a non identity unique constraint on each table and by default we will have a good amount of non nullible columns as a starting point, and also force proper design.

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • [[ non identity ]]

    the relational term for that is "natural key".

    I would never design a table structure without one. There ALWAYS is one, and if no-one can say what it is, they simply do not understand their data, or don't even understand what a natural key vs. surrogate key means.

    Which indicates they should not be designing tables anyway...

    In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint) and usually make the surrogate the primary key and clustered.

  • kenambrose (2/29/2016)


    [[ non identity ]]

    the relational term for that is "natural key".

    I would never design a table structure without one. There ALWAYS is one, and if no-one can say what it is, they simply do not understand their data, or don't even understand what a natural key vs. surrogate key means.

    Which indicates they should not be designing tables anyway...

    In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint) and usually make the surrogate the primary key and clustered.

    Terrible idea to cluster on identity by default. Applying the best clustered index is the single most critical choice for table performance, and thus it should always be done thoughtfully, not just by slapping it on an identity column. That often leads to vastly worse overall performance, including forcing creation of many extra non-clustered "covering" indexes that would otherwise not be needed.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I guess you missed the word "usually" in my previous post? I don't do anything in database design really "by default". Not sure how you read that into my previous post.

    Anyway, I guess we will have to agree to disagree.

    Or, as an alternative, we can visit Monty's argument room 😉

  • kenambrose (2/29/2016)


    In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint)

    If all SQL devs knew this, the world would be a better place.

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • I think nulls are used more often than required, but they have their place. If we start with a rule that forbids nulls in the database, then we are doing physical design before logical design. If nulls are not allowed, then another means is required to avoid or allow the same option without null. How much extra design and code justifies not using a null? Is such a design is really less complex?

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

Viewing 15 posts - 46 through 60 (of 137 total)

You must be logged in to reply to this topic. Login to reply