Database Design Follies: NULL vs. NOT NULL

Question

Database Design Follies: NULL vs. NOT NULL

Viewing 15 posts - 46 through 60 (of 137 total)

You must be logged in to reply to this topic. Login to reply

j-1064772 SSCertifiable Points: 5316 More actions · Answer 1

kenambrose (2/26/2016)
it's actually pretty easy to work with an absolute ban.
You just take any column where you would otherwise need nulls, and make a related table for it and voila'.
No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.
The cardinality will of course be one-to-one but that is perfectly correct to do.
For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.

I have a little difficulty with such an approach in the case of a handful of independent columns with the possibility of each having a known or unknown value. If one column does have a value yes a record could be created but what happens for the other columns whose values are in the "unknown" category ? I would have to create specific 1:1 tables for each such column or worse yet using a 1:1 common table with multiple records, one for each column.

Isn't this way of addressing the NULL annoyance getting more complicated than dragging NULLs with ISNULL(col, ) or ...OR col IS NULL extra statements?

ScottPletcher SSC Guru Points: 101130 More actions · Answer 2

j-1064772 (2/26/2016)
kenambrose (2/26/2016)
it's actually pretty easy to work with an absolute ban.
You just take any column where you would otherwise need nulls, and make a related table for it and voila'.
No data? Don't insert a row. Have data? Insert the row in the related table for the "parent" row.
The cardinality will of course be one-to-one but that is perfectly correct to do.
For pulling data out, you use left joins on the related tables where there might be null (no rows) for the columns you are projecting.
I have a little difficulty with such an approach in the case of a handful of independent columns with the possibility of each having a known or unknown value. If one column does have a value yes a record could be created but what happens for the other columns whose values are in the "unknown" category ? I would have to create specific 1:1 tables for each such column or worse yet using a 1:1 common table with multiple records, one for each column.
Isn't this way of addressing the NULL annoyance getting more complicated than dragging NULLs with ISNULL(col, ) or ...OR col IS NULL extra statements?

That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

j-1064772 SSCertifiable Points: 5316 More actions · Answer 3

That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!

Well you can always "fix" it this way:

SELECT

h.HearingNo,

ISNULL(l.CourtroomName, 'TBD') AS CourtroomName

ISNULL(n.JudgeName, 'TBD') AS JudgeName

CASE

WHEN s.StartDate IS NULL THEN 'TBD'

ELSE CONVERT(varchar(10), s.StartDate, 120)

END AS StartDate

FROM Hearing h

LEFT OUTER JOIN Hearing_Courtroom c ON c.HearingID = h.HearingID

LEFT OUTER JOIN Hearing_Judge j ON j.HearingID = h.HearingID

LEFT OUTER JOIN Judge n ON n.JudgeID = j.JudgeID

LEFT OUTER JOIN Courtroom l ON l.CourtroomID = c.CourtroomID

LEFT OUTER JOIN Hearing_StartDate s ON s.HearingID = h.HearingID

Plus the data entry will have to throw the courtroom, judge and date data in the distinct 1:1 tables with the hearingID reference and make sure that is only one entry for each hearingID.

Aww man ! This is going downhill so fast ...

kenambrose SSCrazy Points: 2513 More actions · Answer 4

well actually it does get rid of nulls. Because you are no longer persisting nulls in the database.

Sure sometimes information will not be known. And output from DB can show that as null.

But that is not the same thing at all as persisting a null "value" (even that term is nonsensical when it comes to discussing nulls) in a column.

ScottPletcher SSC Guru Points: 101130 More actions · Answer 5

j-1064772 (2/26/2016)
That doesn't get rid of NULLs. You'd have to LEFT JOIN to the other tables ... which produces NULL if you don't find a match!
Well you can always "fix" it this way:
SELECT
h.HearingNo,
ISNULL(l.CourtroomName, 'TBD') AS CourtroomName
ISNULL(n.JudgeName, 'TBD') AS JudgeName
CASE
WHEN s.StartDate IS NULL THEN 'TBD'
ELSE CONVERT(varchar(10), s.StartDate, 120)
END AS StartDate
FROM Hearing h
LEFT OUTER JOIN Hearing_Courtroom c ON c.HearingID = h.HearingID
LEFT OUTER JOIN Hearing_Judge j ON j.HearingID = h.HearingID
LEFT OUTER JOIN Judge n ON n.JudgeID = j.JudgeID
LEFT OUTER JOIN Courtroom l ON l.CourtroomID = c.CourtroomID
LEFT OUTER JOIN Hearing_StartDate s ON s.HearingID = h.HearingID

Plus the data entry will have to throw the courtroom, judge and date data in the distinct 1:1 tables with the hearingID reference and make sure that is only one entry for each hearingID.
Aww man ! This is going downhill so fast ...

Edited:

But aren't you also assuming then that a NULL can have only one meaning, i.e., that a missing row must mean "TBD"? Would it be possible to never have a "Judge" if you have a "Hearing_Judge" and vice versa, etc.?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

j-1064772 SSCertifiable Points: 5316 More actions · Answer 6

Yes, my fix did imply one specific meaning for a null - if you cannot name the judge for instance it is because it has not been done yet. Cannot think of any other reason why - can't hold a hearing without a judge.

The main point I was making is how fast the scheme of entering "possibly" not defined fields each in its own separate side table snowballs downhill real fast.

I did not mention other fields in the case of the hearing: a prosecutor appointed by the district attorney, a possible substitute prosecutor, a clerk, a bailiff, the lawyer designated as lead for the defense, a conference room set aside for lawyers to discuss the case, etc.

A separate side table for each and every possible field ? "Simplifying" the database design to ban persisted nulls resulting in much larger number of 1:1 tables and heavy-handed T-SQL views - like the CRM with tons of LEFT OUTER JOIN for createdbyyominameId, createdonbehalforyominameId, modifiedbyyominameId, modifiedonbehalofyominameId, ... And these are nullable fields left in the main tables.

kenambrose SSCrazy Points: 2513 More actions · Answer 7

yes agreed. if you want to persist the _reason_ why you have no data, well sure, that is fine. but that is not a null, that is valid information. so in that case, same idea. you can store the reason in a related table, along with other metadata if you like, such as when the reason was entered, and as the reason or the data changes, add more related records. so you have a full history of the knowledge or lack thereof of the data.

j-1064772 SSCertifiable Points: 5316 More actions · Answer 8

Indeed, I forgot that I should have also added the metadata on top of each side table for each nullable field. An extra layer of complication yet.

kenambrose SSCrazy Points: 2513 More actions · Answer 9

everything should be made as simple as possible, BUT no simpler.

MadAdmin SSChampion Points: 11261 More actions · Answer 10

If we make sure that input data is uniquely identifiable on each table, i.e. non identity as that is generated after the input data, then we will be forced to create a non identity unique constraint on each table and by default we will have a good amount of non nullible columns as a starting point, and also force proper design.

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]

kenambrose SSCrazy Points: 2513 More actions · Answer 11

[[ non identity ]]

the relational term for that is "natural key".

I would never design a table structure without one. There ALWAYS is one, and if no-one can say what it is, they simply do not understand their data, or don't even understand what a natural key vs. surrogate key means.

Which indicates they should not be designing tables anyway...

In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint) and usually make the surrogate the primary key and clustered.

ScottPletcher SSC Guru Points: 101130 More actions · Answer 12

kenambrose (2/29/2016)
[[ non identity ]]
the relational term for that is "natural key".
I would never design a table structure without one. There ALWAYS is one, and if no-one can say what it is, they simply do not understand their data, or don't even understand what a natural key vs. surrogate key means.
Which indicates they should not be designing tables anyway...
In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint) and usually make the surrogate the primary key and clustered.

Terrible idea to cluster on identity by default. Applying the best clustered index is the single most critical choice for table performance, and thus it should always be done thoughtfully, not just by slapping it on an identity column. That often leads to vastly worse overall performance, including forcing creation of many extra non-clustered "covering" indexes that would otherwise not be needed.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

kenambrose SSCrazy Points: 2513 More actions · Answer 13

I guess you missed the word "usually" in my previous post? I don't do anything in database design really "by default". Not sure how you read that into my previous post.

Anyway, I guess we will have to agree to disagree.

Or, as an alternative, we can visit Monty's argument room 😉

MadAdmin SSChampion Points: 11261 More actions · Answer 14

kenambrose (2/29/2016)
In fact, I always (always) create a unique constraint on the natural key and a surrogate key (bigint)

If all SQL devs knew this, the world would be a better place.

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]

rstone SSCertifiable Points: 6213 More actions · Answer 15

I think nulls are used more often than required, but they have their place. If we start with a rule that forbids nulls in the database, then we are doing physical design before logical design. If nulls are not allowed, then another means is required to avoid or allow the same option without null. How much extra design and code justifies not using a null? Is such a design is really less complex?

RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉