September 8, 2005 at 6:30 pm
"Let me say that I am teaching classes 'relational database design'."
God help us!
"Sometime when roads are bad off-road vehicles are better then the standard ones."
Exactly my point! When you need a different solution—a Jeep instead of the Mercedes—you picked the vehicle for the terrain. In this case, you don't turn your Mercedes into a Jeep, you use the Jeep.
For your DBMS, change the tool you are using, don't try to make an RDBMS do non-relation things.... It's only a SQL DBMS after all: it needs all the help it can get, and diverging from the model on which it was based is not very stable reasoning.
Sincerely,
Anthony Thomas
P.S., and I'm never done.
September 27, 2005 at 10:48 am
Thank you for this insightful article.
I am no DBA however I do a lot of database design work as part of my consulting work. I have struggled with this issue for a long time...constantly switching back and forth in my mind as to which approach is better.
I do agree with some of the other comments that the "right" approach can be determined by the scale of the issue/task/project you're faced with. The MUCK approach is definitely faster [and dirtier ] to implement for small prototype projects.
Having implemented numerous projects using both approaches I can tell you that the MUCK seems simple and elegant on the surface but does not scale and adapt to changing business requirements. It works fine when all code sets conform to the same generic definition but breaks quickly when some code sets require different data than others. Dealing with code sets that are hierarchical in nature adds another layer of complexity that the MUCK approach does not handle very well. Sure you can add a parent CodeID column etc but it makes things complex and associates data (columns) with codes sets that don’t make use of the data and consequently have null values for the columns they don’t use. This just feels wrong.
Thanks for the frank discussion. I think the extra work maintaining separate “lookup” tables outweighs the perceived simplicity of a MUCK approach
October 10, 2005 at 12:58 pm
Someone sent me an email with the following question and I thought that both the question and answer might help someone else so I am posting it here:
Hi Don,
So there was a little flurry of activity over the latest lookup table article. As I hinted in the thread, I'm looking at writing an article clarifying the reasoning to reject the MUCK/CLUT design (upon re-reading, some of my own comments in that thread are pretty muddled!). Part of my review has been to re-read your article on the topic, as well as a couple of others. (It looks like about a 6-month cycle of one or the other of the designs being proposed).
One of my main points will be that it's not a normalization issue at all. In fact, throughout a lot of discussions on the site, I see people refer to normalization where it doesn't apply.
In that vein, I would like to try and persuade you that your own reference to the MUCK design violating normalization is in error -- or have you convince me that I'm missing the ball on that point.
Here's the relevant section and quote from your article (emphasis added -- hope your email takes HTML format
"It should work, right? Yes, but you have to be willing to ignore the elephant in the living room, our employee table is not properly normalized because CodeID is partially dependent on CodeType which is not part of the key."
Although, in a loose sense, you are correct that CodeID is dependent upon CodeType, this is a dependency that is, in fact, due to the foreign key constraint. Specifically, it is not a functional dependency (nor is it either of the more general forms of relation dependency, multivalue or join). As such, it has nothing to do with normalization.
To illustrate my point further, consider what would happen if you disallowed the interrelationship of the attributes which participate in a compound foreign key -- the natural result would be that you would not be allowed to ever have a compound primary key, because nothing could "foreign key to it".
My hypothosis on this confusion is that people take the colloquial memorization trick for normalization -- "every attribute depends on the key, the whole key, and nothing but the key", and don't think about the fact that "depends on" in that sentence is talking about functional, multivalue, and join dependencies (or, more succinctly, join dependencies, since the other two are special cases of the more general).
Does this make sense? Again, please let me know if you think I've missed the mark on this.
Thanks!
Here is my reply:
The answer is ultimately that it depends… You are correct that my article was necessarily too brief to cover the subject in detail and so was a bit muddled, and you are also correct in pointing out the general muddled nature of the forum posts; some of them due to the muddled thinking of the poster, some due to the nature of forums…
Let me see if I can make my case: First off, forget about the IDENTITY column, it isn’t a key in any logical sense. So in this example I’ll leave it out. Also, just ignore the data types, I did this quick and dirty and data types are irrelevant for this discussion.
The real key for the MUCK table has to be Code and CodeType in order to uniquely identify a record.
At first glance this looks to be “normalized” because the following.
{codeType, Code} à {Description} is the only FD for the MUCK
{Code} à {CodeType} doesn’t really exist because identical code values might exist under different CodeTypes i.e. {CA, State} and {CA, Software Vendor} etc… But it does sort of exist because there is a many to many relationship between Code and CodeType. This is exactly what is meant by a Multi-Valued Dependency and is resolved by the rules for the 4th Normal Form.
The effects of a MVD’s are not limited to the MUCK table either, they extend to any table that references it.
For Employee we have:
{EmpNo} à {FirstName}
{EmpNo} à {LastName}
{EmpNo} à {StatusCode}
But CodeType isn’t functionally dependent on EmpNo, it is actually “hard coded” into the table. The proposition “Employee {EmpNo} has a CodeType of {CodeType}” makes no sense; CodeType is an attribute of Code, not Employee. In fact, there is a kind of pseudo FD between Code and CodeType; if the Codes were unique there would be a FD between them. CodeType is only there to enforce a constraint on the Code value, which ought to have been enforced through the use of a separate EmployeeStatus table. In fact CodeType, as used here, is really meta-data as opposed to data.
Think of another example
Suspend your disbelief and just pretend that this is what your company needs…
{CompanyName} à {Address}
{CompanyName} à {ContactFirstName}
{CompanyName} à {ContactLastName}
Also note that any combination of the three still holds and makes sense. i.e. {CompanyName} à {Address, ContactLastName}
The Key in this table consists of two “independent” columns. Even though they are both about the same subject, they are not in any way dependent on each other.
Of course, you could just eliminate the CodeType from the Employee table and enforce that constraint via a trigger, but then Code has to be the key of the MUCK table. And now instead of defining a direct FD as a column in a table with a proper key, you would have just obscured it, violating the Information Principle but the FD is still there. This isn’t a violation of normalization per se, but it certainly violates good database design.
I might add that one of the most common misconceptions in the database industry is that “3NF is good enough” when in fact 3NF is pretty irrelevant to relations with composite keys, which is of course, what we have been discussing. As a result, most people, who never bother to learn anything beyond the incorrect and hopelessly fuzzy definitions of 1NF – 3NF, would find it impossible to define precisely the problem with MUCK tables in terms of normalization.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
March 24, 2006 at 2:03 am
I've never posted a reply to any of these article bedfore BUT this is just a brilliant explaination of what is wrong with so many databses that I've seen. I've tried to explain to people what is wrong wih the design but failed to get my point across - in future I'll just print this article ( maybe cut out some of the sarcasm) and show the.
The logical progression from CODE TABLEs to the attitude that everything is an X is brilliantly explained and I've seen it so often
Well Don!!
March 24, 2006 at 6:29 am
From the point of view of someone who completely agrees with the author - that someone being me I guess - the article was enjoyable as I thought of situations where I could just refer the person on the other side of the "discussion" to it
Having said that, the flow, etc of the article could've been improved upon. But, the point is well made and the extra stuff afterwards can be ignored. Simply put, if you have a MUCK table, you cannot declare ref integrity against it - this should be enough of a reason to ring alarm bells. As for the extra "code type" field that would be required in both tables.... I would be very upset seeing this in production code Or any code
March 24, 2006 at 6:33 am
OK...so we don't like lookup tables.
So what the *bleep* DO you like?
SHOW ME!!!!!
March 24, 2006 at 6:35 am
Sorry all, but I can't stop myself from observing that part of this thread has been reduced to the level of MUCK-raking.
Now that I have that out of my system, cheers to Mr. Peterson for opening a conversation that caused people to both express their own and examine others' opinions. To me that is the point of this type of thing.
I come down on the anti-MUCK side of the world rather firmly based on this thought: Earlier in the thread it was pointed out that the right solution models the business process. How many examples of successful real-world business processes (as in legally profitable going concerns) designed and implemented those processes with the intent of creating MUCK? My thinking is that, truly, there aren't very many left.
------------
Buy the ticket, take the ride. -- Hunter S. Thompson
March 24, 2006 at 6:48 am
Putting all lookup types in one table violates one or more of the normal forms of database design. This topic ebs and flows, but always falls back to using one table for each lookup type, which is good, and which underpins good database design.
If a BigInt is required for the primary key, implying a huge table, imagine what that does for performance.
March 24, 2006 at 7:03 am
I would like to have seen a more fact and example based argument. What do these things to do performance and scalability. Every design decision has to be weighed out because we never have requirements that dovetail - they always manage to conflict in some way and we are left choosing the right scenario for the right situation. I couldn't get a scenario to solution feel here - just felt like the answer was "do a MUCK and you're an idiot". Where I work, there is usually a 2-3 month lag for making structural db changes. So, the MUCK allows me to keep up with my business changes much faster. I do sacrifice RI at times, but I make it clear to my clients and they generally accept. Between time, resources, quality and requirement conflicts/constraints - we need more comparative analysis so we can better pick our battles. We all know what the right way to do any given task is...but we are rarely given the time, resources and leeway to modify other requirements to make it happen.
March 24, 2006 at 7:03 am
I use lookup tables a lot and ALWAYS keep them separate, even if some only have a few rows in them... but how badly does this effect DB performance?
There is an alternative I've considered but never implemented which is to have a MUCK and then create a set of Views so I only have 1 underlying table but each view only returns the relevant data - not sure if the overhead of having 20 views + 1 table is more than having 20 tables tho!
Anyway, just a thought for the pot...
March 24, 2006 at 7:21 am
crazy unitabular model...
March 24, 2006 at 7:27 am
Damn I wish I had written this! The point (don't combine dissimilar data, even "lookup" data, into one table) is absolutely correct. The presentation looked just fine. Derivation of the Thing table and its significance had me ROFLMAO.
I have fought this fight, coming in behind designers with an imperfect understanding of relational integrity. When we have the option to add the proper tables and "de-compose" the MUCK table, we do. When we can't, that's life, and we deal with it and go on. But there is a cost that makes it worth creating the right tables if you can possibly secure the mandate.
Those of you who don't see how important this article is, please reconsider. Appearance is not reality. The appearance of economy (fewer tables) is not the reality of functional economy (enforcing relational integrity with the least effort, reflecting the truth of the nature of the data with the structures used to contain it, clarity for those who follow after you [which, by the way, includes you after you've been away a while and come back]).
If one in a hundred MUCK table users sees the light because of this article, its author has performed a worthwhile service.
Worth saying and well said.
March 24, 2006 at 7:45 am
Excellent! When I read an article last year promoting the use of MUCK tables, I couldn't see how it could possibly be maintainable but wrote it off as someone elses solution for their own unique problems. It's great to see an article that debunks the MUCK table "solution" and exposes several of the inevitable problems with this model.
Well Done!
March 24, 2006 at 8:23 am
Entirely a poor article. Wordy, unclear, and pretentious. You have some good ideas. Unfortunately, you fail to express them clearly.
Even worse, you tell us how wrong the "MUCK" practice is but never propose a solution.
March 24, 2006 at 8:24 am
There is no doubt that you are spot on. But how do you convince those who live by "muck"-ing up the database?
Viewing 15 posts - 106 through 120 (of 202 total)
You must be logged in to reply to this topic. Login to reply