Lookup Table Madness

Question

Post reply

Lookup Table Madness

Viewing 15 posts - 106 through 120 (of 202 total)

You must be logged in to reply to this topic. Login to reply

Anthony L. Thomas Ten Centuries Points: 1246 More actions · Answer 1

"Let me say that I am teaching classes 'relational database design'."

God help us!

"Sometime when roads are bad off-road vehicles are better then the standard ones."

Exactly my point! When you need a different solution—a Jeep instead of the Mercedes—you picked the vehicle for the terrain. In this case, you don't turn your Mercedes into a Jeep, you use the Jeep.

For your DBMS, change the tool you are using, don't try to make an RDBMS do non-relation things.... It's only a SQL DBMS after all: it needs all the help it can get, and diverging from the model on which it was based is not very stable reasoning.

Sincerely,

Anthony Thomas

P.S., and I'm never done.

Crazy Canuck Valued Member Points: 52 More actions · Answer 2

Thank you for this insightful article.

I am no DBA however I do a lot of database design work as part of my consulting work. I have struggled with this issue for a long time...constantly switching back and forth in my mind as to which approach is better.

I do agree with some of the other comments that the "right" approach can be determined by the scale of the issue/task/project you're faced with. The MUCK approach is definitely faster [and dirtier ] to implement for small prototype projects.

Having implemented numerous projects using both approaches I can tell you that the MUCK seems simple and elegant on the surface but does not scale and adapt to changing business requirements. It works fine when all code sets conform to the same generic definition but breaks quickly when some code sets require different data than others. Dealing with code sets that are hierarchical in nature adds another layer of complexity that the MUCK approach does not handle very well. Sure you can add a parent CodeID column etc but it makes things complex and associates data (columns) with codes sets that don’t make use of the data and consequently have null values for the columns they don’t use. This just feels wrong.

Thanks for the frank discussion. I think the extra work maintaining separate “lookup” tables outweighs the perceived simplicity of a MUCK approach

DCPeterson SSCoach Points: 15266 More actions · Answer 3

Someone sent me an email with the following question and I thought that both the question and answer might help someone else so I am posting it here:

Hi Don,

So there was a little flurry of activity over the latest lookup table article. As I hinted in the thread, I'm looking at writing an article clarifying the reasoning to reject the MUCK/CLUT design (upon re-reading, some of my own comments in that thread are pretty muddled!). Part of my review has been to re-read your article on the topic, as well as a couple of others. (It looks like about a 6-month cycle of one or the other of the designs being proposed).

One of my main points will be that it's not a normalization issue at all. In fact, throughout a lot of discussions on the site, I see people refer to normalization where it doesn't apply.

In that vein, I would like to try and persuade you that your own reference to the MUCK design violating normalization is in error -- or have you convince me that I'm missing the ball on that point.

Here's the relevant section and quote from your article (emphasis added -- hope your email takes HTML format

"It should work, right? Yes, but you have to be willing to ignore the elephant in the living room, our employee table is not properly normalized because CodeID is partially dependent on CodeType which is not part of the key."

Although, in a loose sense, you are correct that CodeID is dependent upon CodeType, this is a dependency that is, in fact, due to the foreign key constraint. Specifically, it is not a functional dependency (nor is it either of the more general forms of relation dependency, multivalue or join). As such, it has nothing to do with normalization.

To illustrate my point further, consider what would happen if you disallowed the interrelationship of the attributes which participate in a compound foreign key -- the natural result would be that you would not be allowed to ever have a compound primary key, because nothing could "foreign key to it".

My hypothosis on this confusion is that people take the colloquial memorization trick for normalization -- "every attribute depends on the key, the whole key, and nothing but the key", and don't think about the fact that "depends on" in that sentence is talking about functional, multivalue, and join dependencies (or, more succinctly, join dependencies, since the other two are special cases of the more general).

Does this make sense? Again, please let me know if you think I've missed the mark on this.

Thanks!

Here is my reply:

The answer is ultimately that it depends… You are correct that my article was necessarily too brief to cover the subject in detail and so was a bit muddled, and you are also correct in pointing out the general muddled nature of the forum posts; some of them due to the muddled thinking of the poster, some due to the nature of forums…

Let me see if I can make my case: First off, forget about the IDENTITY column, it isn’t a key in any logical sense. So in this example I’ll leave it out. Also, just ignore the data types, I did this quick and dirty and data types are irrelevant for this discussion.

The real key for the MUCK table has to be Code and CodeType in order to uniquely identify a record.

At first glance this looks to be “normalized” because the following.

{codeType, Code} à {Description} is the only FD for the MUCK

{Code} à {CodeType} doesn’t really exist because identical code values might exist under different CodeTypes i.e. {CA, State} and {CA, Software Vendor} etc… But it does sort of exist because there is a many to many relationship between Code and CodeType. This is exactly what is meant by a Multi-Valued Dependency and is resolved by the rules for the 4^th Normal Form.

The effects of a MVD’s are not limited to the MUCK table either, they extend to any table that references it.

For Employee we have:

{EmpNo} à {FirstName}

{EmpNo} à {LastName}

{EmpNo} à {StatusCode}

But CodeType isn’t functionally dependent on EmpNo, it is actually “hard coded” into the table. The proposition “Employee {EmpNo} has a CodeType of {CodeType}” makes no sense; CodeType is an attribute of Code, not Employee. In fact, there is a kind of pseudo FD between Code and CodeType; if the Codes were unique there would be a FD between them. CodeType is only there to enforce a constraint on the Code value, which ought to have been enforced through the use of a separate EmployeeStatus table. In fact CodeType, as used here, is really meta-data as opposed to data.

Think of another example

Suspend your disbelief and just pretend that this is what your company needs…

{CompanyName} à {Address}

{CompanyName} à {ContactFirstName}

{CompanyName} à {ContactLastName}

Also note that any combination of the three still holds and makes sense. i.e. {CompanyName} à {Address, ContactLastName}

The Key in this table consists of two “independent” columns. Even though they are both about the same subject, they are not in any way dependent on each other.

Of course, you could just eliminate the CodeType from the Employee table and enforce that constraint via a trigger, but then Code has to be the key of the MUCK table. And now instead of defining a direct FD as a column in a table with a proper key, you would have just obscured it, violating the Information Principle but the FD is still there. This isn’t a violation of normalization per se, but it certainly violates good database design.

I might add that one of the most common misconceptions in the database industry is that “3NF is good enough” when in fact 3NF is pretty irrelevant to relations with composite keys, which is of course, what we have been discussing. As a result, most people, who never bother to learn anything beyond the incorrect and hopelessly fuzzy definitions of 1NF – 3NF, would find it impossible to define precisely the problem with MUCK tables in terms of normalization.

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

*****************/

William Houlihan-239309 SSC Enthusiast Points: 131 More actions · Answer 4

I've never posted a reply to any of these article bedfore BUT this is just a brilliant explaination of what is wrong with so many databses that I've seen. I've tried to explain to people what is wrong wih the design but failed to get my point across - in future I'll just print this article ( maybe cut out some of the sarcasm) and show the.

The logical progression from CODE TABLEs to the attitude that everything is an X is brilliantly explained and I've seen it so often

Well Don!!

Ian Yates SSCoach Points: 19738 More actions · Answer 5

From the point of view of someone who completely agrees with the author - that someone being me I guess - the article was enjoyable as I thought of situations where I could just refer the person on the other side of the "discussion" to it

Having said that, the flow, etc of the article could've been improved upon. But, the point is well made and the extra stuff afterwards can be ignored. Simply put, if you have a MUCK table, you cannot declare ref integrity against it - this should be enough of a reason to ring alarm bells. As for the extra "code type" field that would be required in both tables.... I would be very upset seeing this in production code Or any code

Martin Vrieze SSCrazy Points: 2760 More actions · Answer 6

OK...so we don't like lookup tables.

So what the *bleep* DO you like?

SHOW ME!!!!!

Bryant McClellan SSCarpal Tunnel Points: 4397 More actions · Answer 7

Sorry all, but I can't stop myself from observing that part of this thread has been reduced to the level of MUCK-raking.

Now that I have that out of my system, cheers to Mr. Peterson for opening a conversation that caused people to both express their own and examine others' opinions. To me that is the point of this type of thing.

I come down on the anti-MUCK side of the world rather firmly based on this thought: Earlier in the thread it was pointed out that the right solution models the business process. How many examples of successful real-world business processes (as in legally profitable going concerns) designed and implemented those processes with the intent of creating MUCK? My thinking is that, truly, there aren't very many left.

------------
Buy the ticket, take the ride. -- Hunter S. Thompson

Paul Novelli SSC Enthusiast Points: 162 More actions · Answer 8

Putting all lookup types in one table violates one or more of the normal forms of database design. This topic ebs and flows, but always falls back to using one table for each lookup type, which is good, and which underpins good database design.

If a BigInt is required for the primary key, implying a huge table, imagine what that does for performance.

SueStill SSC Eights! Points: 946 More actions · Answer 9

I would like to have seen a more fact and example based argument. What do these things to do performance and scalability. Every design decision has to be weighed out because we never have requirements that dovetail - they always manage to conflict in some way and we are left choosing the right scenario for the right situation. I couldn't get a scenario to solution feel here - just felt like the answer was "do a MUCK and you're an idiot". Where I work, there is usually a 2-3 month lag for making structural db changes. So, the MUCK allows me to keep up with my business changes much faster. I do sacrifice RI at times, but I make it clear to my clients and they generally accept. Between time, resources, quality and requirement conflicts/constraints - we need more comparative analysis so we can better pick our battles. We all know what the right way to do any given task is...but we are rarely given the time, resources and leeway to modify other requirements to make it happen.

AdamDale3 Valued Member Points: 68 More actions · Answer 10

I use lookup tables a lot and ALWAYS keep them separate, even if some only have a few rows in them... but how badly does this effect DB performance?

There is an alternative I've considered but never implemented which is to have a MUCK and then create a set of Views so I only have 1 underlying table but each view only returns the relevant data - not sure if the overhead of having 20 views + 1 table is more than having 20 tables tho!

Anyway, just a thought for the pot...

Radomirs Cirskis Valued Member Points: 59 More actions · Answer 11

Radomirs Cirskis

Valued Member

Points: 59

March 24, 2006 at 7:21 am

#628606

crazy unitabular model...

Charles Wannall Ten Centuries Points: 1023 More actions · Answer 12

Damn I wish I had written this! The point (don't combine dissimilar data, even "lookup" data, into one table) is absolutely correct. The presentation looked just fine. Derivation of the Thing table and its significance had me ROFLMAO.

I have fought this fight, coming in behind designers with an imperfect understanding of relational integrity. When we have the option to add the proper tables and "de-compose" the MUCK table, we do. When we can't, that's life, and we deal with it and go on. But there is a cost that makes it worth creating the right tables if you can possibly secure the mandate.

Those of you who don't see how important this article is, please reconsider. Appearance is not reality. The appearance of economy (fewer tables) is not the reality of functional economy (enforcing relational integrity with the least effort, reflecting the truth of the nature of the data with the structures used to contain it, clarity for those who follow after you [which, by the way, includes you after you've been away a while and come back]).

If one in a hundred MUCK table users sees the light because of this article, its author has performed a worthwhile service.

Worth saying and well said.

NF Scott Smith Right there with Babe Points: 773 More actions · Answer 13

Excellent! When I read an article last year promoting the use of MUCK tables, I couldn't see how it could possibly be maintainable but wrote it off as someone elses solution for their own unique problems. It's great to see an article that debunks the MUCK table "solution" and exposes several of the inevitable problems with this model.

Well Done!

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 14

Entirely a poor article. Wordy, unclear, and pretentious. You have some good ideas. Unfortunately, you fail to express them clearly.

Even worse, you tell us how wrong the "MUCK" practice is but never propose a solution.

Gerald Schanz SSC Journeyman Points: 98 More actions · Answer 15

There is no doubt that you are spot on. But how do you convince those who live by "muck"-ing up the database?