September 8, 2005 at 12:41 pm
(Hope this isn't a duplicate post as my first post seems to have bee sucked into a black hole...)
No argument that the use of separate tables is the quote 'best' way to design a repository for persistence of code-value information, from a theoretical standpoint and for all the reasons stated in favor of that approach. I would go so far to agree that in a perfect world it is the only approach that should be used, the perfect world ruled by non-technical business people with no development budget or timeline constraints.
I have what I consider to be a pretty good understanding of RDBMS theory but still confess to using a MUCK approach at times, but only after much consideration and in cases where I'm 99.999% percent sure the inherent potential problems will not crop up in the 'real world'.
My personal threshold for making the decision for doing separate tables vs. MUCK is as follows:
- MUCK table used only for storing redundant data for populating drop-downs on user interface only
- MUCK not even considered until 6+ candidate tables identified
- Each lookup type has no more than 10 entries
- Entire MUCK table will not exceed 500 entries
- Candidate values have varchar type only
To summarize I don't use a MUCK except for extremely basic cases where I'm sure there won't be problems down the road. How can I say I'm sure? Well you never know but I base this on the success of the past 20+ years of experience in developing applications and after the initial refinement of personal rules deciding when to use them.
I'm not ignorant of the true normalized approach optimized for an RDBMS - I avert my eyes and CONSCIOUSLY choose to ignore this and use MUCK under very specific conditions but I don't feel that makes me an idiot. I would feel like an idiot however if I lost a bid to develop an application for a client because my proposed cost is higher than the next contractor, solely because I've factored in the hours it takes to develop user interfaces and stored procedures (even using code generators\metadate) to maintain separate lookup tables versus cloning my successful MUCK model in a fraction of the time. Even using cloning and code generators, if it just took one hour to create everything involved with maintaing a separate table for each lookup candidate, it would add 10 hours for ten tables when it could be done in an hour. With the aforementioned magic number of '300' tables you can see the effects of scale (and cost).
Again I cannot stress enough the consideration that must be given to each case in deciding to incorporate a MUCK approach in an application, and should never be used blindly.
maddog
September 8, 2005 at 2:15 pm
Thanks for making my argument. You are willing to cut corners for the sake of saving time creating tables, but are you then coding your application so situations like the one given in the article or the one illustrated by Pamela won't occur? If so your time savings disappears rapidly and the quality of the software most likely decreases (increased likelyhood of bugs). Even then who will ensure that every application that uses the data will be so coded? Now you not only have bad database design, but redundant code to boot. If not then you have totally nullified any idea of data integrity. Not only that but you have destroyed data independence and struck a severe blow to the future maintainability and validity of that data.
Your criteria have NOTHING to do with the problems that MUCK tables cause. They don't help mitigate them in even the smallest way, so I don't know why you bother to have them.
Why is it that you think that the values in an application drop-down menu aren't important enough to enforce some sort of integrity? Why do you think that the database should be created to support your application? You have the cart solidly before the horse here. Applications come and go, data stays. The data should be modeled in such a way that ANY application can access it an make meaningful use of it. The use of poor database designs is one of the major reasons that it is so difficult to integrate different systems. There is just too much of the data that is dependent on the application "black box" code to sort out, that using the data in another system requires a herculean effort.
The size of the MUCK table is irrelevant (outside of the possible physical performance concerns) because it isn't the MUCK table that will need to be cleaned up, it's all the other tables that have pseudo-foreign-key references to it.
The logic behind the varchar type criteria completely escapes me. Of course a MUCK table will have a varchar data type for the values, by definition, it couldn't be anything else. Hence my criticism that you can't even enforce basic domain integrity constraints on the stupid things.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 3:35 pm
Except that my judicial use of this criteria has never resulted in ANY of the potential problems associated with MOST MUCK tables. So - should I continue using what works well for me or blindly follow the other sheep and adhere to an ideal that yes, in theory (and practice for most people) is better at all costs? I truly believe there are practical situations that override the academic ideal, even when you are aware of what the ideal is.
Out of haste I did forget to list probably the most important criteria I use in selecting a candidate for a lookup type in a MUCK table, which is to say that I need to be close to 100% sure that the entries will NOT change once they have been entered, almost to the point that someone could say it might be better to have the possible values hard-coded wherever they are used as in the front-end. You will ask how can you be sure, but I'll re-state that the criteria I use in making this decision has served me well and saved clients money and me time and I have NOT heard of a single problem with the results in those 20 years (actually about 19.5 years as I did not look at this as critically then as I do now). And I know this because these are repeat clients over that entire timeframe. So with that said and as strongly as some might disagree, the PRACTICAL net effect is that a very critical and judicial use of MUCK simply does not require any type of explicit referential integrity or consideration of the other pitfalls (all RDBMS purists roll over in graves or cubicles, please) so in most cases I don't even attempt to do so (no check constraints, no triggers) - I am not lazy, I am just that sure that there won't be any problems down the road. If I whiff even the slightest hint that one of the criteria might be violated in the future then somehing is not a candidate for a MUCK lookup type. Although I can't put my finger on them there are other less tangible criteria I use to decide what goes in a MUCK table and what does not, sometimes just an intuition (and pure logicians hate intuition).
"The logic behind the varchar type criteria completely escapes me."
You're absolutely right, of course the MUCK table will have a varchar data type for the values. To clarify what I meant: I will only use code values in a muck table whose natural data type will always be used as varchar, by the application or user interfaces that reference them, so there will no need for explicit conversion to other types which mitigates the issues associated with that.
It is possible I have indeed done a disservice to readers in asserting my 2cents, which might end up with some folks feeling needlessly justifed but might also push someone over the edge to using MUCK perhaps less judiciously, and for that I apologize. I just know what has worked for me. 99% of the people I know I would never even bring up the subject of using MUCK tables if asked advice on building a new application, and I would promote the use of separate tables. I'm just one who prefers to say 'never say never'.
maddog
September 8, 2005 at 4:06 pm
"I need to be close to 100% sure that the entries will NOT change once they have been entered, almost to the point that someone could say it might be better to have the possible values hard-coded..."
If you are that sure they won't change, I WOULD say it. Putting those values in the database just incurs unnecessary overhead--if you are really that sure...
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 4:23 pm
...Change as in deletions from the list of values. If I feel this can be a possibility it is not a candidate for MUCK table. In more cases I will allow a table to be a candidate for a MUCK if there is the possibility that I think a value could be added in the future. More often it is the case where the number of entries will remain static but there might be the chance of the value text changing to impart the same meaning and intent but the spelling might change; i.e. a client suddenly prefers the text 'Open' to 'Active' in a dropdown to represent a status of an order. Quick change in database via front-end, no red tape involved with change control process, customer happy.
maddog
September 8, 2005 at 4:40 pm
Very pathetic. But I am wondering how people are reading the article and do they really understand what theory and practice mean and the differences between pure theory and practice. First, article describes the various scenario and is not even tells anybody that one scenario is more preferred over another. The second point, it is clearly stated that in many cases combination of scenarios can and should be used.
Third scenario, as pointed in article can be used for very specifically defined codes. For example, how you going to have state codes for various countries. For each country there is it’s own standard. Or, to avoid the MUCK tables, it will be the table for each country? Some other cases are making the third scenario is very useful as well. And, based on your theory, if there is 400 databases on one server each database should have state, time zone ,… tables because there is no way to set relationship between those tables and lookup table in another database. So, you are saying NO to the data centralization. Good luck to keep lookup tables synchronized in 400 databases. Especially, when 2-3 new databases are created per week.
Main point, all the theory scenarios are available but the designer should very carefully choose them based on the specific conditions of a task.
September 8, 2005 at 4:44 pm
Completely agreed with you that from time to time MUCK design is the better design even it is against the relational rules.
September 8, 2005 at 4:47 pm
Ahh, so of course your MUCK table would have the kind of surrogate keys I describe in the article. And of course you are fully aware of the problems that they can lead to, and of course you have never seen these problems actually occur in "real life" and round and round we go...
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 4:50 pm
If New Orleans had paid attention to relational rules, then they wouldn't be in the trouble they are!
Just kidding, but one can take relational evangelism to far.
Main problem with MUCK designs is that people are quick to take advantage of the opportunities presented by them, but are much less quick to deal with issues arising from them.
Signature is NULL
September 8, 2005 at 4:51 pm
Date's "Incoherence principle" kicks in here and your arguments have all been thouroughly refuted already, so I'm not going to bother again.
Have fun in your theory-free fantasyland.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 5:05 pm
Didn't know I was presenting an argument...just stating that I've used a MUCK model under very strict circumstances for a very long time with no problems (honest). I agree the fully normalized approach is the preferred one for 99% of the people 99% of the time. OK, I'm done.
maddog
September 8, 2005 at 5:27 pm
I think we forget what an RDBMS actually is. There are a myriad of ways to store data: embedded in the code ("hard-coded"), in a text file (something.cfg or somethingelse.ini...dare I say it: nothing.xml), there is the registry (if you run on Windows), there are several file-based DBMSes out there, then there are SQL DBMSes (unfortunately the closest thing to a real RDBMS).
Why the choice of a SQL DBMS? Why not one of the others? Certainly one of the other possible platforms would host the idea of a property bag sort of MUCK table some choose to use.
As with most things, choose the tool to fit the job. You have to realize that RDBMSes are...hmmm...predicated on the fundementals of the Relational Model. By adhering to the principles of that model, you benefit from the "truths" it provides you. None of these other systems can make this sort of statement, the promise, that the Relational Model gives you.
The biggest problem with the vended SQL DBMSes is that they give the developer the ability to circumvent the RM, which was the whole purpose of the platform to begin with.
For example: most SQL DBMSes allow you to create tables without Primary Keys. In the RM, no such thing exists. If we had a true RDBMS, the idea of creating such a MUCK table would not even exist because it would be forbidden by the system.
Just because you can do a thing does not necessarily mean you should do a thing. And if you want the garauntees the RM provides you, you must follow the rules.
Sincerely,
Anthony Thomas
September 8, 2005 at 5:40 pm
maddogs, that post was not specifically directed at you.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 5:53 pm
Hey Anthony, how's it going?
Anyway, Leo's main argument (although he lacks the understanding necessary to frame it so plainly) is basically as follows: "The current SQL products don't deliver a full implementation of the Relational Model; ergo the Relational Model of data management doesn't fit my needs; ergo it can be lightly dismissed as 'just a theory' that doesn't apply to 'real life'." Talk about pathetic...
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
September 8, 2005 at 6:19 pm
This is endless conversation. Let me say that I am teaching classes "relational database design". So I know and advocate relational theory as much as I can. And I am not advocating the MUCK table, neither that the relational theory is bad. The only things I am trying to say that live is more complicated then the theory and not always you should follow the relational theory road. Sometime when roads are bad off-road vehicles are better then the standard ones. It is very seldom when you need to do something different but when it is necessary it is necessary. I am done!
Viewing 15 posts - 91 through 105 (of 202 total)
You must be logged in to reply to this topic. Login to reply