Check Constraints versus Lookup Tables

  • I am rewriting an existing Visual FoxPro application in SQL Server. I have normalised the existing tables and introduced lookup tables to cut down on data entry errors. As a result, retrieving the data requires quite a lot of joins.

    I realised that some of the lookup tables would only ever hold a few distinct values, and was wondering whether, rather than having a lookup table, it would be better to put a check constraint on certain columns, as that would cut down on the number of joins. One example is a frequency column that currently has only three values - 'Mth', 'Qtr', 'Ann' - and I can't really see many more being added to that list. My boss wants me to normalise as far as possible, but I am not sure what to do.
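
    (For concreteness, a minimal sketch of the two options - all table and column names here are illustrative, not from the actual system:)

    -- Option 1: enforce the domain with a check constraint
    CREATE TABLE Payment
    (
        Payment_ID int NOT NULL PRIMARY KEY,
        Frequency char(3) NOT NULL
            CONSTRAINT CK_Payment_Frequency CHECK (Frequency IN ('Mth', 'Qtr', 'Ann'))
    )

    -- Option 2 (an alternative Payment definition): enforce it with a lookup table and FK
    CREATE TABLE Frequency
    (
        Frequency_Code char(3) NOT NULL PRIMARY KEY,  -- 'Mth', 'Qtr', 'Ann'
        Name varchar(50) NOT NULL                     -- e.g. 'Monthly'
    )

    CREATE TABLE Payment
    (
        Payment_ID int NOT NULL PRIMARY KEY,
        Frequency_Code char(3) NOT NULL
            CONSTRAINT FK_Payment_Frequency REFERENCES Frequency (Frequency_Code)
    )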

  • Not sure how a check constraint will help with joins and retrieving data, but they are supposed to be faster than foreign key checks when inserting data.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

    The Best of SQL Server Central.com 2002 - http://www.sqlservercentral.com/bestof/

    http://www.dkranch.net

  • Yes - I do believe that check constraints are faster than foreign key constraints. If you are submitting through a stored procedure, it is even faster to just test the submission for these types of values.
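
    (A minimal sketch of that kind of in-procedure test - the procedure and table names are illustrative:)

    CREATE PROCEDURE usp_AddPayment
        @Payment_ID int,
        @Frequency char(3)
    AS
    -- validate the domain in code instead of relying on a constraint or FK
    IF @Frequency NOT IN ('Mth', 'Qtr', 'Ann')
    BEGIN
        RAISERROR ('Invalid frequency value.', 16, 1)
        RETURN
    END
    INSERT INTO Payment (Payment_ID, Frequency)
    VALUES (@Payment_ID, @Frequency)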

    Guarddata-

  • More for convenience than for speed, you might want to consider building a view based on the various table joins. It makes accessing the data easier. It also gives you the option later of changing the underlying enforcement from lookup-table joins to a check constraint without having to modify the code which accesses the data.
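
    (A minimal sketch, reusing the illustrative Payment and Frequency tables from above:)

    CREATE VIEW vw_Payment_Detail
    AS
    SELECT p.Payment_ID,
           p.Frequency_Code,
           f.Name AS Frequency_Name  -- the join is hidden inside the view
    FROM Payment p
    INNER JOIN Frequency f
        ON f.Frequency_Code = p.Frequency_Code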

  • I would favour using a lookup table even for these small domains. Although you are right that check constraints would involve fewer joins (including the SQL Server internal join-like operation involved in enforcing FK constraints) and would therefore be faster, the advantages of using a lookup table are:

    Normalization - a check constraint means you are holding repeated data values, with the extra space requirement that involves.

    Transparency - I like to make my databases as self-documenting as possible, and with a lookup table it is more evident that ('Mth','Qtr','Ann') represents a set of frequencies. You can also have a description column as well as a name column, for additional clarity.

    Ease of maintenance - if you do need to change the values in this column, and especially if the values might be used in several columns, it is easier to maintain the domain values if they are held in a table.

    I reckon that on the whole these advantages outweigh any (probably minimal) performance gain from using check constraints, and the additional effort involved in needing extra joins can be reduced if you follow rprice's recommendation of creating views on fact tables and their associated lookup tables.

    If you really do want to dispense with lookup tables, an alternative might be to create a user-defined 'frequency' data type with a rule bound to it which restricts values to {'M','Q','Y'}. That way you keep the maintenance and some of the transparency advantages, while avoiding the extra joins and probably achieving performance similar to that of a check constraint. Unfortunately, it looks like rules are being phased out in favour of ANSI CHECK constraints, so you may prefer not to use them...
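
    (A minimal sketch of the rule-based approach - note this uses the older rule syntax which, as mentioned, is being phased out:)

    -- create the user-defined type, then bind a rule to it
    EXEC sp_addtype 'frequency', 'char(1)', 'NOT NULL'
    GO
    CREATE RULE frequency_rule
    AS @value IN ('M', 'Q', 'Y')
    GO
    EXEC sp_bindrule 'frequency_rule', 'frequency'
    GO
    -- any column declared with the 'frequency' type now rejects other values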

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • That's a tough call. I think for a very small set where you really, really don't think it will change, I'd use the constraint. One consideration: if you are storing a code (1=Open, 2=Closed), then you usually need to translate it somewhere, so you may as well make a lookup table. Then again, you can always do both - just not enforce the FK relationship with the lookup table!
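
    (A sketch of the 'do both' idea, reusing the illustrative tables from above - the FK stays in place for documentation, but is not enforced:)

    -- add the check constraint to do the actual enforcement
    ALTER TABLE Payment
        ADD CONSTRAINT CK_Payment_Frequency_Code
        CHECK (Frequency_Code IN ('Mth', 'Qtr', 'Ann'))

    -- keep the FK for documentation/diagrams, but disable its checking
    ALTER TABLE Payment NOCHECK CONSTRAINT FK_Payment_Frequency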

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • It is common in the MS books to see them suggest using constraints over FKs where you can (sorry, I loaned my books out or I would give a specific quote). Cases such as you and Andy describe are where you get the real benefit, since you are decreasing the IO needed for inserts and updates due to the FK reads that occur. Another prime example: if you are in the US and developing a mailing DB but only mail to contiguous US locations (49 locations when you count DC), then a constraint of BETWEEN 1 AND 49 would be better to use.
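
    (A sketch of that kind of range constraint, on a hypothetical mailing table:)

    CREATE TABLE Mailing_Address
    (
        Address_ID int NOT NULL PRIMARY KEY,
        State_Code tinyint NOT NULL
            -- codes 1-49 cover the contiguous states plus DC
            CONSTRAINT CK_Mailing_State CHECK (State_Code BETWEEN 1 AND 49)
    )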

    However, there is of course a downside. Suppose the unexpected happens and you end up adding an item to the table you thought wouldn't change. You need to make sure you go back and fix your constraint, or you will be doing a considerable amount of troubleshooting (especially if it's someone else doing it). I usually create diagrams and include the associated table with a text note explaining the relationship and that a constraint, as opposed to an FK, is used to enforce it.

    Now to add further insult to all the above: if the related table is used a lot, then keep in mind the PK is a constraint on that table and is already loaded into memory, which means for a small number of values the FK relationship may operate just as fast as a plain constraint, simply because there aren't enough values to create a huge performance difference between doing an index seek on the PK and doing a simple comparison like BETWEEN 1 AND 3. Even the IO on the hard drive may not cause that much of an issue.

    It all boils down to the main thing you should always do: test it both ways when you have data, and adjust as needed. You will find yourself constantly adjusting databases to optimize their performance, and you should re-evaluate your design on a pretty regular basis.

  • Another two cents on the lookup table issue: what always bugs me is the need to have so many lookup tables. When fully normalized, you can end up with loads of them, and most will have under a dozen rows... and that just seems wasteful for a lot of admittedly trivial reasons (hard drive storage, documentation, intricate FK relationships, and general required support nuisance value).

    These tables all look pretty much the same:

    CREATE TABLE [VALUE]
    (
        Value_Code tinyint NOT NULL PRIMARY KEY,  -- the "quick and dirty" identifying value
        Name varchar(50) NOT NULL,                -- a common name/title for the value
        Description varchar(500) NULL             -- a more detailed explanation/reminder of what it is
    )

    (And of course there's tons more stuff you could add--when code was added, who added it, is it still active, and so forth.)

    In one solution that I read (a discussion thread here at SQL Central months ago), the writer proposed creating a single lookup table to hold all lookup tables. Something like:

    CREATE TABLE LOOKUP_VALUES
    (
        Lookup_Value_ID int NOT NULL PRIMARY KEY,  -- could be an identity column, or values could be hand-selected
        Lookup_Category varchar(20) NOT NULL,      -- brief label used to identify and group related lookup values
        Value_Code tinyint NOT NULL,
        Name varchar(50) NOT NULL,
        Description varchar(500) NULL,
        -- Lookup_Category and Value_Code would be a candidate (unique) key
        CONSTRAINT UQ_Lookup_Category_Code UNIQUE (Lookup_Category, Value_Code)
    )

    The main advantage here is you've got "one stop shopping" for all your lookup values. The main disadvantage, to my mind, is that you lose having the "Value_Code" in your data tables. This value is often very critical to applications (think header files and IF and CASE statements), and to retrieve the "real" value you'd always have to join into the LOOKUP_VALUES table. (This is one reason you might want to hand-select the Lookup_Value_ID values.)
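
    (A minimal sketch of that mandatory join, assuming a hypothetical Orders table that stores the Lookup_Value_ID:)

    SELECT o.Order_ID,
           lv.Value_Code,          -- the code is only available via the join
           lv.Name AS Status_Name
    FROM Orders o
    INNER JOIN LOOKUP_VALUES lv
        ON lv.Lookup_Value_ID = o.Status_Lookup_Value_ID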

    By and large, I still go with individual lookup tables for all the reasons everyone else has already said. However, I suspect that there are situations where this "master lookup table" theory would be very useful.

    Philip

  • I would look at the application that will be used to access your data. Most of our apps add the requirement that the application check for valid values, so that it can present clearer and friendlier error messages. If the app can't hit the lookup table for valid values and still give the needed performance, then the programmer is probably going to hard-code the values. If they do, then the lookup table just gets in the way, and I would go with the constraint; in that case you are already looking at change control to add a new value anyway. If the app will use the lookup table, go with that. A single-column lookup table doesn't really add a join most of the time: the value is in the main table, and the lookup table is only used to validate new data or changes.

  • You could have the application query for the constraint definition instead. Not as nice perhaps, but doable.
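
    (One way to do that - reading the check clause back from the INFORMATION_SCHEMA views; the app would still have to parse the values out of the clause:)

    SELECT CHECK_CLAUSE
    FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS
    WHERE CONSTRAINT_NAME = 'CK_Payment_Frequency'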

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • quote: (Philip's "master lookup table" post, reproduced in full above.)
  • I have been working on a very similar issue recently on an upgrade from Access to SQL Server. The design required slight changes also.

    Interestingly the original design had one common lookup table for all the other lookup tables as has been suggested here, but I found that a bit messy (my own personal taste).

    Essentially I have used check constraints for those 'tables' with few values that are unlikely to change in the future. (I don't agree with whoever said check constraints mean holding repeated data values - they shouldn't!)

    For others where there are more values I have used tables with PK-FK constraints.

    You must also consider whether the app will be used to add further values, or whether this will be done through the back end; lookup tables are easier for this, as otherwise the whole check constraint has to be dropped and re-created (see the sketch below).
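
    (A minimal sketch of that drop-and-re-create step, reusing the illustrative check constraint from earlier and a hypothetical new 'Wk' value:)

    ALTER TABLE Payment DROP CONSTRAINT CK_Payment_Frequency

    ALTER TABLE Payment
        ADD CONSTRAINT CK_Payment_Frequency
        CHECK (Frequency IN ('Mth', 'Qtr', 'Ann', 'Wk'))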

    Paul R Williams.

  • Something I have issues with in databases is when things are hidden, and two areas where this can easily happen are triggers and constraints. Triggers are the nasty one, because a trigger is like a hidden black box, and it's easy to forget that those actions are happening.

    Now I know constraints are a good idea. But something I'm about to start embarking on is looking at some form of data dictionary, with some method of validating the data via stored procs. This might not be as elegant as constraints, but it does allow for more flexibility in the ways data can be constrained. It also puts all the constraints into a single pool, which makes it easy to see visually what is going on, and to manage it.
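
    (A minimal sketch of that kind of table-driven validation proc, assuming a master lookup table like the LOOKUP_VALUES one Philip described:)

    CREATE PROCEDURE usp_Check_Lookup_Value
        @Category varchar(20),
        @Code tinyint
    AS
    -- fail if the code is not registered for this category
    IF NOT EXISTS (SELECT *
                   FROM LOOKUP_VALUES
                   WHERE Lookup_Category = @Category
                     AND Value_Code = @Code)
    BEGIN
        RAISERROR ('%d is not a valid code for category %s.', 16, 1, @Code, @Category)
        RETURN 1
    END
    RETURN 0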

    It's very dangerous to have hidden or black-box type processes going on. This week we learnt that when we take data from our pre-payroll system and export it to the payroll system, the import likes to change some of the codes... some of the time. This is something that should be part of the pre-payroll system, where everyone can see it. Because it was hidden, it has been causing problems for years, and we've been paying out a lot of extra $$$$$ to employees that we didn't need to.

    Cheers,

    Quentin

  • I have seen databases that did not use lookup tables shrink dramatically once converted to use them. This will help offset, if not completely offset, any performance hit from joining to lookup tables.

    Another thing to do is to use something like a char(4) with a good abbreviation as the PK, and then have a nice description field. Then on most of your reports etc. you will not need to join back to pick up the full description.
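
    (A minimal sketch of that natural-key approach - names are illustrative:)

    CREATE TABLE Order_Status
    (
        Status_Code char(4) NOT NULL PRIMARY KEY,  -- e.g. 'OPEN', 'CLSD': readable on its own
        Description varchar(100) NOT NULL          -- e.g. 'Open', 'Closed'
    )
    -- fact tables carry Status_Code directly, so most reports never need the join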

    If you have a binary condition (1=Open, 2=Closed), use a bit field instead.

    In summary, having a lookup table and FK that turned out to be less than necessary has never hurt. But not using one where it should have been used always comes back as a problem (bad data, a new business rule).
