Best Practice Question - Reference Tables

  • Hello, I am writing to see what the best practice or just how other people handle this case with reference tables.

    I am working with a legacy db system that is quite a mess and have been working for the last 3 years to bring it up to DB standards.

    I have a question about reference tables: our system is basically a logistical supply chain system and our transactions are in the form of "requests" that get approved to change data cataloging.

    We have reference tables that were built just like this:

    ITEM_CODE ITEM_DESCRIPTION

    C Something Descriptive

    D Hi Im a Description

    Z Arent I descriptive?

    But guess what? Item Codes change and descriptions can change. The "requests" that get processed are tagged with the item_code but the description is a lookup at the time of query.

    So, if the request is done in 2005 and at that time item_code C was "Something Descriptive" but in 2012 the description for C changed to "Something Totally Different" - the 2005 request would now display with the description from 2012 when it should really still be showing the description from 2005.

    I of course can add flag fields for is the record inactive or not or effective date the record... but since the descriptions are not stored on the request.... how would you handle being able to retrieve the old description for old stuff and only pulling the new description for new stuff?

    Would I just add the inactive_flag and/or effective dated records to the table and then programmatically say things like if today's date is greater than or equal to the effective date then use the current description and else use the next to last description?

    Just curious what the BEST method would be for this... we (I) want to minimize hard coding data as much as possible.

  • You should have a primary key on your table. Since you have two columns and the values can be changed for either of them you should add an identity column. Then you store the identity value in your main table. Otherwise you are fighting a losing battle.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The best way to fix this is to model it so that a code and a concept of date (maybe date and time, depending on how much granularity you need) is part of the business key for the reference table. "C" means something different in 1982 than it does today, so it's a different thing. Just adding a new ID column won't fix this underlying design issue.

    Once you have that, then you can relate the child table to the proper reference data row.

    Having said that...I've been successful in some cases like this in convincing the business that changing codes to mean radically different things is bad for everyone, not just the application. People can also mess this up. The whole point of a code is to substitute meaning for something else. Making it time dependent just adds to the confusion.

  • Thanks, so my idea of adding an effective_date column and then doing the if today is greater than or equal to the effective_date use X row otherwise use the older record is the best way? I agree it needs a primary key but that was secondary to my other issue. 🙂

    And yea, I'm totally with you about reference data not changing... like HELLO make a new code if you want a new description!

  • Time periods can be tricky in design. If the codes have no overlapping time periods, I'd go for effective date.

  • Thank you appreciate the feedback!

  • I didn't realize the entire scope of the issue on my first post.

    You may have to go so far as adding an ItemCodeEffectiveDate to your main table to so you know which version to get. You can't just assume there is only one "old" version. There will likely be many. Given the complexities it might almost make sense to denormalize the code and description and just put them in your base table. That is not the best choice but often you need to denormalize to maintain historical perspective of stuff like this.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks! Yes, I was just simplifying my response but yup totally agree with you.

    Unfortunately we cannot alter request table to include the description as this would impact the web based front end and we are not allowed to make changes that affect the application itself without funding. I would certainly have done as you suggested if we were permitted. 🙁

  • amy26 (12/18/2013)


    Hello, I am writing to see what the best practice or just how other people handle this case with reference tables.

    I am working with a legacy db system that is quite a mess and have been working for the last 3 years to bring it up to DB standards.

    I have a question about reference tables: our system is basically a logistical supply chain system and our transactions are in the form of "requests" that get approved to change data cataloging.

    We have reference tables that were built just like this:

    ITEM_CODE ITEM_DESCRIPTION

    C Something Descriptive

    D Hi Im a Description

    Z Arent I descriptive?

    But guess what? Item Codes change and descriptions can change. The "requests" that get processed are tagged with the item_code but the description is a lookup at the time of query.

    So, if the request is done in 2005 and at that time item_code C was "Something Descriptive" but in 2012 the description for C changed to "Something Totally Different" - the 2005 request would now display with the description from 2012 when it should really still be showing the description from 2005.

    I of course can add flag fields for is the record inactive or not or effective date the record... but since the descriptions are not stored on the request.... how would you handle being able to retrieve the old description for old stuff and only pulling the new description for new stuff?

    Would I just add the inactive_flag and/or effective dated records to the table and then programmatically say things like if today's date is greater than or equal to the effective date then use the current description and else use the next to last description?

    Just curious what the BEST method would be for this... we (I) want to minimize hard coding data as much as possible.

    I agree with Karen that the BEST method would be not to ever change descriptions but we know how that goes. She also mentioned the second best method and to put a few more wheels on that suggestion, it's called a "Type 2 Slowly Changing Dimension". Please see the following URL for more information on that.

    http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2

    A bit contrary to that article, I strongly recommend that you do NOT leave the current active row with an empty EndDate. Instead, us '99990101' (the shortcut is just '9999' and it will come out to be 9999-01-01 in a DATETIME column). This will make your queries a whole lot easier and SARGable.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You might want to read Ralph Kimball's The Data Warehouse Toolkit. (Or at least have a look.) It's not as scary as it sounds, really. He explains what slowly changing dimensions are and how to implement them in your design. If you follow his example, you will be able to summarize your data easily.

    Happy reading!

  • Thanks but I am not using a star schema or doing dimensional modeling. 🙂 Much appreciated though... I've actually met Ralph Kimball and gotten his autograph... hehe. 😉 But of course could be helpful to someone else if they read my question with a similar issue. 🙂

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply