Adding a new Dimension

  • What is the standard practice for adding a new dimension (a totally new table, not a new row to an exisiting dimension table)? Wouldn't you have to redo the entire fact table, to add a new field, and then populate it based on the key in your new dimension table?

  • To add a new dimension, you first populate the new dimension (obviously :-)) in the datawarehouse.

    Then you add a new surrogate key field in the fact table and you change the ETL to take into account this new dimension.

    When new records arrive for the fact table, the new surrogate key should be populated automatically.

    But what with older records? Here you have multiple options:

    * leave them NULL (could cause issues in reports)

    * if the business key of the dimension was always in the fact data, but you just didn't use it before, you could reload the facts for just that one dimension. (so basically update the new surrogate key with archive data)

    * place a dummy surrogate key (for example -1). In your dimension, you have the description "Dimension not yet available" for the -1 key value. Or "Unknown" or "Not Applicable", whatever floats your boat 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • * leave them NULL (could cause issues in reports)

    I don't recommend this. All dimension keys should have some value. Implement the third suggestion in the above post. I use zero for a NULL -- Unknown and a -1 for a NULL -- Not Applicable (the users just see UNK and N/A). If there is a value but it just hasn't been used before, I would start by populating the new column with zero until I could update the old data. If it hasn't been used before, or wouldn't be used by most records, I would use -1 until I could make the appropriate arrangements. I suppose there's no reason that you can't even create a temporary -2, Needs to be added, or some such, though I've never gone that route before.

    I've had to do this many times, and I usually create a second download that I have to update manually until I can work the new download into the main ETL, which often takes some time.

  • RonKyle (4/19/2011)


    * leave them NULL (could cause issues in reports)

    I don't recommend this. All dimension keys should have some value. Implement the third suggestion in the above post. I use zero for a NULL -- Unknown and a -1 for a NULL -- Not Applicable (the users just see UNK and N/A). If there is a value but it just hasn't been used before, I would start by populating the new column with zero until I could update the old data. If it hasn't been used before, or wouldn't be used by most records, I would use -1 until I could make the appropriate arrangements. I suppose there's no reason that you can't even create a temporary -2, Needs to be added, or some such, though I've never gone that route before.

    I do like your approach to implement separate markers for "A" and "I" Null Values but I'm a little concerned about the use of zero - as one of them - which may create some kind of confusion. Either way, nicely done.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • RonKyle (4/19/2011)


    * leave them NULL (could cause issues in reports)

    I don't recommend this. All dimension keys should have some value. Implement the third suggestion in the above post. I use zero for a NULL -- Unknown and a -1 for a NULL -- Not Applicable (the users just see UNK and N/A). If there is a value but it just hasn't been used before, I would start by populating the new column with zero until I could update the old data. If it hasn't been used before, or wouldn't be used by most records, I would use -1 until I could make the appropriate arrangements. I suppose there's no reason that you can't even create a temporary -2, Needs to be added, or some such, though I've never gone that route before.

    I've had to do this many times, and I usually create a second download that I have to update manually until I can work the new download into the main ETL, which often takes some time.

    Hey, I was just stating all of the options. 😀

    Not all options are as good as other options. And I placed a warning next to it. 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Sorry, but in my view, it's not a valid option. I appreciate this isn't universally accepted, but there are many who would say the same thing.

  • I do like your approach to implement separate markers for "A" and "I" Null Values but I'm a little concerned about the use of zero - as one of them - which may create some kind of confusion. Either way, nicely done.

    Thanks. As for the confusion, I will admit that after years of having no issues with this system, I've run into a couple of issues with two new dimensions where this has been an issue, and for these I've had to use -2 for UNK so that 0 could be reserved for No. Overall, however, it has worked well, and I don't yet have a good consistent solution for my new issue.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply