Adding dimension (degenerate?)

  • I have a fact table (several million fairly wide rows) and a dozen dimension tables. There's an additional 20 columns in the fact table, each with a small number of values; some are bits, others tiny ints, or Y/N flags. I considered adding these as degenerate dimensions to allow browsing by them but I gather that it's better for browsing performance to create and populate dimension tables for these, and then add the appropriate foreign keys to the fact table. After having done that they do not appear in the cube designer operation "Add related tables".

    Two questions:

    Am I on the right track creating these dimension tables in the database referred to be the data source?

    Why don't these tables appear when I try to add them as related tables? As I said I added foreign keys to the fact table to them.

  • I discovered that I need to first add these tables to the data source designer; then they can be added via the cube designer.

    I would still like to know if this is the recommended approach or should I create degenerate dimensions? One disadvantage to creating these new regular dimension tables and foreign keys in the database is that they complicate adding additional records to the fact table, should they contain a column value not in a dimension. I can not be sure of all the possible values; the data comes from various clients and is not always clean.

  • From a modelling perspective from what you have said, I would say your best bet is to create a Junk Dimension or Junk Dimensions depending upon the data in these extra 20 cols.

    In that way you have proper dimension tables but should only need to create one or two Junk Dimension tables. In a DW you need to limit the Dimensions to as few as possible otherwise it becomes very difficult to maintain and also can become more complex for business users. Simplicity is the key if you wish your business users to get the max out of the DW.

    Junk Dimensions are useful where you have data columns with cardinality where the data values don't vary a lot such as bit cols or Y/N cols that you have mentioned you have.

    Lookup on Kimball website or similar to find out about Junk Dimensions for further info.

    Hope that helps.

    Paul R Williams.

  • Paul Williams (3/24/2014)


    From a modelling perspective from what you have said, I would say your best bet is to create a Junk Dimension or Junk Dimensions depending upon the data in these extra 20 cols.

    In that way you have proper dimension tables but should only need to create one or two Junk Dimension tables. In a DW you need to limit the Dimensions to as few as possible otherwise it becomes very difficult to maintain and also can become more complex for business users. Simplicity is the key if you wish your business users to get the max out of the DW.

    Junk Dimensions are useful where you have data columns with cardinality where the data values don't vary a lot such as bit cols or Y/N cols that you have mentioned you have.

    Lookup on Kimball website or similar to find out about Junk Dimensions for further info.

    Hope that helps.

    +1 Junk dimension is the way to go.

    Since you have 20 columns, I wouldn't store every possible combination, but only those that actually occur in the data.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply