Datawarehouse Denormalization

  • Hi,

    I Have a table which I've de-normalized for my DimDestinationLocations. see Below

    How ever due to the amount of potential repeating rows I'm tempted to

    load out the distinct address's into another table and have a look up.

    For every booking there would be a Collection and Destination address, this table is going to get very large.

    Is there any room to do this in Kimball methodology or is it just bad practise.

  • Couldn't insert the image for some reason

    See Below

  • SimonH (11/6/2015)


    Hi,

    I Have a table which I've de-normalized for my DimDestinationLocations. see Below

    How ever due to the amount of potential repeating rows I'm tempted to

    load out the distinct address's into another table and have a look up.

    For every booking there would be a Collection and Destination address, this table is going to get very large.

    Is there any room to do this in Kimball methodology or is it just bad practise.

    Could you have a dim.Location table and then, in fact.Booking, add CollectionKey and DestinationKey, both FKs to dim.Location?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks Phil.

    That's the plan,

    However

    If a company has a warehouse say in Southampton, and they were sending

    packages from Manchester say every day/every hour.

    The Manchester and Southampton addresses are going to be repeated 1000's of times in the Dimlocations table, each with a unique Natural Key (Booking Key)

    And a surrogate key

    to the Bookings Fact table.

    My Question is could I have just the distinct addresses in DimLoocations,

    and have a bridging table (containing the keys) to the Fact table?

    Or Should I just leave repeating addresses in Dimension table?

    Thanks,

  • SimonH (11/6/2015)


    Thanks Phil.

    That's the plan,

    However

    If a company has a warehouse say in Southampton, and they were sending

    packages from Manchester say every day/every hour.

    The Manchester and Southampton addresses are going to be repeated 1000's of times in the Dimlocations table, each with a unique Natural Key (Booking Key)

    And a surrogate key

    to the Bookings Fact table.

    My Question is could I have just the distinct addresses in DimLoocations,

    and have a bridging table (containing the keys) to the Fact table?

    Or Should I just leave repeating addresses in Dimension table?

    Thanks,

    Assuming that the repetition of addresses is merely a shortcoming of the front-end system and not something that you will ever want to analyse in the DW, I would get rid of the dupes on their way in to the DW.

    As I suggested, if the booking fact table contains FKs as necessary to the de-duped addresses dimension, you have just made things a whole lot simpler without losing anything useful.

    Of course, your ETL process will have to implement a lookup on the locations dim (and have a way of creating new ones before the booking fact population).

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • SimonH (11/6/2015)


    Thanks Phil.

    That's the plan,

    However

    If a company has a warehouse say in Southampton, and they were sending

    packages from Manchester say every day/every hour.

    The Manchester and Southampton addresses are going to be repeated 1000's of times in the Dimlocations table, each with a unique Natural Key (Booking Key)

    And a surrogate key

    to the Bookings Fact table.

    My Question is could I have just the distinct addresses in DimLoocations,

    and have a bridging table (containing the keys) to the Fact table?

    Or Should I just leave repeating addresses in Dimension table?

    Thanks,

    That really depends on your requirements and how that data is stored/represented on the front end. For example if you normalize the address information and link it with a fact table if that address changes it will change all the historical data, that may or may not be desired in a datawarehouse....

  • That really depends on your requirements and how that data is stored/represented on the front end. For example if you normalize the address information and link it with a fact table if that address changes it will change all the historical data, that may or may not be desired in a datawarehouse

    The Address Is linked to a booking so it's added everytime a booking is made.

    It just so happens some client make multiple bookings to the same destination. So that wouldn't be an issue.

    Thanks

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply