load facts and dimensions and surrogate keys

  • Hi,

    while loading data from staging into the warehouse, first the dimension tables are uploaded. then the fact tables.

    When loading the fact tables i need the surrogate PKs of the dimension tables. What is the best approach to get the surrogate key for each record?

    In the staging area, The fact and dimension records both will have the PKs of the source area. While loading the fact, is it OK to look up the dimension based on the PK of the source system. Can this lookup be avoided? What is the best practice for loading data?

    Thanks!

  • The way we have done this is to load the source data into a staging table in a staging database.

    Perform validation on the source data

    Lookup surrogate keys and apply them to the staging table

    Push the changes/new data with the applied surrogate keys.

    We do a direct lookup from the warehouse production database to get our surrogate keys. This may not be the best practice, but I thought it might help if you have an idea of how someone else worked with ETL.

     

  • We haven't gotten to the fact table yet with SSIS... still trying to overcome the little obstacles like converting an integer date in Sybase to an SQL date... and deciphering that "ErrorColumn" message that doesn't seem to relate to a column; but our plan is to do the key lookup in the stream.  I guess time will tell whether this approach works or not; but that is the plan today.  I will be watching to see how others respond.  This could be very helpful.


    Cheers,

    david russell

  • Much depends on how you generate the surrogate keys. If you are using Sql Server Identity property on the dimension table, you have no choice really but to load the Dimension first to get the identities assigned, then join back to it on the native/source-system key during fact load.

    If this join is expensive *and* if you need to incur that cost many times during fact load (for example, a conformed Account dimension that is referenced by several different fact tables) you can boost fact load performance by building a temporary table in the staging area that is a small subset of the Dimension, containing columns for only the surrogate key and native/source-system keys. You place a 100% fillfactor fully covering index on this, with native keys at the front of the index, and it should end up as a really fast native key to surrogate key translator for your fact load.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply