load facts and dimensions and surrogate keys

Question

Post reply

load facts and dimensions and surrogate keys

shailendra-251422

Ten Centuries

Points: 1111
More actions
November 22, 2005 at 5:54 am

#66843

Hi,
while loading data from staging into the warehouse, first the dimension tables are uploaded. then the fact tables.
When loading the fact tables i need the surrogate PKs of the dimension tables. What is the best approach to get the surrogate key for each record?
In the staging area, The fact and dimension records both will have the PKs of the source area. While loading the fact, is it OK to look up the dimension based on the PK of the source system. Can this lookup be avoided? What is the best practice for loading data?
Thanks!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

cliffb SSCarpal Tunnel Points: 4598 More actions · Answer 1

The way we have done this is to load the source data into a staging table in a staging database.

Perform validation on the source data

Lookup surrogate keys and apply them to the staging table

Push the changes/new data with the applied surrogate keys.

We do a direct lookup from the warehouse production database to get our surrogate keys. This may not be the best practice, but I thought it might help if you have an idea of how someone else worked with ETL.

david russell-253790 SSCarpal Tunnel Points: 4042 More actions · Answer 2

We haven't gotten to the fact table yet with SSIS... still trying to overcome the little obstacles like converting an integer date in Sybase to an SQL date... and deciphering that "ErrorColumn" message that doesn't seem to relate to a column; but our plan is to do the key lookup in the stream. I guess time will tell whether this approach works or not; but that is the plan today. I will be watching to see how others respond. This could be very helpful.

Cheers,

david russell

PW-201837 SSC-Insane Points: 20805 More actions · Answer 3

Much depends on how you generate the surrogate keys. If you are using Sql Server Identity property on the dimension table, you have no choice really but to load the Dimension first to get the identities assigned, then join back to it on the native/source-system key during fact load.

If this join is expensive *and* if you need to incur that cost many times during fact load (for example, a conformed Account dimension that is referenced by several different fact tables) you can boost fact load performance by building a temporary table in the staging area that is a small subset of the Dimension, containing columns for only the surrogate key and native/source-system keys. You place a 100% fillfactor fully covering index on this, with native keys at the front of the index, and it should end up as a really fast native key to surrogate key translator for your fact load.