SSIS Performance problem

  • Hello all,

    I'm designing DW where one of dimensions has over 5M records right now and each year will be added 2M more records. I need to design efficient load process. Slow Changing Dimension transformation processed 1.5M records in 14h and I’ve stopped it. I'm using sql2008r2 on win2008r2 with 40GB RAM and appropriate number of CPUs.

    I have to keep tracking 4 attributes and i can't do it by DateModified field in OLTP because i need to be able to report on old data too.

    Please help!!!:exclamationmark:

  • I tend to avoid using the slowly changing dimension transformation for the poor performance it causes.

    You will be better off taking the logic out, and either using a three step method of Inserts, Updates and deletes using seperate dataflows.

    or alternaitevly use MERGE to handle the inserts/updates/deletes, though i have not fully tested this for performance with large datasets

  • Thanks for the advice!!

    Any other suggestions

  • i think by losing the SCD transformation your performance will improve greatly,

    also have a look at CAT performance best pratices for general SSIS tuning advice.

    http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

    sorry i just realised that this is in the 2005 forum, so ignore what i mentioned about MERGE as that is 2008+ only

  • Although I generally agree that avoiding the SCD is a best practice, I was curious, you are adding 2M rows a YEAR, thats only about about 8K records a work day. It seems all of your pain comes from the initial load and not the daily/monthly maintenance. Did I read it wrong?

    CEWII

  • You're right. BIG telecom company and i CAN'T filter load data sets

  • I'm wondering why your initial load of data is flowing throw SCD, if you have all the data for current and past records it would seem like you could populate the data and then fill in gaps..

    Or am I completely off base..

    CEWII

  • Elliott Whitlow (1/27/2011)


    I'm wondering why your initial load of data is flowing throw SCD, if you have all the data for current and past records it would seem like you could populate the data and then fill in gaps..

    Or am I completely off base..

    CEWII

    First listen to what Elliott is saying, it might save you a lot of work and trouble.

    Secondly: dump the SCD wizard, as already suggested.

    If you really want to speed up your package:

    * use checksums to decide if a row is an update or not, so that you'll only perform an update if an attribute has really changed

    * execute the updates in a set based manner

    * if possible, use partitioning for your inserts (only Enterprise Edition)

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply