SSIS Package with SCD

  • Hi All,

    Firstly I am new to these forums and apologies if this question has already been answered. I have been looking for some time now, however maybe I am not looking in the right context for the solution.

    Here is the issue,

    I have been tasked with creating a data warehouse that is in fine sync, 8 hours out of a master 30GB live SQL database.

    Now its fairly simple in SSIS to move the data, and in a very quick way... however, I need changes to the live to reflect in the warehouse.

    I am using Slowly Chaning Dimensions, and the fact is these do work, however, on some of the tables with larger record sets, it takes a beyond reasonable amount of time to complete... Therefore not viable,

    So my question is, am I doing something wrong with the SCD's or is there a better way to achieve what I am looking for? Ive tried using two lookups with conditions, but seems a very cumbersome way to achieve this...

    I have for a simple workaround created a package that basically truncates the warehouse tables at night, then bulk imports back in, this takes a matter of minutes... I know this isnt ideal but right now this seems to be a quick solution.

    Any information would be appreciated a lot!

    Thanks!

  • There are many solutions and also several questions. when you say you are doing a datawharehouse are you just moving the data from tableA in the source to TableA in the DW or are you breaking everything up into Dimension and Fact tables? A rather simple solution might be to add a column to each table to record the shipped time and then just grab ne records on load and update the field. If there is already some kind of timestamp field then you can use that. You may also be able to use log shipping to accomplish your goal depending on what the DW is used for.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Thanks Dan,

    The bottom line on what I am trying to achieve is as close to real time mirror of say table A in the source to Table A in the destination. As rows are added / edited in the live from the main system to reflect. The SCD as I said works fine in principle when I tested on a small table and I was really pleased with the results, but when I applied to some larger ones it just took forever.

    It is to try and get reports as real time as possible without any affect on the live. Since my post I have just stumbled across a custom component from http://www.cozyroc.com/ssis/table-difference. I have not used it yet as I have only just found it.

    I do have a timestamp within the collection, so maybe that is some food for thought... Would the records not grow at a high rate though?

    Thanks

    Luke

  • I would suggest you looking into replication services and log shipping and examine those to options. With replication services you could infact be realtime.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Excellent, Ill look into replication services too... Thankyou 🙂

  • Dan.Humphries (1/27/2012)


    I would suggest you looking into replication services and log shipping and examine those to options. With replication services you could infact be realtime.

    I agree with Dan here (not just because he's called Dan!) because what you've described sounds ideal for the SQL Server Replication options and for the subscriber being updated so that it's almost in sync with the source/published data 🙂

  • There's a good blog post here from the SSIS team:

    http://blogs.msdn.com/b/mattm/archive/2009/11/13/handling-slowly-changing-dimensions-in-ssis.aspx%5B/url%5D

    In my experience, the built in SCD transform is too slow to be usable in most situations I've dealt with, I tend to dump everything to a staging table in a fast load and then use MERGE to do this. There are some 3rd party SCD components, which are meant to be a lot faster, but I've never used them

  • Thanks for the feedback guys, really appreciate it!

    I have looked into replication services, however our database has been around since the 90's and developed into a beast which it is now. We only need a finite amount of data, and the plan going further is to create an OLAP cube from the data warehouse.

    I have since tried using a merge lookup between the souce and my data warehouse, I have then used a conditional split... I only need a type 1 SCD - so just a traditional insert or update is required. This seems to be working a lot faster than the inbuilt SCD in SSIS. The only downside of this is there is a lot of manual steps to create as I have about 55 tables in my data warehouse.

    Does anyone forsee any issue with my plan? No worries if not, but as always your help is appreciated.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply