Reconciliation Approach

  • The boss wants a reconciliation import to be part of the ETL process. First, it should check whether the source table's record count matches the destination table's record count. If not, the task should extract rows from the source and destination in 15-day increments, based on the MaxDate column in each table. The SSIS package should then check each chunk of records and reconcile the differences until the destination matches the source. I guess it should loop like this:

              Source         Destination
    Loop 1    x              y
    Loop 2    x-15 days      x-15 days

    And so on. If a row exists in the destination but not in the source, that row should be DELETED from the destination. If a row in the destination differs from the equivalent row in the source, the destination row should be UPDATED. If a row in the source has no equivalent in the destination, that row should be INSERTED into the destination.

    What is the most efficient way to pull this off? And does it make sense to loop through the tables and extract the data in chunks, given how large the tables are?
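
    For illustration, here is a rough T-SQL sketch of the count check plus the 15-day windowed loop I described above (dbo.SourceTable, dbo.DestTable and the MaxDate column are placeholder names, not our real schema):

    DECLARE @SourceCount bigint = (SELECT COUNT_BIG(*) FROM dbo.SourceTable);
    DECLARE @DestCount   bigint = (SELECT COUNT_BIG(*) FROM dbo.DestTable);

    IF @SourceCount <> @DestCount
    BEGIN
        DECLARE @EndDate   datetime = (SELECT MAX(MaxDate) FROM dbo.SourceTable);
        DECLARE @StartDate datetime = DATEADD(DAY, -15, @EndDate);

        -- Walk backwards through the data 15 days at a time.
        WHILE EXISTS (SELECT 1 FROM dbo.SourceTable WHERE MaxDate <= @EndDate)
        BEGIN
            -- Reconcile only the rows whose MaxDate falls in the current window,
            -- i.e. MaxDate > @StartDate AND MaxDate <= @EndDate
            -- (the actual MERGE or DELETE/UPDATE/INSERT logic would go here).
            SET @EndDate   = @StartDate;
            SET @StartDate = DATEADD(DAY, -15, @EndDate);
        END
    END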

  • I don't know what "large" means for you, and this will make a difference.

    And it also makes a difference which version of SQL Server you're on: 2005, or 2008/R2?

    If you're on one of the 2008 versions, I'd recommend using the MERGE statement to do all of this.

    And if you're on 2005, I'd actually recommend using DELETE, UPDATE and INSERT statements to do what you need.

    Yep, I'd skip SSIS completely with this.
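
    As a rough sketch only (the table and column names below are placeholders and would need to match your real schema), the MERGE would look something like:

    MERGE dbo.DestTable AS dst
    USING dbo.SourceTable AS src
        ON dst.Inventory_Item_Id = src.Inventory_Item_Id
    WHEN MATCHED AND dst.Last_Updated_Date <> src.Last_Updated_Date THEN
        -- destination row differs from the source row: update it
        UPDATE SET Creation_Date     = src.Creation_Date,
                   Last_Updated_Date = src.Last_Updated_Date
    WHEN NOT MATCHED BY TARGET THEN
        -- source row has no equivalent in the destination: insert it
        INSERT (Inventory_Item_Id, Creation_Date, Last_Updated_Date)
        VALUES (src.Inventory_Item_Id, src.Creation_Date, src.Last_Updated_Date)
    WHEN NOT MATCHED BY SOURCE THEN
        -- destination row has no equivalent in the source: delete it
        DELETE;

    One statement covers all three cases, which is why I'd do this in T-SQL rather than SSIS.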

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • imani_technology (10/5/2010)


    And so on. If a row exists in the destination but not in the source, that row should be DELETED from the destination. If a row in the destination differs from the equivalent row in the source, the destination row should be UPDATED. If a row in the source has no equivalent in the destination, that row should be INSERTED into the destination.

    WayneS (10/5/2010)


    If you're on one of the 2008 versions, I'd recommend using the MERGE statement to do all of this.

    And if you're on 2005, I'd actually recommend using DELETE, UPDATE and INSERT statements to do what you need.

    I have a few questions:

    Why not implement the SCD transformation for the above dataset?

    For the DELETE part, a LEFT JOIN would suffice.
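
    Something roughly like this, using the same placeholder table and key names as above (this covers only the delete case; on 2005 you'd still need separate UPDATE and INSERT statements):

    -- Delete destination rows that no longer have a matching source row.
    DELETE dst
    FROM dbo.DestTable AS dst
    LEFT JOIN dbo.SourceTable AS src
           ON src.Inventory_Item_Id = dst.Inventory_Item_Id
    WHERE src.Inventory_Item_Id IS NULL;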

    Raunak J

    Also, these days everybody is talking a lot about Change Data Capture... which is also worth your time for this issue.

    Raunak J

  • We are using SQL Server 2008 R2. From what I understand, we are going to deal with about a million rows.

  • imani_technology (10/5/2010)


    We are using SQL Server 2008 R2. From what I understand, we are going to deal with about a million rows.

    This really shouldn't blow your brains off when you are dealing with data warehousing :-):-):-)

    Raunak J

  • I agree, but my boss thinks the performance will be better if I break the data up into 15-day chunks. Do you think that would be necessary? It would be nice if we could skip the looping.

    It is always better to load only the changed data into the warehouse. Yes, your "BOSS" is correct!!! :-)

    Raunak J

  • Yes, but he wants me to compare and change the data in 15-day chunks. Would putting that into a loop necessarily boost performance?

    You must understand that implementing MDM or CDC is far superior to batch loading... CDC actually tries to eliminate the batch window (15 days, in your case).

    Give it some time and study the CDC methodology. :-)
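
    Just to give you a starting point, enabling CDC on a 2008 R2 source is only a couple of calls (the database and table names here are placeholders, net changes require a primary key on the table, and if your source is Oracle the change capture has to happen on that side or through a different mechanism):

    USE YourSourceDatabase;      -- placeholder database name
    GO
    EXEC sys.sp_cdc_enable_db;   -- enables CDC for the database
    GO
    EXEC sys.sp_cdc_enable_table
        @source_schema        = N'dbo',
        @source_name          = N'SourceTable',   -- placeholder table name
        @role_name            = NULL,             -- no gating role
        @supports_net_changes = 1;                -- lets you query net changes per interval
    GO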

    Raunak J

  • The boss would like us to move toward an MDM scenario. Where can I learn more about that in a Microsoft context?

    Thanks for the insight! I have done some research on CDC, which is leading me to more questions. Is it okay for me to start a new thread regarding Oracle, SSIS, and CDC?

  • Absolutely. But remember not to cross post.:-)

    Raunak J

  • How about this?

    IF  source.Inventory_Item_Id  = destination.Inventory_Item_Id
    AND source.Creation_Date      = destination.Creation_Date
    AND source.Last_Updated_Date <> destination.Last_Updated_Date

    Then start the "reconciliation" process.

    Can I use a Lookup to do this? Will this properly prevent too many source rows from being processed?
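
    In straight T-SQL, that filter would look roughly like this (table names are placeholders again, and note it only finds changed rows, not missing or extra ones):

    -- Rows present in both tables (same key and Creation_Date)
    -- whose Last_Updated_Date no longer matches.
    SELECT src.*
    FROM dbo.SourceTable AS src
    INNER JOIN dbo.DestTable AS dst
           ON  dst.Inventory_Item_Id = src.Inventory_Item_Id
           AND dst.Creation_Date     = src.Creation_Date
    WHERE src.Last_Updated_Date <> dst.Last_Updated_Date;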

  • Also, is the looping process that the boss wants really necessary? Is breaking up the data into 15-day chunks going to improve performance or slow it down? After all, we're only talking about a million rows or so.
