SSIS Compare 2 Tables for Archiving

  • Hello everyone...

    I am working on a project that is building an archive database for reporting purposes.

    Our production system has a process to purge data older than 60 days. We have a need to report on the data for a period of three years.

    My plan is to build a new database to store the history of all the data. So my plan is to have my MASTER_TABLE and MASTER_TABLE_STAGING. So every day I will run the SSIS package that will bring all records for the production table into MASTER_TABLE_STAGING. Then I need to figure out a way to merge STAGING to the MASTER_TABLE. I have it working when records do not exist in MASTER_TABLE and do in STAGING using an EXCEPT operator query. However, I just ran in to an issue where the record exists in MASTER_TABLE and STAGING, but there is a column that has data, so the insert fails because of the primary key violation.

    I guess my question is how to build a process that will insert data from the STAGING table that does not exist in MASTER_TABLE and at the same time update MASTER_TABLE with values in STAGING if the record already exists.

    I can use SSIS if there is a good way, or I can create a stored procedure and execute it if that is the easier way.

    Any suggestions on the process would be helpful.

  • SSIS is pretty crummy at doing updates, because it is intended to be used in a pipelined fashion.

    What I'd recommend you do is the following:

    Step 1: Use an Execute SQL Task block to write an UPDATE statement. The update statement should be something like:

    "UPDATE m

    SET m.Fields = s.Fields

    FROM MasterTable m

    JOIN StagingTable s ON m.JoinKey = s.JoinKey"

    This will take care of updating the records that already exist.

    Step 2: Write a SQL query that will grab you the records from your source. Put a Data Flow task in, and an OLE DB Source which uses your SQL Query as a SQL Command option. Make sure that the records are distinct - either use a SELECT DISTINCT, a GROUP BY, or put in a Sort Transform in your package with an eliminate duplicates option.

    Step 3: Put a lookup transform in, and have it be something like "SELECT Field FROM Master_Table". Join it with the field from your source, in order to find the cases where the record already exists.

    Step 4: Take the error output from the lookup transform and connect it to your destination, with the option to redirect rows on error. This will let you take only the records that do not exist, and put them in your destination.

  • Here is an example from BOL for SQL Server 2008 ... Note that the first portion of the code shows how to perform what I believe what you want to do ..

    the second portion demonstrates the use of the MERGE statement which unfortunately is not available in 2005.

    Just hope this might assist you in doing what you want to do

    USE AdventureWorks;

    GO

    CREATE PROCEDURE dbo.InsertUnitMeasure

    @UnitMeasureCode nchar(3),

    @Name nvarchar(25)

    AS

    BEGIN

    SET NOCOUNT ON;

    -- Update the row if it exists.

    UPDATE Production.UnitMeasure

    SET Name = @Name

    WHERE UnitMeasureCode = @UnitMeasureCode

    -- Insert the row if the UPDATE statement failed.

    IF (@@ROWCOUNT = 0 )

    BEGIN

    INSERT INTO Production.UnitMeasure (UnitMeasureCode, Name)

    VALUES (@UnitMeasureCode, @Name)

    END

    END;

    GO

    -- Test the procedure and return the results.

    EXEC InsertUnitMeasure @UnitMeasureCode = 'ABC', @Name = 'Test Value';

    SELECT UnitMeasureCode, Name FROM Production.UnitMeasure

    WHERE UnitMeasureCode = 'ABC';

    GO

    ----Second portion - only available in 2008 and onwards

    -- Rewrite the procedure to perform the same operations using the MERGE statement.

    -- Create a temporary table to hold the updated or inserted values from the OUTPUT clause.

    CREATE TABLE #MyTempTable

    (ExistingCode nchar(3),

    ExistingName nvarchar(50),

    ExistingDate datetime,

    ActionTaken nvarchar(10),

    NewCode nchar(3),

    NewName nvarchar(50),

    NewDate datetime

    );

    GO

    ALTER PROCEDURE dbo.InsertUnitMeasure

    @UnitMeasureCode nchar(3),

    @Name nvarchar(25)

    AS

    BEGIN

    SET NOCOUNT ON;

    MERGE Production.UnitMeasure AS target

    USING (SELECT @UnitMeasureCode, @Name) AS source (UnitMeasureCode, Name)

    ON (target.UnitMeasureCode = source.UnitMeasureCode)

    WHEN MATCHED THEN

    UPDATE SET Name = source.Name

    WHEN NOT MATCHED THEN

    INSERT (UnitMeasureCode, Name)

    VALUES (source.UnitMeasureCode, source.Name)

    OUTPUT deleted.*, $action, inserted.* INTO #MyTempTable;

    END;

    GO

    -- Test the procedure and return the results.

    EXEC InsertUnitMeasure @UnitMeasureCode = 'ABC', @Name = 'New Test Value';

    EXEC InsertUnitMeasure @UnitMeasureCode = 'XYZ', @Name = 'Test Value';

    EXEC InsertUnitMeasure @UnitMeasureCode = 'ABC', @Name = 'Another Test Value';

    SELECT * FROM #MyTempTable;

    -- Cleanup

    DELETE FROM Production.UnitMeasure WHERE UnitMeasureCode IN ('ABC','XYZ');

    DROP TABLE #MyTempTable;

    Go

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply