Merge,Look Up,SCD which one is good for millions of records

  • Hi,

    I am new to SSIS.

    I have to load 50 millions of records(full load) into a SQL Server table.

    After the full load I have to create another package for incremental load based on Hash value.

    Please suggest me one of the below mentioned approach to get good performance.

    -Merge command

    -Look Up

    -SCD

    Thanks in advance

  • Use the Merge, of those 3.

  • Hi Rocky´s

    I totally agree with that, the use of the merge command is for me the better option. I designed several ETL packages the last year and I tried the 3 options, and at the end I always used the Merge command.

    Here I leave you a code snippet from my personal library. This code makes the following:

    - Log the start time and name of the package in a custom log table with an identity column

    - Merge into a target table the results of a query (as a source). You can adapt the source query to your needs

    - The columns that match the merge condition are updated, if not, they are inserted.

    - At the end the finish time is log together with the row count of updated and inserted records.

    -- I use a table to log the start and finish time, package name, number of rows inserted and number of rows updated

    insert into SSIS_Log

    (PackageName,DTStart)

    values

    (?,GetDate()); -- Map the System variable "PackageName"

    declare

    @mergeResultsTable table (MergeAction VARCHAR(20));

    declare

    @insertCount int,

    @updateCount int;

    MERGE Table_A AS TARGET

    USING (

    SELECT

    Col1

    ,Col2

    .....

    ,ColN

    FROMTable_B

    ) AS SOURCE

    ON (TARGET.Key1 = SOURCE.Key1 and TARGET.Key_2 = SOURCE.Key_2)

    -- Update

    WHEN MATCHED THEN UPDATE SET

    TARGET.Col1= SOURCE.Col1

    ,TARGET.Col2= SOURCE.Col2

    ,TARGET.UpdateDT=GetDate() -- I create a UpdateDT column in the target table to store the update datetime

    -- etc

    --Insert

    WHEN NOT MATCHED BY TARGET THEN INSERT

    (

    Col1

    ,Col2

    ,CreateDT -- I create a CreateDT column in the target table to store the insert (or creation) datetime

    -- etc

    )

    VALUES

    ( SOURCE.Col1

    ,SOURCE.Col2

    ,GetDate()

    -- etc

    )

    OUTPUT $action into @mergeResultsTable;

    update SSIS_Log

    set UpdatedCount = -- Log the count of updated rows

    (SELECT COUNT(*)

    FROM @mergeResultsTable

    WHERE MergeAction = 'UPDATE')

    , InsertedCount = -- Log the count of inserted rows

    (SELECT COUNT(*)

    FROM @mergeResultsTable

    WHERE MergeAction = 'INSERT')

    ,DTFinish = GetDate()

    where Idx =

    (

    select MAX(Idx)

    from SSIS_Log

    );

    If your target table should be treated as a SCD I strongly recommend to you the following article by Adam Aspin:

    Slowly changing dimensions using T-SQL MERGE[/url]

    If you have further question just write me.

    Kind Regards

    Paul Hernández

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply