Step needed to check for duplications

  • I have created an SSIS package that runs through several Exec SQL Task steps. They mainly import data into a staging table, update certain fields within that table, and perform mathematical and financial calculations on other fields. It then will take the final records, after performing these steps, and insert them into a production table.

    I need to create a step that will read through the staging table, prior to insert into production, and if there are no duplicates... move on to the final insert into production. However, if there are duplicates, I need it to insert those duplicates into a review table, then delete them (?) out of the staging table after their copied to the review table, and then insert the non-duplicated records from the staging table into production.

    I want to know the best way to do this in the BI tool. Does a Loop container sound the best route?

  • I would use a SQL step, just do a group by those columns that would determine that they are duplicate and use having a count > 1.  Simple example:

     

    create

    table #t (

    C1 int,

    C2 int

    )

    insert

    into #t values (1,1)

    insert

    into #t values (2,2)

    insert

    into #t values (2,3)

    insert

    into #t values (3,4)

    select

    C1,count(*) as cnt from #t

    group

    by C1

    having

    count(*) > 1

     

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply