de-duplication task within SSIS

  • Just wanted to know if there is any de-duplication task within SSIS or some way to achieve de-duplication using SSIS.

    Dont want to use t-sql here.

    thanks

  • You can use the Sort transofrmation which enables you to remove duplicate rows. This is one way to remove duplicate rows. Any particual reason you don't want to use SQL for that, though? The Sort transformation is what's called a "blocking" transformation, and does not perform well.

  • Aggregation can remove dups too. But it all works faster using the database engine, as already noted.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Panks-913490 (12/29/2009)


    Just wanted to know if there is any de-duplication task within SSIS or some way to achieve de-duplication using SSIS.

    Dont want to use t-sql here.

    thanks

    You may also check the third-party Distinct component. It provides better performance and less memory consumption compared to the standard Sort component.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • have a similar situation

    i'm in the process of reorganizing tables in a database that stores event log data. currently i have a few tables per server and i'm going to merge it into 3-5 tables for all the servers. data goes from the servers into staging tables every night and then into a table depending what kind of server it is.

    select s.eventlog, s.recordnumber, s.timegenerated, s.timewritten,

    s.eventid, s.eventtype, s.eventtypename, s.eventcategory,

    s.eventcategoryname, s.sourcename, s.strings, s.computername,

    s.sid, s.message, s.data

    from sql_seclog_staging s

    left join sql_server_logs l on s.computername = l.computername

    and s.recordnumber = l.recordnumber

    and s.recordnumber is null

    this is my SQL that i'm going to be running but was wondering if using one of the SSIS tasks is better. i only want to import new data

  • The SORT control has a checkbox at the bottom of the control.

    check it to remove duplicates. This means you have to sort even if you have no need to ssort to take advantage of the removal of the duplicates.

    HTH

    IanO

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply