December 29, 2009 at 12:36 am
Just wanted to know if there is any de-duplication task within SSIS or some way to achieve de-duplication using SSIS.
Dont want to use t-sql here.
thanks
December 29, 2009 at 3:06 am
You can use the Sort transofrmation which enables you to remove duplicate rows. This is one way to remove duplicate rows. Any particual reason you don't want to use SQL for that, though? The Sort transformation is what's called a "blocking" transformation, and does not perform well.
December 29, 2009 at 4:50 am
Aggregation can remove dups too. But it all works faster using the database engine, as already noted.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
December 30, 2009 at 7:03 am
Panks-913490 (12/29/2009)
Just wanted to know if there is any de-duplication task within SSIS or some way to achieve de-duplication using SSIS.Dont want to use t-sql here.
thanks
You may also check the third-party Distinct component. It provides better performance and less memory consumption compared to the standard Sort component.
December 30, 2009 at 12:18 pm
have a similar situation
i'm in the process of reorganizing tables in a database that stores event log data. currently i have a few tables per server and i'm going to merge it into 3-5 tables for all the servers. data goes from the servers into staging tables every night and then into a table depending what kind of server it is.
select s.eventlog, s.recordnumber, s.timegenerated, s.timewritten,
s.eventid, s.eventtype, s.eventtypename, s.eventcategory,
s.eventcategoryname, s.sourcename, s.strings, s.computername,
s.sid, s.message, s.data
from sql_seclog_staging s
left join sql_server_logs l on s.computername = l.computername
and s.recordnumber = l.recordnumber
and s.recordnumber is null
this is my SQL that i'm going to be running but was wondering if using one of the SSIS tasks is better. i only want to import new data
January 11, 2010 at 1:40 pm
The SORT control has a checkbox at the bottom of the control.
check it to remove duplicates. This means you have to sort even if you have no need to ssort to take advantage of the removal of the duplicates.
HTH
IanO
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply