ssis group by for data set

Question

Post reply

ssis group by for data set

musalappa@gmail.com

Ten Centuries

Points: 1228
More actions
May 28, 2009 at 11:37 am

#205367

hi,
I have an SQL source table with structure as below
col1, col2, col3, col4, col5
a1 b1 c1 d e
a1 b1 c1 f g
a2 b1 c1 h i
a2 b1 c1 j k
...
I need to import only col1, col2 and col3 fields into destination table and these should be unique.
col1, col2, col3
a1 b1 c1
a2 b1 c1
...
Currently I am using group by clause in OLE source sql cmd to do this. Is there a better way to accomplish this?

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

bc_ Hall of Fame Points: 3175 More actions · Answer 1

Is this what you mean?

SELECT DISTINCT col1, col2, col3

FROM yourTable

[font="Arial Narrow"]bc[/font]

musalappa@gmail.com Ten Centuries Points: 1228 More actions · Answer 2

musalappa@gmail.com

Ten Centuries

Points: 1228

May 28, 2009 at 12:29 pm

#1001653

select col1, col2, col3 from group by col1, col2, col3

bc_ Hall of Fame Points: 3175 More actions · Answer 3

bc_

Hall of Fame

Points: 3175

May 28, 2009 at 12:43 pm

#1001665

[font="Arial Narrow"]bc[/font]

bc_ Hall of Fame Points: 3175 More actions · Answer 4

don't know what happened to the prior post....here it is.

I misunderstood what you were asking for. Usually you would use a GROUP BY to get some aggregate like MIN, MAX, SUM, COUNT, etc. Either of these queries accomplishes what you stated, gets the unique combination of values in the 3 columns.

SELECT DISTINCT col1, col2, col3

FROM yourTable

SELECT col1, col2, col3

FROM yourTable

GROUP BY col1, col2, col3

[font="Arial Narrow"]bc[/font]

musalappa@gmail.com Ten Centuries Points: 1228 More actions · Answer 5

currently i am building the recordset in OLE Source using T-SQL. I want to know if there is a way to do it in ssis itself. I am testing aggregate transform but want to know how its done usually.

bc_ Hall of Fame Points: 3175 More actions · Answer 6

My two cents would be to use the TSQL. I've never tested the ssis Aggregate transform, but would be surprised if it outperformed sql server (of course it probably depends :-)). Plus you don't have as much data to port across by letting the database engine handle it.

Perhaps others could provide a counterpoint.

[font="Arial Narrow"]bc[/font]

Phil Parkin SSC Guru Points: 247227 More actions · Answer 7

bc (5/28/2009)
My two cents would be to use the TSQL. I've never tested the ssis Aggregate transform, but would be surprised if it outperformed sql server (of course it probably depends :-)). Plus you don't have as much data to port across by letting the database engine handle it.
Perhaps others could provide a counterpoint.

I agree with this.

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

I think that the aggregate transform component can do this also.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

bc_ Hall of Fame Points: 3175 More actions · Answer 9

I did a little test yesterday afternoon against a query with 2 varchar(5) columns, ~250k rows (109 - 123 distinct records*). I built 2 Data Flows, with one Data Flow using the Aggregate transform and one using a Select Distinct query. The Aggregate transform data flow consistently takes 8-9 seconds, the Select Distinct < .25 second in debug mode.

*The other difference that may or may not affect you. My database is case-insensitive, the aggregate transform even with all the comparison flags checked returned additional records that appeared to be duplicates (had blank values).

[font="Arial Narrow"]bc[/font]