ssis group by for data set

  • hi,

    I have an SQL source table with structure as below

    col1, col2, col3, col4, col5

    a1 b1 c1 d e

    a1 b1 c1 f g

    a2 b1 c1 h i

    a2 b1 c1 j k

    ...

    I need to import only col1, col2 and col3 fields into destination table and these should be unique.

    col1, col2, col3

    a1 b1 c1

    a2 b1 c1

    ...

    Currently I am using group by clause in OLE source sql cmd to do this. Is there a better way to accomplish this?

  • Is this what you mean?

    SELECT DISTINCT col1, col2, col3

    FROM yourTable

    [font="Arial Narrow"]bc[/font]

  • select col1, col2, col3 from

    group by col1, col2, col3

  • [font="Arial Narrow"]bc[/font]

  • don't know what happened to the prior post....here it is.

    I misunderstood what you were asking for. Usually you would use a GROUP BY to get some aggregate like MIN, MAX, SUM, COUNT, etc. Either of these queries accomplishes what you stated, gets the unique combination of values in the 3 columns.

    SELECT DISTINCT col1, col2, col3

    FROM yourTable

    SELECT col1, col2, col3

    FROM yourTable

    GROUP BY col1, col2, col3

    [font="Arial Narrow"]bc[/font]

  • currently i am building the recordset in OLE Source using T-SQL. I want to know if there is a way to do it in ssis itself. I am testing aggregate transform but want to know how its done usually.

  • My two cents would be to use the TSQL. I've never tested the ssis Aggregate transform, but would be surprised if it outperformed sql server (of course it probably depends :-)). Plus you don't have as much data to port across by letting the database engine handle it.

    Perhaps others could provide a counterpoint.

    [font="Arial Narrow"]bc[/font]

  • bc (5/28/2009)


    My two cents would be to use the TSQL. I've never tested the ssis Aggregate transform, but would be surprised if it outperformed sql server (of course it probably depends :-)). Plus you don't have as much data to port across by letting the database engine handle it.

    Perhaps others could provide a counterpoint.

    I agree with this.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I think that the aggregate transform component can do this also.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I did a little test yesterday afternoon against a query with 2 varchar(5) columns, ~250k rows (109 - 123 distinct records*). I built 2 Data Flows, with one Data Flow using the Aggregate transform and one using a Select Distinct query. The Aggregate transform data flow consistently takes 8-9 seconds, the Select Distinct < .25 second in debug mode.

    *The other difference that may or may not affect you. My database is case-insensitive, the aggregate transform even with all the comparison flags checked returned additional records that appeared to be duplicates (had blank values).

    [font="Arial Narrow"]bc[/font]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply