SSIS : Group by in Aggregated task

  • Dear all,

    I have created an aggregated task which is producing the below data:

    CIA_ContainerIdentififerValue

    ACCOUNTING DATE

    DC_ManagerCode

    NET MARKET VALUE IN FUND CCY

    PositionWeight

    USD MARKET VALUE

    MANAGER NAME

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    NULL

    DU0403

    18.09.2017

    DU0403_DI0408

    5842001.51

    0.010962939

    5842001.51

    BLC BEITS - Pool AEW

    DU0403

    18.09.2017

    DU0403_DI0409

    65585680.86

    0.123076284

    65585680.86

    NULL

    DU0403

    18.09.2017

    DU0403_DI0411

    2579.97

    4.8415E-06

    2579.97

    BLC BEITS - Pool MS

    DU0403

    18.09.2017

    DU0403_DI0411

    83472305.81

    0.156641832

    83472305.81

    NULL

    DU0403

    18.09.2017

    DU0403_DI0415

    3010068.15

    0.005648611

    3010068.15

    BLC BEITS - Pool RC

    DU0403

    18.09.2017

    DU0403_DI0415

    139860050.5

    0.262457525

    139860050.5

    NULL

    DU0403

    18.09.2017

    DU0403_DI0421

    77777391.11

    0.145954913

    77777391.11

    NULL

    Basically the aggregatio is done grouping all the varchar fields and sum the 3 numeric ones.

    For example:

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    NULL

    DU0403

    18.09.2017

    DU0403_DI0408

    5842001.51

    0.010962939

    5842001.51

    BLC BEITS - Pool AEW

    In the ManagerName (Last column of above example) which is part of the grouping, for each grouping I may have null or a name.

    What I would like is that if it is null, then, instead of producing a new line, it would just sum.

    As example, for the above two lines, what I would like is to have a sum  just in one line, so, the null would sum with

    BLC BEITS - Pool AEW as everything (other then the manager name is the same).

     :

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    BLC BEITS - Pool AEW

     Is it possible to do this?

  • river1 - Thursday, September 28, 2017 10:23 AM

    Dear all,

    I have created an aggregated task which is producing the below data:

    CIA_ContainerIdentififerValue

    ACCOUNTING DATE

    DC_ManagerCode

    NET MARKET VALUE IN FUND CCY

    PositionWeight

    USD MARKET VALUE

    MANAGER NAME

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    NULL

    DU0403

    18.09.2017

    DU0403_DI0408

    5842001.51

    0.010962939

    5842001.51

    BLC BEITS - Pool AEW

    DU0403

    18.09.2017

    DU0403_DI0409

    65585680.86

    0.123076284

    65585680.86

    NULL

    DU0403

    18.09.2017

    DU0403_DI0411

    2579.97

    4.8415E-06

    2579.97

    BLC BEITS - Pool MS

    DU0403

    18.09.2017

    DU0403_DI0411

    83472305.81

    0.156641832

    83472305.81

    NULL

    DU0403

    18.09.2017

    DU0403_DI0415

    3010068.15

    0.005648611

    3010068.15

    BLC BEITS - Pool RC

    DU0403

    18.09.2017

    DU0403_DI0415

    139860050.5

    0.262457525

    139860050.5

    NULL

    DU0403

    18.09.2017

    DU0403_DI0421

    77777391.11

    0.145954913

    77777391.11

    NULL

    Basically the aggregatio is done grouping all the varchar fields and sum the 3 numeric ones.

    For example:

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    NULL

    DU0403

    18.09.2017

    DU0403_DI0408

    5842001.51

    0.010962939

    5842001.51

    BLC BEITS - Pool AEW

    In the ManagerName (Last column of above example) which is part of the grouping, for each grouping I may have null or a name.

    What I would like is that if it is null, then, instead of producing a new line, it would just sum.

    As example, for the above two lines, what I would like is to have a sum  just in one line, so, the null would sum with

    BLC BEITS - Pool AEW as everything (other then the manager name is the same).

     :

    DU0403

    18.09.2017

    DU0403_DI0408

    127398537.6

    0.239072586

    127398537.6

    BLC BEITS - Pool AEW

     Is it possible to do this?

    Sure. Try aggregating the manager name using MAX() and removing them from the GROUP BY.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Sorry. I think I did not undertood your answer.

    You mean, inside the same agregation taking the manager name from group by and doing a max()?

    How is that prossible inside a aggregation if this is a varchar field?

    Or do you mean, first do something like a sort and then create an aggregation ?

  • I'm not sure what the other poster meant either, but I generally try to avoid using SSRS expressions for grouping, and prefer to create the necessary data group values in the dataset.   That way, you have complete control over the nature of the data, and you can worry less about grouping errors in the report, and worry instead about getting them correct in the query first.   It has always led to better results for me. 

    In your case, I'm not sure what YOU meant either, because if you just included that name in the grouping, then NULL is just another value to be grouped by, so ALL the NULL values within a group will get summed into one row.   This doesn't appear to be the case for your data at the moment, but I would validate that.  Then you can explain if what you mean is that you want to collapse these NULL groupings into other groups or if you mean something else.   There are some potential problems with trying to do such a thing.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I guess that he is right about using the max. I just don't know where

  • I don't know if I should use the max inside the aggregation where I am also grouping or if I should have a task before to use this max and only after group it. But the goal is to have just one possibility on the manager name. If possible the name but if the field is null then the null. Then I can group it . He is right. But I don't know where to put the max() as this is a character based field not a numeric field.

  • river1 - Thursday, September 28, 2017 12:39 PM

    I don't know if I should use the max inside the aggregation where I am also grouping or if I should have a task before to use this max and only after group it. But the goal is to have just one possibility on the manager name. If possible the name but if the field is null then the null. Then I can group it . He is right. But I don't know where to put the max() as this is a character based field not a numeric field.

    As I said, it's not usually a good idea to group things using expressions.   The dataset should be providing the data in consumable portions for grouping, and if some of the data is not tied to a manager name value that is something other than NULL, then how, EXACTLY, do you display that data when the rest of it groups together with the other values other than manager name?   If you're talking about funds being managed, why is there any data that doesn't have a manager name?   I strongly suspect that this change has to occur back in the dataset, within the query or stored procedure that provides the data.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thank you for your reply. I understand but in this case, I think I prefer to uinderstand what is the way of doing this max() as requested. With time, we can think on other approach.

    In the meantime, any idea of how can I do the max() to the manager name in oder to have just one occurence (null or name)?

    Thanks for the support

  • river1 - Thursday, September 28, 2017 1:09 PM

    Thank you for your reply. I understand but in this case, I think I prefer to uinderstand what is the way of doing this max() as requested. With time, we can think on other approach.

    In the meantime, any idea of how can I do the max() to the manager name in oder to have just one occurence (null or name)?

    Thanks for the support

    Now that I look at this for the third time, I think I understand, finally, what's going on here.   I see data for the same DC_ManagerCode, where one row has a Manager Name and another is NULL.   Instead of using Manager Name in the grouping at all, take it out of the grouping, and let it be an aggregation, using MAX.   This will work correctly if we assume that grouping at the DC_ManagerCode level, in addition to all the other grouping fields other than Manager Name, is the lowest level of grouping, and that whether or not a Manager Name is present or not in the data isn't particularly relevant because perhaps there's always at least one row in each group at the lowest level that DOES have a Manager Name, and thus any records that appear in your source data without one will get tied to the MAX manager name within a given value for DC_ManagerCode.   If that makes sense....

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.

    Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.

    So probably I am confused on where can I use this max() inside the aggregation.

    Can you please help understand?

  • river1 - Thursday, September 28, 2017 1:30 PM

    Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.

    Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.

    So probably I am confused on where can I use this max() inside the aggregation.

    Can you please help understand?

    OK, maybe it's not possible to aggregate VARCHARs in SSIS, but it certainly is in SQL. Here is an example:
    CREATE TABLE #Test
    (
      TestId  INT
    , ManagerName VARCHAR(50)
    );

    INSERT #Test
    (
      TestId
    , ManagerName
    )
    VALUES
    (
      1, NULL
    )
    ,(
      1, 'BLC BEITS - Pool AEW'
    )

    SELECT t.TestId, MAX(t.ManagerName)
    FROM #Test t
    GROUP BY t.TestId;

    I had assumed that the same thing would work in SSIS, but as I never aggregate there (the performance is terrible) I was not certain.
    Thus it would seem that the best solution to your problem may be to bring the data into a staging table and then INSERT/UPDATE from the staging table to your target table from there, taking care of the NULLs in the process.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • river1 - Thursday, September 28, 2017 1:30 PM

    Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.

    Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.

    So probably I am confused on where can I use this max() inside the aggregation.

    Can you please help understand?

    Oops!  I had forgotten that SSIS aggregates can't handle character data for MAX values.  It's unfortunate, but you're going to have to get away from using that Aggregate task and run an actual query instead.  SSIS aggregation just isn't going to work unless you can do something with your source data to derive that maximum value to replace any NULLs in your source data.  If that can be done, then you can just leave the Manager Name in the grouping as it will always be the same value.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • But how can I do that? Can I replace the null values for the name of the manager when the aggregation is done?

  • I mean, I need to find a way to replace the null for the name is order to be able to generate the aggregate without having two results.

  • river1 - Thursday, September 28, 2017 2:12 PM

    I mean, I need to find a way to replace the null for the name is order to be able to generate the aggregate without having two results.

    Did you read my comment about using a staging table? That is the easiest solution.
    The only solution I can think of in SSIS is not very pretty and requires that you write a chunk of C# code in a Script Component. I don't recommend going down that path when you can solve this easily in T-SQL.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply