SSIS : Group by in Aggregated task

Question

Post reply

SSIS : Group by in Aggregated task

river1

SSC-Insane

Points: 24309

September 28, 2017 at 10:23 am

#349359

Dear all,

I have created an aggregated task which is producing the below data:

CIA_ContainerIdentififerValue

ACCOUNTING DATE

DC_ManagerCode

NET MARKET VALUE IN FUND CCY

PositionWeight

USD MARKET VALUE

MANAGER NAME

DU0403

18.09.2017

DU0403_DI0408

127398537.6

0.239072586

127398537.6

NULL

DU0403

18.09.2017

DU0403_DI0408

5842001.51

0.010962939

5842001.51

BLC BEITS - Pool AEW

DU0403

18.09.2017

DU0403_DI0409

65585680.86

0.123076284

65585680.86

NULL

DU0403

18.09.2017

DU0403_DI0411

2579.97

4.8415E-06

2579.97

BLC BEITS - Pool MS

DU0403

18.09.2017

DU0403_DI0411

83472305.81

0.156641832

83472305.81

NULL

DU0403

18.09.2017

DU0403_DI0415

3010068.15

0.005648611

3010068.15

BLC BEITS - Pool RC

DU0403

18.09.2017

DU0403_DI0415

139860050.5

0.262457525

139860050.5

NULL

DU0403

18.09.2017

DU0403_DI0421

77777391.11

0.145954913

77777391.11

NULL

Basically the aggregatio is done grouping all the varchar fields and sum the 3 numeric ones.

For example:

DU0403

18.09.2017

DU0403_DI0408

127398537.6

0.239072586

127398537.6

NULL

DU0403

18.09.2017

DU0403_DI0408

5842001.51

0.010962939

5842001.51

BLC BEITS - Pool AEW

In the ManagerName (Last column of above example) which is part of the grouping, for each grouping I may have null or a name.

What I would like is that if it is null, then, instead of producing a new line, it would just sum.

As example, for the above two lines, what I would like is to have a sum just in one line, so, the null would sum with

BLC BEITS - Pool AEW as everything (other then the manager name is the same).

:

DU0403

18.09.2017

DU0403_DI0408

127398537.6

0.239072586

127398537.6

BLC BEITS - Pool AEW

Is it possible to do this?

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246965 More actions · Answer 1

river1 - Thursday, September 28, 2017 10:23 AM
Dear all,
I have created an aggregated task which is producing the below data:
CIA_ContainerIdentififerValue
ACCOUNTING DATE
DC_ManagerCode
NET MARKET VALUE IN FUND CCY
PositionWeight
USD MARKET VALUE
MANAGER NAME
DU0403
18.09.2017
DU0403_DI0408
127398537.6
0.239072586
127398537.6
NULL
DU0403
18.09.2017
DU0403_DI0408
5842001.51
0.010962939
5842001.51
BLC BEITS - Pool AEW
DU0403
18.09.2017
DU0403_DI0409
65585680.86
0.123076284
65585680.86
NULL
DU0403
18.09.2017
DU0403_DI0411
2579.97
4.8415E-06
2579.97
BLC BEITS - Pool MS
DU0403
18.09.2017
DU0403_DI0411
83472305.81
0.156641832
83472305.81
NULL
DU0403
18.09.2017
DU0403_DI0415
3010068.15
0.005648611
3010068.15
BLC BEITS - Pool RC
DU0403
18.09.2017
DU0403_DI0415
139860050.5
0.262457525
139860050.5
NULL
DU0403
18.09.2017
DU0403_DI0421
77777391.11
0.145954913
77777391.11
NULL
Basically the aggregatio is done grouping all the varchar fields and sum the 3 numeric ones.
For example:
DU0403
18.09.2017
DU0403_DI0408
127398537.6
0.239072586
127398537.6
NULL
DU0403
18.09.2017
DU0403_DI0408
5842001.51
0.010962939
5842001.51
BLC BEITS - Pool AEW
In the ManagerName (Last column of above example) which is part of the grouping, for each grouping I may have null or a name.
What I would like is that if it is null, then, instead of producing a new line, it would just sum.
As example, for the above two lines, what I would like is to have a sum just in one line, so, the null would sum with
BLC BEITS - Pool AEW as everything (other then the manager name is the same).
:
DU0403
18.09.2017
DU0403_DI0408
127398537.6
0.239072586
127398537.6
BLC BEITS - Pool AEW
Is it possible to do this?

Sure. Try aggregating the manager name using MAX() and removing them from the GROUP BY.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

river1 SSC-Insane Points: 24309 More actions · Answer 2

Sorry. I think I did not undertood your answer.

You mean, inside the same agregation taking the manager name from group by and doing a max()?

How is that prossible inside a aggregation if this is a varchar field?

Or do you mean, first do something like a sort and then create an aggregation ?

sgmunson SSC Guru Points: 110640 More actions · Answer 3

I'm not sure what the other poster meant either, but I generally try to avoid using SSRS expressions for grouping, and prefer to create the necessary data group values in the dataset. That way, you have complete control over the nature of the data, and you can worry less about grouping errors in the report, and worry instead about getting them correct in the query first. It has always led to better results for me.

In your case, I'm not sure what YOU meant either, because if you just included that name in the grouping, then NULL is just another value to be grouped by, so ALL the NULL values within a group will get summed into one row. This doesn't appear to be the case for your data at the moment, but I would validate that. Then you can explain if what you mean is that you want to collapse these NULL groupings into other groups or if you mean something else. There are some potential problems with trying to do such a thing.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

river1 SSC-Insane Points: 24309 More actions · Answer 4

I guess that he is right about using the max. I just don't know where

river1 SSC-Insane Points: 24309 More actions · Answer 5

I don't know if I should use the max inside the aggregation where I am also grouping or if I should have a task before to use this max and only after group it. But the goal is to have just one possibility on the manager name. If possible the name but if the field is null then the null. Then I can group it . He is right. But I don't know where to put the max() as this is a character based field not a numeric field.

sgmunson SSC Guru Points: 110640 More actions · Answer 6

river1 - Thursday, September 28, 2017 12:39 PM
I don't know if I should use the max inside the aggregation where I am also grouping or if I should have a task before to use this max and only after group it. But the goal is to have just one possibility on the manager name. If possible the name but if the field is null then the null. Then I can group it . He is right. But I don't know where to put the max() as this is a character based field not a numeric field.

As I said, it's not usually a good idea to group things using expressions. The dataset should be providing the data in consumable portions for grouping, and if some of the data is not tied to a manager name value that is something other than NULL, then how, EXACTLY, do you display that data when the rest of it groups together with the other values other than manager name? If you're talking about funds being managed, why is there any data that doesn't have a manager name? I strongly suspect that this change has to occur back in the dataset, within the query or stored procedure that provides the data.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

river1 SSC-Insane Points: 24309 More actions · Answer 7

Thank you for your reply. I understand but in this case, I think I prefer to uinderstand what is the way of doing this max() as requested. With time, we can think on other approach.

In the meantime, any idea of how can I do the max() to the manager name in oder to have just one occurence (null or name)?

Thanks for the support

sgmunson SSC Guru Points: 110640 More actions · Answer 8

river1 - Thursday, September 28, 2017 1:09 PM
Thank you for your reply. I understand but in this case, I think I prefer to uinderstand what is the way of doing this max() as requested. With time, we can think on other approach.
In the meantime, any idea of how can I do the max() to the manager name in oder to have just one occurence (null or name)?
Thanks for the support

Now that I look at this for the third time, I think I understand, finally, what's going on here. I see data for the same DC_ManagerCode, where one row has a Manager Name and another is NULL. Instead of using Manager Name in the grouping at all, take it out of the grouping, and let it be an aggregation, using MAX. This will work correctly if we assume that grouping at the DC_ManagerCode level, in addition to all the other grouping fields other than Manager Name, is the lowest level of grouping, and that whether or not a Manager Name is present or not in the data isn't particularly relevant because perhaps there's always at least one row in each group at the lowest level that DOES have a Manager Name, and thus any records that appear in your source data without one will get tied to the MAX manager name within a given value for DC_ManagerCode. If that makes sense....

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

river1 SSC-Insane Points: 24309 More actions · Answer 9

Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.

Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.

So probably I am confused on where can I use this max() inside the aggregation.

Can you please help understand?

Phil Parkin SSC Guru Points: 246965 More actions · Answer 10

river1 - Thursday, September 28, 2017 1:30 PM
Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.
Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.
So probably I am confused on where can I use this max() inside the aggregation.
Can you please help understand?

OK, maybe it's not possible to aggregate VARCHARs in SSIS, but it certainly is in SQL. Here is an example:
CREATE TABLE #Test ( TestId INT , ManagerName VARCHAR(50) );

INSERT #Test ( TestId , ManagerName ) VALUES ( 1, NULL ) ,( 1, 'BLC BEITS - Pool AEW' )

SELECT t.TestId, MAX(t.ManagerName)
FROM #Test t
GROUP BY t.TestId;

I had assumed that the same thing would work in SSIS, but as I never aggregate there (the performance is terrible) I was not certain.
Thus it would seem that the best solution to your problem may be to bring the data into a staging table and then INSERT/UPDATE from the staging table to your target table from there, taking care of the NULLs in the process.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

sgmunson SSC Guru Points: 110640 More actions · Answer 11

river1 - Thursday, September 28, 2017 1:30 PM
Problem is that. When, in the same aggregation, I take the Manager name from the group and try to use the max(). It does not appear to me the option.
Inside the aggregation, when I tick the manager name, the only optins that I have inside the aggregation are the group by. Probably because this is a varchar field.
So probably I am confused on where can I use this max() inside the aggregation.
Can you please help understand?

Oops! I had forgotten that SSIS aggregates can't handle character data for MAX values. It's unfortunate, but you're going to have to get away from using that Aggregate task and run an actual query instead. SSIS aggregation just isn't going to work unless you can do something with your source data to derive that maximum value to replace any NULLs in your source data. If that can be done, then you can just leave the Manager Name in the grouping as it will always be the same value.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

river1 SSC-Insane Points: 24309 More actions · Answer 12

But how can I do that? Can I replace the null values for the name of the manager when the aggregation is done?

river1 SSC-Insane Points: 24309 More actions · Answer 13

I mean, I need to find a way to replace the null for the name is order to be able to generate the aggregate without having two results.

Phil Parkin SSC Guru Points: 246965 More actions · Answer 14

river1 - Thursday, September 28, 2017 2:12 PM
I mean, I need to find a way to replace the null for the name is order to be able to generate the aggregate without having two results.

Did you read my comment about using a staging table? That is the easiest solution.
The only solution I can think of in SSIS is not very pretty and requires that you write a chunk of C# code in a Script Component. I don't recommend going down that path when you can solve this easily in T-SQL.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin