Help with aggregate function

Question

Post reply

Help with aggregate function

ger_serrano

Old Hand

Points: 302
More actions
November 17, 2017 at 12:34 pm

#402680

have table:
col1 col2 col3
A B 10
B A 5
C D 8
Want to get:
A B 15
C D 8

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 1

And how are you determining that

A B 10
B A 5

Should aggregate to A B 15

ger_serrano Old Hand Points: 302 More actions · Answer 2

col1 and col2 represent company names and col3 is the transaction value, could be negative, between the companies. So I want to know how much company A spent?

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Lengthy, but there might not be a better alternative without fixing the data.
CREATE TABLE SampleData( col1 varchar(10), col2 varchar(10), col3 int ); INSERT INTO SampleData VALUES ('A', 'B', 10), ('B', 'A', 5 ), ('C', 'D', 8 );

SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1, CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2, SUM( col3) AS col3 FROM SampleData GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END, CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;

GO DROP TABLE SampleData

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 4

It won't make a bit of improvement performance-wise over what Luis posted, but it does simplify the code a bit. The derived table could be a CTE or CROSS APPLY, as well. Since this is also for 2012 or greater, we can use IIF for further code simplification although they do resolve to CASE operators behind the scenes.

SELECT d.Col1, d.Col2, Col3 = SUM(d.COL3) FROM ( SELECT Col1 = IIF(Col1<=Col2, Col1, Col2) ,Col2 = IIF(Col1<=Col2, Col2, Col1) ,Col3 FROM dbo.SampleData ) d GROUP BY d.Col1, d.Col2 ;

Also, I recommend that you always use the 2 part naming convention for all tables and objects in all of your code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ger_serrano Old Hand Points: 302 More actions · Answer 5

That is excellent guys. Both queries work as expected. I have modified the query a little bit to accomodate a case when col3 is negative. In that case it means that for company col1 is a loss but for comany col2 is a plus.so for examble for the following values;
col1 col2 col3
A    B    100
B    A    -50
A    C    200
A    D    100
A    B    150
B    F    200
The result of the modified queries:
SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1,
CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2,
SUM( CASE WHEN col3 < 0 THEN col3 * -1 ELSE col3 END ) AS col3
FROM SampleData
GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END,
CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;

SELECT d.col1, d.col2, col3 = SUM(IIF (d.col3 < 0, d.col3 * -1, d.col3))
FROM (
   SELECT col1 = IIF(col1<=col2, col1, col2)
      ,col2 = IIF(col1<=col2, col2, col1)
      ,col3
    FROM SampleData
   ) d
GROUP BY d.col1, d.col2
;
Result:
A    B    300
A    C    200
A    D    100
B    F    200

That is what I wanted!!
I got all the help I requested. Thank you very much. From my prespective we can close this thread, Not sure who is going to do it.

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 6

ger_serrano - Monday, November 20, 2017 9:11 AM
That is excellent guys. Both queries work as expected. I have modified the query a little bit to accomodate a case when col3 is negative. In that case it means that for company col1 is a loss but for comany col2 is a plus.so for examble for the following values;
col1 col2 col3
A    B    100
B    A    -50
A    C    200
A    D    100
A    B    150
B    F    200
The result of the modified queries:
SELECT CASE WHEN col1 <= col2 THEN col1 ELSE col2 END AS col1,
CASE WHEN col1 <= col2 THEN col2 ELSE col1 END AS col2,
SUM( CASE WHEN col3 < 0 THEN col3 * -1 ELSE col3 END ) AS col3
FROM SampleData
GROUP BY CASE WHEN col1 <= col2 THEN col1 ELSE col2 END,
CASE WHEN col1 <= col2 THEN col2 ELSE col1 END;
SELECT d.col1, d.col2, col3 = SUM(IIF (d.col3 < 0, d.col3 * -1, d.col3))
FROM (
   SELECT col1 = IIF(col1<=col2, col1, col2)
      ,col2 = IIF(col1<=col2, col2, col1)
      ,col3
    FROM SampleData
   ) d
GROUP BY d.col1, d.col2
;
Result:
A    B    300
A    C    200
A    D    100
B    F    200
That is what I wanted!!
I got all the help I requested. Thank you very much. From my prespective we can close this thread, Not sure who is going to do it.

Thanks for the kind feedback and about your modification. Threads are never closed, though, just in case someone comes up with a better idea that will help us all for similar problems.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ger_serrano Old Hand Points: 302 More actions · Answer 7

Sorry . that was only half of the work I had to do. tuens out that every time I have negative value in col3 I have to turn col3 positive and flip col1 and col2 values and then group so if have:
col1 col2 col3
A B 100
B A -50
A C 200
A D 100
A B 150
B F 200
C D -50
I have to produce
col1 col2 col3
A B 300
A C 200
A D 100
B F 200
D C 50

ger_serrano Old Hand Points: 302 More actions · Answer 8

Following query produced the expected results:
A    B    300
A    C    200
A    D    100
B    F    200
D    C    50

SELECT CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END AS COL1,
CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END AS COL2,
SUM( CASE WHEN COL3 < 0 THEN COL3 *-1 ELSE COL3 END ) AS COL3
FROM SampleData
GROUP BY CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END,
CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END
ORDER BY CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL1 ELSE COL2 END,
CASE WHEN COL1 <= COL2 AND COL3 > 0 THEN COL2 ELSE COL1 END

ger_serrano Old Hand Points: 302 More actions · Answer 9

unfortunately taht query did not encompass all scenarios such as:
col1 col2 col3
('A','B',100),
('B','A',-50),
('A','C',200),
('A','D',100),
('A','B',150),
('B','F',200),
('d','K',400),
('K','D',-20),
('C','D',-50),
('D','C',-60);
I need to display which one of col1 or col2 had the max net value (col3). I mean I have to have as a result:
A,B,300
A,C,200
A,D,100
B,F,200
D,K,420
C,D,10
hopefully this table will help to cover all 4 scenarios of values between col1 and col2 (first two rows). Last four rows is the answer we need to get:

A	B	30	30	-30	-30
B	A	90	-90	90	-90
B	A	60
A	B		120
B	A			120
A	B				60

s???9w

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 10

ger_serrano - Monday, December 18, 2017 1:26 PM
unfortunately taht query did not encompass all scenarios such as:
col1 col2 col3
('A','B',100),
('B','A',-50),
('A','C',200),
('A','D',100),
('A','B',150),
('B','F',200),
('d','K',400),
('K','D',-20),
('C','D',-50),
('D','C',-60);
I need to display which one of col1 or col2 had the max net value (col3). I mean I have to have as a result:
A,B,300
A,C,200
A,D,100
B,F,200
D,K,420
C,D,10
hopefully this table will help to cover all 4 scenarios of values between col1 and col2 (first two rows). Last four rows is the answer we need to get:
A B 30 30 -30 -30
B A 90 -90 90 -90
B A 60
A B 120
B A 120
A B 60
sÔ¾ï¿½ï¿½9w

Two things...

First, help us help you... you need to start posting your data in a readily consumable format. Please see the first link in my signature line below under "Helpful Links" for how to do that.

Second, I'm just not baggin' what you're rakin' on your post above. Since your previous post said that all negative values need to be made positive, I have no idea how you came up with the result of "C,D,10" nor do I have any idea how you came up with the numbers you have in your "all 4 scenarios" example above because there's neither hide nor hair of any combination of numbers for "A,B" or "B,A" that will produce any number not evenly divisible by 50.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ger_serrano Old Hand Points: 302 More actions · Answer 11

Really sorry but business keep changing the requirements. My table is:
create table SampleData ( col1 VARCHAR(10), col2 VARCHAR(10), MTM numeric(10));
INSERT INTO SampleData VALUES
('A','B',100),
('B','A',-50),
('A','C',200),
('A','D',100),
('A','B',150),
('B','F',200),
('d','K',400),
('K','D',-20),
('C','D',-50),
('B','A',15),
('D','C',-60),
('X','Y',-25),
('Y','X',-25);
Here is the meaning of col1, col2, and MTM: col1 and col2 are companies and MTM is the result of a trade between col1 and col2 but reported by col1. There is no restriction in the number of trades between companies and MTM could be positive or negative. What we need to report is the net position between col1 and col2. Based on the data I posted I would like to have
col1    col2    MTM
A      B        285
A      C        200
A      D        100
B      F        200
C        D        10
d        K        420
X        Y        0

So back to your question about result C D 10: C reports trade with D -50 which means that C lost 50 in the trade but D won 50. Now D reports another trade with C -60 which means that D lost 60 but C won 60. If we compute those two trades then C trading with D won at the end 10 and that is why we report C D 10, C is in the first position and D in the second position. Hopefuly that makes sense. Same login applies to the trades between X and Y. The net result is X Y 0 or Y X 0 (the same for me).

drew.allen SSC Guru Points: 77014 More actions · Answer 12

ger_serrano - Tuesday, December 19, 2017 8:39 AM
Really sorry but business keep changing the requirements. My table is:
create table SampleData ( col1 VARCHAR(10), col2 VARCHAR(10), MTM numeric(10));
INSERT INTO SampleData VALUES
('A','B',100),
('B','A',-50),
('A','C',200),
('A','D',100),
('A','B',150),
('B','F',200),
('d','K',400),
('K','D',-20),
('C','D',-50),
('B','A',15),
('D','C',-60),
('X','Y',-25),
('Y','X',-25);
Here is the meaning of col1, col2, and MTM: col1 and col2 are companies and MTM is the result of a trade between col1 and col2 but reported by col1. There is no restriction in the number of trades between companies and MTM could be positive or negative. What we need to report is the net position between col1 and col2. Based on the data I posted I would like to have
col1    col2    MTM
A      B        285
A      C        200
A      D        100
B      F        200
C        D        10
d        K        420
X        Y        0
So back to your question about result C D 10: C reports trade with D -50 which means that C lost 50 in the trade but D won 50. Now D reports another trade with C -60 which means that D lost 60 but C won 60. If we compute those two trades then C trading with D won at the end 10 and that is why we report C D 10, C is in the first position and D in the second position. Hopefuly that makes sense. Same login applies to the trades between X and Y. The net result is X Y 0 or Y X 0 (the same for me).

It's actually very easy. You need to have some method to standardize the order of the companies and then make sure that the records match that standard. Once you've standardized your records it's a simple grouping. For example, force the companies in your records to be in alphabetical order ( and switch the sign where they're not). I've used a CROSS APPLY, but you could also use three CASE expressions or a simple UNION ALL.

SELECT s.col1, s.col2, SUM(s.MTM) FROM #SampleData CROSS APPLY ( SELECT col1, col2, mtm WHERE col1 < col2

UNION ALL

SELECT col2, col1, -mtm WHERE col1 >= col2 ) AS s -- standardized GROUP BY s.col1, s.col2

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ger_serrano Old Hand Points: 302 More actions · Answer 13

Awesome Jeff!!. Thanks a lot. I think the following query is also rdbms agnostic:
with newtab (col1,col2, mtm)
as
(SELECT upper(col1), upper(col2), mtm FROM sampledata
WHERE upper(col1) < upper(col2)

UNION ALL

SELECT upper(col2), upper(col1), -mtm FROM sampledata
WHERE upper(col1) >= upper(col2)
)
select col1, col2, sum(mtm) as mtm from newtab
group by col1, col2;

drew.allen SSC Guru Points: 77014 More actions · Answer 14

ger_serrano - Wednesday, December 20, 2017 7:34 AM
Awesome Jeff!!. Thanks a lot. I think the following query is also rdbms agnostic:
with newtab (col1,col2, mtm)
as
(SELECT upper(col1), upper(col2), mtm FROM sampledata
WHERE upper(col1) < upper(col2)

UNION ALL
SELECT upper(col2), upper(col1), -mtm FROM sampledata
WHERE upper(col1) >= upper(col2)
)
select col1, col2, sum(mtm) as mtm from newtab
group by col1, col2;

That came from me, not Jeff.

I believe that the CTE will require two separate scans of the table, whereas the CROSS APPLY will only require one. I don't have the data set up anymore to test.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA