SUMs when joining one to many tables

Question

Post reply

SUMs when joining one to many tables

kangarolf

Hall of Fame

Points: 3432
More actions
November 25, 2010 at 2:45 am

#74625

Hi all,
I've come across this stumbling block quite a few times when writing up SPs for reports and I was wondering what the 'accepted best practise' is.
(sql code for these tables at bottom)
TableA
ConID ConName
1 Mr A
2 Mr B
TableB
SomeID ConID Regionid Total
1 1 1 10
2 1 2 15
3 2 1 10
TableC
SomeID ConID RegionID Total
1 1 2 10
2 2 2 10
This is all totally fictitious and contrived data, in my case this morning it was items of work completed by contractors in dwellings which might be table B and also extra payments to contractors for things like driving vans and collecting materials etc which might be table C.
I want to sum the totals from both table B and C, grouped by the contractor name/id.
If I do
select a.conID, a.conName, sum(b.total) as BTotal, sum(c.total) as CTotal
from tableA a left outer join tableB b on a.conID = b.conID
left outer join tableC c on a.conID = c.conID
group by a.conID, a.conName
it will return
conID, conName, BTotal, CTotal
1, Mr A, 25, 20 **this is the incorrect sum
2, Mr B, 10, 10
as there are two rows being returned for conID from tableB but only one actual row from tableC, the single tableC row with a total of 10 is being doubled.
Whats the accepted way of forming this query...and then also grouping by multiple columns such as region ID and conID so you get totals for each contractor in each region? In the past I have used derived tables and temp tables, or making joins in views first.
Thanks all
DROP TABLE #tableA
DROP TABLE #tableB
DROP TABLE #tableC
--===== Create the test table with
CREATE TABLE #tableA
(
conID INT ,
conName VARCHAR(150)
)
CREATE TABLE #tableB
(
someID INT ,
conID INT,
regionID INT,
total float
)
CREATE TABLE #tableC
(
someID INT,
conID INT,
regionID INT,
total float
)
--===== Insert the test data into the test table
INSERT INTO #tableA
(conID, conName)
SELECT '1','Mr A' UNION ALL
SELECT '2','Mr B'
INSERT INTO #tableB
(someID, conID, RegionID, Total)
SELECT '1','1', 1, 10 UNION ALL
SELECT '2','1', 2, 15 UNION ALL
SELECT '3','2', 1, 10
INSERT INTO #tableC
(someID, conID, RegionID, Total)
SELECT '1','1', 2, 10 UNION ALL
SELECT '2','2', 2, 10

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Andrew Watson-478275 SSCarpal Tunnel Points: 4652 More actions · Answer 1

As a starter, here's one way you could get the totals you want:

select a.conName, b.bTotal, c.cTotal

from #tableA a

inner join (select conID, SUM(Total) as bTotal from #tableB group by conID) b

on b.conID = a.conID

inner join (select conID, SUM(Total) as cTotal from #tableC group by conID) c

on c.conID = a.conID

kangarolf Hall of Fame Points: 3432 More actions · Answer 2

Hi there,

Thats what I have done in the past created derived tables but I was wondering if there were any alternatives, better options, opinions about the performance hit of derived tables and so on. I cant help looking at some of these rather large derived queries and thinking..'god thats alot of unions and group bys!'

Cheers

Rolf

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 3

~~I believe the following will be faster and use fewer resources but we won't know for sure until we test with a whole lot more data.~~

~~SELECT a.conID, a.conName,~~

~~TotalB = SUM(b.Total),~~

~~TotalC = SUM(c.Total)~~

~~FROM #tableA a~~

~~LEFT JOIN #tableB b ON a.conID = b.conID~~

~~LEFT JOIN #tableC c ON a.conID = c.conID~~

~~GROUP BY a.conID, a.conName~~

{edit} I really shouldn't post until I've had enough coffee. I don't know what the heck I was thinking. [font="Arial Black"]Don't do it this way.[/font] I made a terrible mistake and the performance is horrible. I'll post the test code in a couple of minutes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 4

Here's the test code including more test data.

DROP TABLE #tableA;

DROP TABLE #tableB;

DROP TABLE #tableC;

--===== Create the test table with

CREATE TABLE #tableA

(

conID INT PRIMARY KEY CLUSTERED,

conName VARCHAR(150)

)

;

CREATE TABLE #tableB

(

someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

conID INT,

regionID INT,

total float

)

;

CREATE TABLE #tableC

(

someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

conID INT,

regionID INT,

total float

)

;

--=========================================================

-- Insert the test data into the test tables

--=========================================================

--===== Build the data for 26 names contractors.

INSERT INTO #tableA

(conID, conName)

SELECT TOP 26

conID = Number,

conName = 'Mr '+CHAR(Number+64) --65 = "A" in ASCII

FROM master.dbo.spt_values

WHERE Type = 'P'

AND Number BETWEEN 1 AND 26

;

--===== Build a million rows of test data for table B

-- for 26 contractors across 10 regions and

-- totals from 1 to 1000

INSERT INTO #tableB

(conID, RegionID, Total)

SELECT TOP 10000

conID = ABS(CHECKSUM(NEWID()))%26+1,

RegionID = ABS(CHECKSUM(NEWID()))%10+1,

Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)

FROM sys.all_columns rowsource1

CROSS JOIN sys.all_columns rowsource2

;

--===== Build rows of test data for table C

-- for 26 contractors across 10 regions and

-- totals from 1 to 1000

INSERT INTO #tableC

(conID, RegionID, Total)

SELECT TOP 10000

conID = ABS(CHECKSUM(NEWID()))%26+1,

RegionID = ABS(CHECKSUM(NEWID()))%10+1,

Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)

FROM sys.all_columns rowsource1

CROSS JOIN sys.all_columns rowsource2

;

CREATE INDEX ix_#TableB_conID_total ON #tableB (conID,total);

CREATE INDEX ix_#TableC_conID_total ON #tableC (conID,total);

--===== This method forms a partial cross join when creating the sums

-- as well as having to join too many times.

PRINT '========== The "BAD" way ==========';

SET STATISTICS TIME ON;

SELECT a.conID, a.conName,

TotalB = SUM(b.Total),

TotalC = SUM(c.Total)

FROM #tableA a

LEFT JOIN #tableB b ON a.conID = b.conID

LEFT JOIN #tableC c ON a.conID = c.conID

GROUP BY a.conID, a.conName

;

SET STATISTICS TIME OFF;

--===== This is Andrew's method. It very quickly creates the sums required

-- and then only has to join 3 sets of 26 rows. I changed the INNER

-- joins to LEFT joins to return a "0" for those contractors who had

-- no totals.

PRINT '========== Andrew''s code with a mod ==========';

SET STATISTICS TIME ON;

select a.conName, b.bTotal, c.cTotal

from #tableA a

LEFT join (select conID, SUM(Total) as bTotal from #tableB group by conID) b

on b.conID = a.conID

LEFT join (select conID, SUM(Total) as cTotal from #tableC group by conID) c

on c.conID = a.conID

;

SET STATISTICS TIME OFF;

Here are the results from my 8 year old, single CPU desktop box...

========== The "BAD" way ==========

(26 row(s) affected)

SQL Server Execution Times:

CPU time = 22031 ms, elapsed time = 22302 ms.

========== Andrew's code with a mod ==========

(26 row(s) affected)

SQL Server Execution Times:

CPU time = 31 ms, elapsed time = 33 ms.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dugi SSCoach Points: 17998 More actions · Answer 5

Jeff you are amazing, buddy... T-SQL simple works for you, really!

:w00t::hehe::cool:;-)

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

Dugi (11/26/2010)
Jeff you are amazing, buddy... T-SQL simple works for you, really!
:w00t::hehe::cool:;-)

Nah... this one was all Andrew. I'm the one that posted the bad code this time. :blush: I don't know what the heck I was thinking except that maybe I was thinking just exactly backwards.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dugi SSCoach Points: 17998 More actions · Answer 7

Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!

Have a nice weekend buddy!

Greetings!

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

Dugi (11/26/2010)
Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!
Have a nice weekend buddy!
Greetings!

Ah... thanks, Dugi! I had to know and just thought I'd share the results. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Andrew Watson-478275 SSCarpal Tunnel Points: 4652 More actions · Answer 9

I have no idea why I put inner joins in there. In my head they were lefts. 🙂

Mike Mullen Hall of Fame Points: 3254 More actions · Answer 10

Here is another variation with excellent performance.

select #tableA.conName,

isnull((select sum(total) from #tableB where conid = #tableA.conid),0),

isnull((select sum(total) from #tableC where conid = #tableA.conid),0)

from #tableA

order by 1

;