November 25, 2010 at 2:45 am
Hi all,
I've come across this stumbling block quite a few times when writing up SPs for reports and I was wondering what the 'accepted best practise' is.
(sql code for these tables at bottom)
TableA
ConID ConName
1 Mr A
2 Mr B
TableB
SomeID ConID Regionid Total
1 1 1 10
2 1 2 15
3 2 1 10
TableC
SomeID ConID RegionID Total
1 1 2 10
2 2 2 10
This is all totally fictitious and contrived data, in my case this morning it was items of work completed by contractors in dwellings which might be table B and also extra payments to contractors for things like driving vans and collecting materials etc which might be table C.
I want to sum the totals from both table B and C, grouped by the contractor name/id.
If I do
select a.conID, a.conName, sum(b.total) as BTotal, sum(c.total) as CTotal
from tableA a left outer join tableB b on a.conID = b.conID
left outer join tableC c on a.conID = c.conID
group by a.conID, a.conName
it will return
conID, conName, BTotal, CTotal
1, Mr A, 25, 20 **this is the incorrect sum
2, Mr B, 10, 10
as there are two rows being returned for conID from tableB but only one actual row from tableC, the single tableC row with a total of 10 is being doubled.
Whats the accepted way of forming this query...and then also grouping by multiple columns such as region ID and conID so you get totals for each contractor in each region? In the past I have used derived tables and temp tables, or making joins in views first.
Thanks all
DROP TABLE #tableA
DROP TABLE #tableB
DROP TABLE #tableC
--===== Create the test table with
CREATE TABLE #tableA
(
conID INT ,
conName VARCHAR(150)
)
CREATE TABLE #tableB
(
someID INT ,
conID INT,
regionID INT,
total float
)
CREATE TABLE #tableC
(
someID INT,
conID INT,
regionID INT,
total float
)
--===== Insert the test data into the test table
INSERT INTO #tableA
(conID, conName)
SELECT '1','Mr A' UNION ALL
SELECT '2','Mr B'
INSERT INTO #tableB
(someID, conID, RegionID, Total)
SELECT '1','1', 1, 10 UNION ALL
SELECT '2','1', 2, 15 UNION ALL
SELECT '3','2', 1, 10
INSERT INTO #tableC
(someID, conID, RegionID, Total)
SELECT '1','1', 2, 10 UNION ALL
SELECT '2','2', 2, 10
November 26, 2010 at 6:00 am
As a starter, here's one way you could get the totals you want:
select a.conName, b.bTotal, c.cTotal
from #tableA a
inner join (select conID, SUM(Total) as bTotal from #tableB group by conID) b
on b.conID = a.conID
inner join (select conID, SUM(Total) as cTotal from #tableC group by conID) c
on c.conID = a.conID
November 26, 2010 at 6:22 am
Hi there,
Thats what I have done in the past created derived tables but I was wondering if there were any alternatives, better options, opinions about the performance hit of derived tables and so on. I cant help looking at some of these rather large derived queries and thinking..'god thats alot of unions and group bys!'
Cheers
Rolf
November 26, 2010 at 11:36 am
I believe the following will be faster and use fewer resources but we won't know for sure until we test with a whole lot more data.
SELECT a.conID, a.conName,
TotalB = SUM(b.Total),
TotalC = SUM(c.Total)
FROM #tableA a
LEFT JOIN #tableB b ON a.conID = b.conID
LEFT JOIN #tableC c ON a.conID = c.conID
GROUP BY a.conID, a.conName
{edit} I really shouldn't post until I've had enough coffee. I don't know what the heck I was thinking. [font="Arial Black"]Don't do it this way.[/font] I made a terrible mistake and the performance is horrible. I'll post the test code in a couple of minutes.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2010 at 12:28 pm
Here's the test code including more test data.
DROP TABLE #tableA;
DROP TABLE #tableB;
DROP TABLE #tableC;
--===== Create the test table with
CREATE TABLE #tableA
(
conID INT PRIMARY KEY CLUSTERED,
conName VARCHAR(150)
)
;
CREATE TABLE #tableB
(
someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
conID INT,
regionID INT,
total float
)
;
CREATE TABLE #tableC
(
someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
conID INT,
regionID INT,
total float
)
;
--=========================================================
-- Insert the test data into the test tables
--=========================================================
--===== Build the data for 26 names contractors.
INSERT INTO #tableA
(conID, conName)
SELECT TOP 26
conID = Number,
conName = 'Mr '+CHAR(Number+64) --65 = "A" in ASCII
FROM master.dbo.spt_values
WHERE Type = 'P'
AND Number BETWEEN 1 AND 26
;
--===== Build a million rows of test data for table B
-- for 26 contractors across 10 regions and
-- totals from 1 to 1000
INSERT INTO #tableB
(conID, RegionID, Total)
SELECT TOP 10000
conID = ABS(CHECKSUM(NEWID()))%26+1,
RegionID = ABS(CHECKSUM(NEWID()))%10+1,
Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)
FROM sys.all_columns rowsource1
CROSS JOIN sys.all_columns rowsource2
;
--===== Build rows of test data for table C
-- for 26 contractors across 10 regions and
-- totals from 1 to 1000
INSERT INTO #tableC
(conID, RegionID, Total)
SELECT TOP 10000
conID = ABS(CHECKSUM(NEWID()))%26+1,
RegionID = ABS(CHECKSUM(NEWID()))%10+1,
Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)
FROM sys.all_columns rowsource1
CROSS JOIN sys.all_columns rowsource2
;
CREATE INDEX ix_#TableB_conID_total ON #tableB (conID,total);
CREATE INDEX ix_#TableC_conID_total ON #tableC (conID,total);
--===== This method forms a partial cross join when creating the sums
-- as well as having to join too many times.
PRINT '========== The "BAD" way ==========';
SET STATISTICS TIME ON;
SELECT a.conID, a.conName,
TotalB = SUM(b.Total),
TotalC = SUM(c.Total)
FROM #tableA a
LEFT JOIN #tableB b ON a.conID = b.conID
LEFT JOIN #tableC c ON a.conID = c.conID
GROUP BY a.conID, a.conName
;
SET STATISTICS TIME OFF;
--===== This is Andrew's method. It very quickly creates the sums required
-- and then only has to join 3 sets of 26 rows. I changed the INNER
-- joins to LEFT joins to return a "0" for those contractors who had
-- no totals.
PRINT '========== Andrew''s code with a mod ==========';
SET STATISTICS TIME ON;
select a.conName, b.bTotal, c.cTotal
from #tableA a
LEFT join (select conID, SUM(Total) as bTotal from #tableB group by conID) b
on b.conID = a.conID
LEFT join (select conID, SUM(Total) as cTotal from #tableC group by conID) c
on c.conID = a.conID
;
SET STATISTICS TIME OFF;
Here are the results from my 8 year old, single CPU desktop box...
========== The "BAD" way ==========
(26 row(s) affected)
SQL Server Execution Times:
CPU time = 22031 ms, elapsed time = 22302 ms.
========== Andrew's code with a mod ==========
(26 row(s) affected)
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 33 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2010 at 12:33 pm
November 26, 2010 at 12:47 pm
Dugi (11/26/2010)
Jeff you are amazing, buddy... T-SQL simple works for you, really!:w00t::hehe::cool:;-)
Nah... this one was all Andrew. I'm the one that posted the bad code this time. :blush: I don't know what the heck I was thinking except that maybe I was thinking just exactly backwards.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2010 at 12:58 pm
Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!
Have a nice weekend buddy!
Greetings!
November 26, 2010 at 1:05 pm
Dugi (11/26/2010)
Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!Have a nice weekend buddy!
Greetings!
Ah... thanks, Dugi! I had to know and just thought I'd share the results. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
November 29, 2010 at 3:05 am
I have no idea why I put inner joins in there. In my head they were lefts. 🙂
November 30, 2010 at 11:34 am
Here is another variation with excellent performance.
select #tableA.conName,
isnull((select sum(total) from #tableB where conid = #tableA.conid),0),
isnull((select sum(total) from #tableC where conid = #tableA.conid),0)
from #tableA
order by 1
;
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply