SUMs when joining one to many tables

  • Hi all,

    I've come across this stumbling block quite a few times when writing up SPs for reports and I was wondering what the 'accepted best practise' is.

    (sql code for these tables at bottom)

    TableA

    ConID ConName

    1 Mr A

    2 Mr B

    TableB

    SomeID ConID Regionid Total

    1 1 1 10

    2 1 2 15

    3 2 1 10

    TableC

    SomeID ConID RegionID Total

    1 1 2 10

    2 2 2 10

    This is all totally fictitious and contrived data, in my case this morning it was items of work completed by contractors in dwellings which might be table B and also extra payments to contractors for things like driving vans and collecting materials etc which might be table C.

    I want to sum the totals from both table B and C, grouped by the contractor name/id.

    If I do

    select a.conID, a.conName, sum(b.total) as BTotal, sum(c.total) as CTotal

    from tableA a left outer join tableB b on a.conID = b.conID

    left outer join tableC c on a.conID = c.conID

    group by a.conID, a.conName

    it will return

    conID, conName, BTotal, CTotal

    1, Mr A, 25, 20 **this is the incorrect sum

    2, Mr B, 10, 10

    as there are two rows being returned for conID from tableB but only one actual row from tableC, the single tableC row with a total of 10 is being doubled.

    Whats the accepted way of forming this query...and then also grouping by multiple columns such as region ID and conID so you get totals for each contractor in each region? In the past I have used derived tables and temp tables, or making joins in views first.

    Thanks all

    DROP TABLE #tableA

    DROP TABLE #tableB

    DROP TABLE #tableC

    --===== Create the test table with

    CREATE TABLE #tableA

    (

    conID INT ,

    conName VARCHAR(150)

    )

    CREATE TABLE #tableB

    (

    someID INT ,

    conID INT,

    regionID INT,

    total float

    )

    CREATE TABLE #tableC

    (

    someID INT,

    conID INT,

    regionID INT,

    total float

    )

    --===== Insert the test data into the test table

    INSERT INTO #tableA

    (conID, conName)

    SELECT '1','Mr A' UNION ALL

    SELECT '2','Mr B'

    INSERT INTO #tableB

    (someID, conID, RegionID, Total)

    SELECT '1','1', 1, 10 UNION ALL

    SELECT '2','1', 2, 15 UNION ALL

    SELECT '3','2', 1, 10

    INSERT INTO #tableC

    (someID, conID, RegionID, Total)

    SELECT '1','1', 2, 10 UNION ALL

    SELECT '2','2', 2, 10

  • As a starter, here's one way you could get the totals you want:

    select a.conName, b.bTotal, c.cTotal

    from #tableA a

    inner join (select conID, SUM(Total) as bTotal from #tableB group by conID) b

    on b.conID = a.conID

    inner join (select conID, SUM(Total) as cTotal from #tableC group by conID) c

    on c.conID = a.conID

  • Hi there,

    Thats what I have done in the past created derived tables but I was wondering if there were any alternatives, better options, opinions about the performance hit of derived tables and so on. I cant help looking at some of these rather large derived queries and thinking..'god thats alot of unions and group bys!'

    Cheers

    Rolf

  • I believe the following will be faster and use fewer resources but we won't know for sure until we test with a whole lot more data.

    SELECT a.conID, a.conName,

    TotalB = SUM(b.Total),

    TotalC = SUM(c.Total)

    FROM #tableA a

    LEFT JOIN #tableB b ON a.conID = b.conID

    LEFT JOIN #tableC c ON a.conID = c.conID

    GROUP BY a.conID, a.conName

    {edit} I really shouldn't post until I've had enough coffee. I don't know what the heck I was thinking. [font="Arial Black"]Don't do it this way.[/font] I made a terrible mistake and the performance is horrible. I'll post the test code in a couple of minutes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's the test code including more test data.

    DROP TABLE #tableA;

    DROP TABLE #tableB;

    DROP TABLE #tableC;

    --===== Create the test table with

    CREATE TABLE #tableA

    (

    conID INT PRIMARY KEY CLUSTERED,

    conName VARCHAR(150)

    )

    ;

    CREATE TABLE #tableB

    (

    someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    conID INT,

    regionID INT,

    total float

    )

    ;

    CREATE TABLE #tableC

    (

    someID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    conID INT,

    regionID INT,

    total float

    )

    ;

    --=========================================================

    -- Insert the test data into the test tables

    --=========================================================

    --===== Build the data for 26 names contractors.

    INSERT INTO #tableA

    (conID, conName)

    SELECT TOP 26

    conID = Number,

    conName = 'Mr '+CHAR(Number+64) --65 = "A" in ASCII

    FROM master.dbo.spt_values

    WHERE Type = 'P'

    AND Number BETWEEN 1 AND 26

    ;

    --===== Build a million rows of test data for table B

    -- for 26 contractors across 10 regions and

    -- totals from 1 to 1000

    INSERT INTO #tableB

    (conID, RegionID, Total)

    SELECT TOP 10000

    conID = ABS(CHECKSUM(NEWID()))%26+1,

    RegionID = ABS(CHECKSUM(NEWID()))%10+1,

    Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)

    FROM sys.all_columns rowsource1

    CROSS JOIN sys.all_columns rowsource2

    ;

    --===== Build rows of test data for table C

    -- for 26 contractors across 10 regions and

    -- totals from 1 to 1000

    INSERT INTO #tableC

    (conID, RegionID, Total)

    SELECT TOP 10000

    conID = ABS(CHECKSUM(NEWID()))%26+1,

    RegionID = ABS(CHECKSUM(NEWID()))%10+1,

    Total = CAST((ABS(CHECKSUM(NEWID()))%10000+1)/100.0 AS FLOAT)

    FROM sys.all_columns rowsource1

    CROSS JOIN sys.all_columns rowsource2

    ;

    CREATE INDEX ix_#TableB_conID_total ON #tableB (conID,total);

    CREATE INDEX ix_#TableC_conID_total ON #tableC (conID,total);

    --===== This method forms a partial cross join when creating the sums

    -- as well as having to join too many times.

    PRINT '========== The "BAD" way ==========';

    SET STATISTICS TIME ON;

    SELECT a.conID, a.conName,

    TotalB = SUM(b.Total),

    TotalC = SUM(c.Total)

    FROM #tableA a

    LEFT JOIN #tableB b ON a.conID = b.conID

    LEFT JOIN #tableC c ON a.conID = c.conID

    GROUP BY a.conID, a.conName

    ;

    SET STATISTICS TIME OFF;

    --===== This is Andrew's method. It very quickly creates the sums required

    -- and then only has to join 3 sets of 26 rows. I changed the INNER

    -- joins to LEFT joins to return a "0" for those contractors who had

    -- no totals.

    PRINT '========== Andrew''s code with a mod ==========';

    SET STATISTICS TIME ON;

    select a.conName, b.bTotal, c.cTotal

    from #tableA a

    LEFT join (select conID, SUM(Total) as bTotal from #tableB group by conID) b

    on b.conID = a.conID

    LEFT join (select conID, SUM(Total) as cTotal from #tableC group by conID) c

    on c.conID = a.conID

    ;

    SET STATISTICS TIME OFF;

    Here are the results from my 8 year old, single CPU desktop box...

    ========== The "BAD" way ==========

    (26 row(s) affected)

    SQL Server Execution Times:

    CPU time = 22031 ms, elapsed time = 22302 ms.

    ========== Andrew's code with a mod ==========

    (26 row(s) affected)

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 33 ms.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff you are amazing, buddy... T-SQL simple works for you, really!

    :w00t::hehe::cool:;-)

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • Dugi (11/26/2010)


    Jeff you are amazing, buddy... T-SQL simple works for you, really!

    :w00t::hehe::cool:;-)

    Nah... this one was all Andrew. I'm the one that posted the bad code this time. :blush: I don't know what the heck I was thinking except that maybe I was thinking just exactly backwards.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!

    Have a nice weekend buddy!

    Greetings!

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • Dugi (11/26/2010)


    Yes I see, but you are doing the brilliant and perfect comparison and explanation using the current solutions!

    Have a nice weekend buddy!

    Greetings!

    Ah... thanks, Dugi! I had to know and just thought I'd share the results. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have no idea why I put inner joins in there. In my head they were lefts. 🙂

  • Here is another variation with excellent performance.

    select #tableA.conName,

    isnull((select sum(total) from #tableB where conid = #tableA.conid),0),

    isnull((select sum(total) from #tableC where conid = #tableA.conid),0)

    from #tableA

    order by 1

    ;

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply