aggregation, group by clause, still learning how to produce results - please help #2

  • I am trying to show the total number of invoices per customer, and the total purchase amounts by customer.

    ---my query

    Select c.cus_code, count(i.inv_NUMBER) as "number of Invoices", round(SUM(line_units * line_price), 2)as "Invoice Total"

    from line l

    join invoice i

    on l.inv_number = i.inv_number

    join customer c

    on i.cus_code = c.cus_code

    group by c.CUS_CODE;

    ---my results, the numbers in the second column are wrong.

    cus_codenumber of InvoicesInvoice Total

    100116444

    100123153.85

    100146422.77

    10015234.97

    10018170.44

    ---what results should look like

    cus_codenumber of InvoicesInvoice Total

    100113444

    100121153.85

    100142422.77

    10015134.97

    10018170.44

    ---invoice table you see has 3 invoices for 10011, 1 invoice for 10012, etc

    CUS_CODEINV_number

    100111002

    100111004

    100111008

    100121003

    100141001

    100141006

    100151007

    100181005

    ---DDL to recreate environment (creates 3 tables)

    CREATE TABLE CUSTOMER (

    CUS_CODE int,

    CUS_LNAME varchar(15),

    CUS_FNAME varchar(15),

    CUS_INITIAL varchar(1),

    CUS_AREACODE varchar(3),

    CUS_PHONE varchar(8),

    CUS_BALANCE float(8)

    );

    INSERT INTO CUSTOMER VALUES('10010','Ramas','Alfred','A','615','844-2573','0');

    INSERT INTO CUSTOMER VALUES('10011','Dunne','Leona','K','713','894-1238','0');

    INSERT INTO CUSTOMER VALUES('10012','Smith','Kathy','W','615','894-2285','345.859985351562');

    INSERT INTO CUSTOMER VALUES('10013','Olowski','Paul','F','615','894-2180','536.75');

    INSERT INTO CUSTOMER VALUES('10014','Orlando','Myron','','615','222-1672','0');

    INSERT INTO CUSTOMER VALUES('10015','O''Brian','Amy','B','713','442-3381','0');

    INSERT INTO CUSTOMER VALUES('10016','Brown','James','G','615','297-1228','221.190002441406');

    INSERT INTO CUSTOMER VALUES('10017','Williams','George','','615','290-2556','768.929992675781');

    INSERT INTO CUSTOMER VALUES('10018','Farriss','Anne','G','713','382-7185','216.550003051758');

    INSERT INTO CUSTOMER VALUES('10019','Smith','Olette','K','615','297-3809','0');

    /* -- */

    CREATE TABLE INVOICE (

    INV_NUMBER int,

    CUS_CODE int,

    INV_DATE datetime

    );

    INSERT INTO INVOICE VALUES('1001','10014','1/16/2008');

    INSERT INTO INVOICE VALUES('1002','10011','1/16/2008');

    INSERT INTO INVOICE VALUES('1003','10012','1/16/2008');

    INSERT INTO INVOICE VALUES('1004','10011','1/17/2008');

    INSERT INTO INVOICE VALUES('1005','10018','1/17/2008');

    INSERT INTO INVOICE VALUES('1006','10014','1/17/2008');

    INSERT INTO INVOICE VALUES('1007','10015','1/17/2008');

    INSERT INTO INVOICE VALUES('1008','10011','1/17/2008');

    /* -- */

    CREATE TABLE LINE (

    INV_NUMBER int,

    LINE_NUMBER int,

    P_CODE varchar(10),

    LINE_UNITS float(8),

    LINE_PRICE float(8)

    );

    INSERT INTO LINE VALUES('1001','1','13-Q2/P2','1','14.9899997711182');

    INSERT INTO LINE VALUES('1001','2','23109-HB','1','9.94999980926514');

    INSERT INTO LINE VALUES('1002','1','54778-2T','2','4.98999977111816');

    INSERT INTO LINE VALUES('1003','1','2238/QPD','1','38.9500007629395');

    INSERT INTO LINE VALUES('1003','2','1546-QQ2','1','39.9500007629395');

    INSERT INTO LINE VALUES('1003','3','13-Q2/P2','5','14.9899997711182');

    INSERT INTO LINE VALUES('1004','1','54778-2T','3','4.98999977111816');

    INSERT INTO LINE VALUES('1004','2','23109-HB','2','9.94999980926514');

    INSERT INTO LINE VALUES('1005','1','PVC23DRT','12','5.86999988555908');

    INSERT INTO LINE VALUES('1006','1','SM-18277','3','6.98999977111816');

    INSERT INTO LINE VALUES('1006','2','2232/QTY','1','109.919998168945');

    INSERT INTO LINE VALUES('1006','3','23109-HB','1','9.94999980926514');

    INSERT INTO LINE VALUES('1006','4','89-WRE-Q','1','256.989990234375');

    INSERT INTO LINE VALUES('1007','1','13-Q2/P2','2','14.9899997711182');

    INSERT INTO LINE VALUES('1007','2','54778-2T','1','4.98999977111816');

    INSERT INTO LINE VALUES('1008','1','PVC23DRT','5','5.86999988555908');

    INSERT INTO LINE VALUES('1008','2','WR3/TT3','3','119.949996948242');

    INSERT INTO LINE VALUES('1008','3','23109-HB','1','9.94999980926514');

    /* -- */

    What do I need to do to query so that the number of invoice is accurate?

    -

  • count(distinct i.inv_NUMBER)

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • because that's what I do when I am learning:-P. Thank you for that. I would not have thought of it and it helps my learning. Sooooo appreciate it.

    -

  • select

    c.cus_code

    ,[number of Invoices]=COUNT(distinct i.INV_NUMBER)

    ,[Invoice Total]=round(SUM(l.LINE_PRICE * l.LINE_UNITS),2)

    from

    #CUSTOMER c

    left outer join #INVOICE i on i.CUS_CODE = c.CUS_CODE

    left outer join #LINE l on l.INV_NUMBER = i.INV_NUMBER

    group by

    c.cus_code

    having

    COUNT(distinct i.INV_NUMBER) > 0

    [font="Courier New"]Looking for a Deadlock Victim Support Group..[/font]
  • Burninator, why are you going all sophisticated on me?

    -

  • Please take a look at the differences between this code and your original - there are several learning points in there...

    CREATE TABLE dbo.Customer

    (

    Code integer NOT NULL PRIMARY KEY,

    LastName varchar(15) NOT NULL,

    FirstName varchar(15) NOT NULL,

    Initial char(1) NOT NULL,

    AreaCode varchar(3) NOT NULL,

    Phone varchar(8) NOT NULL,

    Balance smallmoney NOT NULL

    );

    GO

    INSERT dbo.Customer

    (Code, LastName, FirstName, Initial, AreaCode, Phone, Balance)

    VALUES

    (10010,'Ramas','Alfred','A','615','844-2573',$0.00),

    (10011,'Dunne','Leona','K','713','894-1238',$0.00),

    (10012,'Smith','Kathy','W','615','894-2285',$345.86),

    (10013,'Olowski','Paul','F','615','894-2180',$536.75),

    (10014,'Orlando','Myron','','615','222-1672',$0.00),

    (10015,'O''Brian','Amy','B','713','442-3381',$0.00),

    (10016,'Brown','James','G','615','297-1228',$221.19),

    (10017,'Williams','George','','615','290-2556',$768.93),

    (10018,'Farriss','Anne','G','713','382-7185',$216.55),

    (10019,'Smith','Olette','K','615','297-3809',$0.00);

    CREATE TABLE dbo.Invoice

    (

    InvoiceNumber integer NOT NULL PRIMARY KEY,

    CustomerCode integer NOT NULL REFERENCES dbo.Customer,

    InvoiceDate date NOT NULL

    );

    GO

    INSERT dbo.Invoice

    (InvoiceNumber, CustomerCode, InvoiceDate)

    VALUES

    (1001,10014,'2008-01-16'),

    (1002,10011,'2008-01-16'),

    (1003,10012,'2008-01-16'),

    (1004,10011,'2008-01-17'),

    (1005,10018,'2008-01-17'),

    (1006,10014,'2008-01-17'),

    (1007,10015,'2008-01-17'),

    (1008,10011,'2008-01-17');

    CREATE TABLE dbo.Line

    (

    InvoiceNumber integer NOT NULL REFERENCES dbo.Invoice,

    LineNumber int NOT NULL,

    Pcode varchar(10) NOT NULL,

    Units tinyint NOT NULL,

    LinePrice smallmoney NOT NULL,

    PRIMARY KEY (InvoiceNumber, LineNumber)

    );

    GO

    INSERT dbo.Line

    (InvoiceNumber, LineNumber, Pcode, Units, LinePrice)

    VALUES

    (1001,1,'13-Q2/P2',1,14.99),

    (1001,2,'23109-HB',1,9.95),

    (1002,1,'54778-2T',2,4.99),

    (1003,1,'2238/QPD',1,38.95),

    (1003,2,'1546-QQ2',1,39.95),

    (1003,3,'13-Q2/P2',5,14.99),

    (1004,1,'54778-2T',3,4.99),

    (1004,2,'23109-HB',2,9.95),

    (1005,1,'PVC23DRT',12,5.87),

    (1006,1,'SM-18277',3,6.99),

    (1006,2,'2232/QTY',1,109.92),

    (1006,3,'23109-HB',1,9.95),

    (1006,4,'89-WRE-Q',1,256.99),

    (1007,1,'13-Q2/P2',2,14.99),

    (1007,2,'54778-2T',1,4.99),

    (1008,1,'PVC23DRT',5,5.87),

    (1008,2,'WR3/TT3',3,119.95),

    (1008,3,'23109-HB',1,9.95);

    -- Per customer totals

    SELECT

    InvoiceCount = COUNT_BIG(DISTINCT inv.InvoiceNumber),

    SalesTotal = SUM(line.Units * line.LinePrice)

    FROM dbo.Customer AS cus

    JOIN dbo.Invoice AS inv ON

    inv.CustomerCode = cus.Code

    JOIN dbo.Line AS line ON

    line.InvoiceNumber = inv.InvoiceNumber

    GROUP BY

    cus.Code;

  • Paul, I feel the love.:hehe: I now understand that a CTE is the tool for this job.

    Thanks for the Anatomy and Physiology of CTE based queries. I studied your example and see how the aliases for the columns are built and how they are aggregated against in the outer query.

    Not the main point but I wondered why you used Count_Big for a column whose value, invoice_number is int.

    per msdn.microsoft.com "COUNT_BIG always returns a bigint data type value. COUNT always returns an int data type value"

    -

  • Actually, the above CTE solution is the tool for the job for my other posting:

    http://www.sqlservercentral.com/Forums/Topic1240528-392-1.aspx#bm1240564

    For this scenario, using a CTE seems out of place.

    -

  • aitchkcandoo (1/23/2012)


    Not the main point but I wondered why you used Count_Big for a column whose value, invoice_number is int. per msdn.microsoft.com "COUNT_BIG always returns a bigint data type value. COUNT always returns an int data type value"

    Internally, the count function always returns a BIGINT (regardless of the type of the column of course - we are just counting the rows). When you use COUNT, the internal result is cast to an integer; with COUNT_BIG that cast isn't necessary. It's a very small thing, and just a habit I have developed to use COUNT_BIG by default. You should use whichever (COUNT or COUNT_BIG) makes sense in each case - if you want an integer count, use COUNT.

  • This reminded me of a post some time ago by Marc Friedman where he talks about distinct aggregations and table spools. I took Paul’s solution and applied it (or at least tried) to Marc’s suggestion. The query plans are different and might be worth a look. Like you, aitchkcandoo, I'm also still learning.

    Reference Marc Friedman - http://blogs.msdn.com/b/sqlqueryprocessing/archive/2008/09/22/distinct-aggregation-considered-harmful.aspx

    SET STATISTICS PROFILE ON

    SET STATISTICS TIME ON

    ;WITH CustInvoice AS (

    SELECT Code,

    LastName,

    FirstName,

    Initial,

    AreaCode,

    Phone,

    Balance,

    inv.InvoiceNumber as InvoiceNumber,

    CustomerCode,

    InvoiceDate,

    LineNumber,

    Units,

    LinePrice

    FROM dbo.Customer AS cus

    JOIN dbo.Invoice AS inv

    ON inv.CustomerCode = cus.Code

    JOIN dbo.Line AS line

    ON line.InvoiceNumber = inv.InvoiceNumber

    )

    , PartialSums AS (

    SELECT CustomerCode,

    InvoiceNumber,

    COUNT(*) Cust,

    SUM(Units * LinePrice) AS InvSum

    FROM CustInvoice

    GROUP BY CustomerCode, InvoiceNumber

    )

    SELECT

    CustomerCode,

    SUM(1) AS InvoiceCount,

    SUM(InvSum) AS SalesTotal

    FROM PartialSums

    GROUP BY CustomerCode

  • XploreBI (1/29/2012)


    This reminded me of a post some time ago by Marc Friedman where he talks about distinct aggregations and table spools. I took Paul’s solution and applied it (or at least tried) to Marc’s suggestion. The query plans are different and might be worth a look. Like you, aitchkcandoo, I'm also still learning.

    You might also enjoy: http://sqlblog.com/blogs/paul_white/archive/2011/12/04/is-distinct-aggregation-still-considered-harmful.aspx

  • Read it. Thanks.

    -

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply