February 16, 2014 at 10:07 pm
Hi, i'm building a query to return metrics that will drive 3 seperate pivot tables showing
1. Total count of LineItems per D_Type each month
2. Total count of LineItems per Status each month
3. Avg count of LineItems per Invoice each month
I am able to get the first two, but having hard time with the 3rd.
Here's some representative ddl
create table Remediation
(Invoice nvarchar(10), D_Type nvarchar(20), Status nvarchar(20), RemediationDate datetime);
insert into Remediation values
--this will create data for Jan, 2014
('501', 'Recycle', 'Pass', getdate()-30),
('501', 'Reuse', 'Pass', getdate()-30),
('501', 'Remarket', 'Fail', getdate()-30),
('501', 'Recycle', 'Fail', getdate()-30),
('502', 'Recycle', 'Pass', getdate()-30),
('502', 'Reuse', 'Pass', getdate()-30),
('502', 'Reuse', 'Fail', getdate()-30),
('502', 'Reuse', 'Fail', getdate()-30),
('502', 'Reuse', 'Fail', getdate()-30),
('502', 'Remarket', 'Pass', getdate()-30),
('502', 'Remarket', 'Pass', getdate()-30),
('502', 'Remarket', 'Pass', getdate()-30),
--this will create data for Feb, 2014
('503', 'Recycle', 'Pass', getdate()),
('503', 'Reuse', 'Fail', getdate()),
('503', 'Remarket', 'Fail', getdate()),
('504', 'Recycle', 'Pass', getdate()),
('504', 'Recycle', 'Pass', getdate()),
('504', 'Recycle', 'Pass', getdate());
my query so far returns only first 2 metrics
;with cte as
(
select
[Invoice],
[D_Type],
[Status],
left(datename(month, [RemediationDate]), 3) as Mo,
year([RemediationDate]) as Yr
From
[dbo].[Remediation]
)
select D_Type, Status, Yr, Mo, count(*) as #LineItems
from cte
group by grouping sets
(
(D_Type, Yr, Mo),
(D_Type, Yr),
(Status, Yr, Mo),
(Status, Yr)
);
how to add the average metric to this query?
February 17, 2014 at 12:00 am
Really not sure if this is what you are after. (No sample results so i cant be sure)
;with cte2 as
(
select
[Invoice],
[D_Type],
[Status],
left(datename(month, [RemediationDate]), 3) as Mo,
year([RemediationDate]) as Yr
From
#Remediation
),
cte3 as (
select D_Type, Status, Yr, Mo, count(*) as #LineItems
from cte2
group by grouping sets
(
(D_Type, Yr, Mo),
(D_Type, Yr),
(Status, Yr, Mo),
(Status, Yr)
)
)
select *, AVG(#LineItems) over (partition by D_Type, yr, mo)
from cte3
February 17, 2014 at 1:44 am
A classic CROSSTAB makes short work of this as a single "pivot" table. Please see the following article on what a CROSSTAB IS and how it works.
http://www.sqlservercentral.com/articles/T-SQL/63681/
SELECT Month = SUBSTRING(CONVERT(CHAR(30),DATEADD(mm,DATEDIFF(mm,0,RemediationDate),0),113),4,8)
,[RecycleCount] = SUM(CASE WHEN D_Type = 'Recycle' THEN 1 ELSE 0 END)
,[ReuseCount] = SUM(CASE WHEN D_Type = 'Reuse' THEN 1 ELSE 0 END)
,[RemarketCount] = SUM(CASE WHEN D_Type = 'Remarket' THEN 1 ELSE 0 END)
,[PassCount] = SUM(CASE WHEN Status = 'Pass' THEN 1 ELSE 0 END)
,[FailCount] = SUM(CASE WHEN Status = 'Fail' THEN 1 ELSE 0 END)
,[TotalLineItems] = COUNT(*)
,[TotalInvoices] = COUNT(DISTINCT Invoice)
,[AvgLineItemsPerInvoice] = (COUNT(*)+0.0)/COUNT(DISTINCT Invoice)
FROM dbo.Remediation
GROUP BY DATEDIFF(mm,0,RemediationDate)
ORDER BY DATEDIFF(mm,0,RemediationDate)
;
Here's the output using your test data...
Month RecycleCount ReuseCount RemarketCount PassCount FailCount TotalLineItems TotalInvoices AvgLineItemsPerInvoice
-------- ------------ ----------- ------------- ----------- ----------- -------------- ------------- ---------------------------------------
Jan 2014 3 5 4 7 5 12 2 6.000000000000
Feb 2014 4 1 1 4 2 6 2 3.000000000000
(2 row(s) affected)
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2014 at 8:29 am
Do you mind telling me if I could have added something to my existing query to get avg too?
February 17, 2014 at 9:59 pm
KoldCoffee (2/17/2014)
Do you mind telling me if I could have added something to my existing query to get avg too?
To be honest, I've never had to use GROUPING SETS before and would have to read up on them to answer that question.
Is the problem that I "pivoted" all of the data and that your really need them in the "vertical" format of your original query?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2014 at 10:10 pm
thanks so much. Yes, that's the problem! I need to pull into excel and assign columns, rows, and let excel do pivoting on the units by month And Year independently. But the data here is already in presentation form... can you help?
polkadot and KoldKoffee are same.:discuss:
--Quote me
February 17, 2014 at 11:06 pm
Thank you very much for the explanation as to why you needed the data this way. I always wonder about these things.
I believe this may be what you asked for.
WITH
cte AS
(
SELECT Invoice
,D_Type
,Status
,Yr = YEAR(RemediationDate)
,Mo = LEFT(DATENAME(mm,RemediationDate),3)
,RemediationDate
FROM dbo.Remediation
)
SELECT D_Type
,Status
,Yr
,Mo
,#LineItems = COUNT(*)
,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice)
FROM cte
GROUP BY GROUPING SETS
(
(D_Type, Yr, Mo)
,(D_Type, Yr)
,(Status, Yr, Mo)
,(Status, Yr)
)
;
That returns the following which contains all of the same data that your good code did plus the extra column you were looking for. The key was to understand that you couldn't use the AVG aggregate because you needed to count distinct invoices to get the average by invoice by month by year.
D_Type Status Yr Mo #LineItems Avg#LineItemsPerInvoice
-------------------- -------------------- ----------- ---- ----------- -----------------------
NULL Fail 2014 Feb 2 2
NULL Pass 2014 Feb 4 2
NULL Fail 2014 Jan 5 2
NULL Pass 2014 Jan 7 3
NULL Fail 2014 NULL 7 2
NULL Pass 2014 NULL 11 2
Recycle NULL 2014 Feb 4 2
Recycle NULL 2014 Jan 3 1
Remarket NULL 2014 Feb 1 1
Remarket NULL 2014 Jan 4 2
Reuse NULL 2014 Feb 1 1
Reuse NULL 2014 Jan 5 2
Recycle NULL 2014 NULL 7 1
Remarket NULL 2014 NULL 5 1
Reuse NULL 2014 NULL 6 2
(15 row(s) affected)
As a bit of a sidebar, I now know more about GROUPING SETS and concatenated ROLLUPs than I ever wanted to know.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 18, 2014 at 12:00 am
Jeff, it's not quite there yet. Definitely, the problem is that I need average # of lineItems per distinct Invoice.
So, for Jan 2014, where there are12 lineitems spread over 2 distinct invoices, the average is 6. For Feb 2014 there are 6 lineitems spread over 2 distinct invoices, the average is 3. Year totals will be 18 lineitems/4 dstinct invoices=4.5 avg number of lineitems per invoice.
IMO, results should look like this:
SELECT 'D_Type', 'Status', 'Yr', 'Mo', '#LineItems', 'Avg#LineItemsPerInvoice' UNION ALL
SELECT NULL, 'Fail', '2014', 'Feb', '2', '3' UNION ALL
SELECT NULL, 'Fail', '2014', 'Jan', '5', '6' UNION ALL
SELECT NULL, 'Fail', '2014', NULL, '7', '3' UNION ALL
SELECT NULL, 'Pass', '2014', 'Feb', '4', '3' UNION ALL
SELECT NULL, 'Pass', '2014', 'Jan', '7', '6' UNION ALL
SELECT NULL, 'Pass', '2014', NULL, '11', '4.5' UNION ALL
SELECT 'Recycle', NULL, '2014', 'Feb', '4', '3' UNION ALL
SELECT 'Recycle', NULL, '2014', 'Jan', '3', '6' UNION ALL
SELECT 'Recycle', NULL, '2014', NULL, '7', '4.5' UNION ALL
SELECT 'Remarket', NULL, '2014', 'Feb', '1', '3' UNION ALL
SELECT 'Remarket', NULL, '2014', 'Jan', '4', '6' UNION ALL
SELECT 'Remarket', NULL, '2014', NULL, '5', '4.5' UNION ALL
SELECT 'Reuse', NULL, '2014', NULL, '5', '4.5' UNION ALL
SELECT 'Reuse', NULL, '2014', 'Feb', '1', '3' UNION ALL
SELECT 'Reuse', NULL, '2014', 'Jan', '5', '6'
The problem I'm having is that distinct is disallowed with the Over() clause so I can't partition like follows:
WITH
cte AS
(
SELECT Invoice
,D_Type
,Status
,Yr = YEAR(RemediationDate)
,Mo = LEFT(DATENAME(mm,RemediationDate),3)
,RemediationDate
FROM dbo.Remediation
)
SELECT D_Type
,Status
,Yr
,Mo
,#LineItems = COUNT(*)
,Avg#LineItemsPerInvoice = COUNT(*)/COUNT(DISTINCT Invoice) over (partition by yr, mo)
FROM cte
GROUP BY GROUPING SETS
(
(D_Type, Yr, Mo)
,(D_Type, Yr)
,(Status, Yr, Mo)
,(Status, Yr)
)
;
exact error is:
Msg 10759, Level 15, State 1, Line 17
Use of DISTINCT is not allowed with the OVER clause.
February 18, 2014 at 12:11 am
The real problem is the GROUPING SETS. The averages are computed based on the grouping. I'll have to think about this some other time. It's 02:10 here and I've gotta get some shuteye.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 18, 2014 at 12:18 am
and I always thought you were in Australia. Thanks Jeff, I'm kinda (small kinda) glad it stumped you. I'm using the wrong tool for the task, it sounds like.
February 18, 2014 at 9:58 pm
It's not really a stumper the way I did it. According to the groupings, the correct answer displays. The problem is that's not the answer that you're expecting.
Shifting gears a bit here, I'm thinking that this whole thing is just not quite right. We're preaggregating data for an Excel Pivot? Why not just let Excel do the aggregations during the Pivot?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 18, 2014 at 10:34 pm
the reason for preaggregating is because there's too much data to fit into excel and growing. But, in preaggregating the column names need to remain intact, with no intersection of time dimension with fact columns. No intersection means only grouping, so that there are fewer rows to import. It would work, if only I didn't need to divide sum(lineItems) by DISTINCT invoices for AVG metric...
I really wish I could just import all the data and let Excel do the pivoting for me.
I shared Excel spreadsheet.
February 23, 2014 at 1:07 pm
Jeff, the query you gave me gives incorrect average, anyway. So the following query shows how I get just the correct Average#ofLineItemsPerDistinctInvoice for any given month. It's what I need in the final query as well.
WITH
cte AS
(
SELECT Invoice
,D_Type
,Status
,Yr = YEAR(RemediationDate)
,Mo = LEFT(DATENAME(mm,RemediationDate),3)
,RemediationDate
FROM dbo.Remediation
)
SELECT
Yr,
Mo,
--D_Type,
--Status,
--Invoice,
Distinct#Invoice = count(distinct Invoice),
#LineItems = COUNT(*),
#Avg#LineItemsPerInvoice = COUNT(*)/count(distinct Invoice)
FROM cte
group by
Mo,
Yr
--D_Type,
--Status,
--Invoice
which gives following correct averages
select 'Yr','Mo','Distinct#Invoice','#LineItems','#AvgLineItemsPerDistinctInvoice' union all
select '2014','Feb','2','6','3' union all
select '2014','Jan','2','12','6'
When I uncomment D_Type, Status, Invoice from above query I get the following result set:
select 'Mo', 'Yr', 'D_Type', 'Status', 'Invoice', '#LineItems', 'Distinct#Invoice' UNION ALL
select 'Feb', '2014', 'Recycle', 'Pass', '503', '1', '1' UNION ALL
select 'Feb', '2014', 'Recycle', 'Pass', '503', '3', '1' UNION ALL
select 'Feb', '2014', 'Remarket', 'Fail', '503', '1', '1' UNION ALL
select 'Feb', '2014', 'Reuse', 'Fail', '503', '1', '1' UNION ALL
select 'Jan', '2014', 'Recycle', 'Fail', '501', '1', '1' UNION ALL
select 'Jan', '2014', 'Recycle', 'Pass', '501', '1', '1' UNION ALL
select 'Jan', '2014', 'Recycle', 'Pass', '502', '1', '1' UNION ALL
select 'Jan', '2014', 'Remarket', 'Fail', '501', '1', '1' UNION ALL
select 'Jan', '2014', 'Remarket', 'Pass', '502', '3', '1' UNION ALL
select 'Jan', '2014', 'Reuse', 'Fail', '502', '3', '1' UNION ALL
select 'Jan', '2014', 'Reuse', 'Pass', '501', '1', '1' UNION ALL
select 'Jan', '2014', 'Reuse', 'Pass', '502', '1', '1'
Problem with it is that it now gives the count of distinct invoices per D_Type, Status, Invoice which will always be one. I need count distinct for each Yr,Mo grouping only, for eg. In January there were 6 line items and two distinct invoices, so average#ofLineItems is 6/2 = 3
The distinct is definitely the challenge here. Is there any solution in SQL, (maybe a ROLLUP by count of Distinct Invoices per month?) to build this query so I can do the aggregations for D_Type, Status, and Invoice, and also show Avg#ofLineItems/CountofDistinctInvoices per Month, in Excel?
February 23, 2014 at 4:48 pm
Problem is solved by using Dense_Rank function and the blog that helped me solve problem of not being able to use COUNT DISTINCT in the OVER clause with PARTITION BY clause is http://beyondrelational.com/modules/2/blogs/51/posts/15524/distinct-counts-using-aggregate-functions-with-over-clause.aspx.
SELECT Yr
,Mo
, Invoice
,D_Type
,Status
,MAX(RankByInvoice) over (partition by Yr, Mo) as NumDistinctInvoices
,RemediationDate
FROM
(
SELECT
Invoice,
RemediationDate,
D_Type, Status,
YEAR(RemediationDate) as Yr,
LEFT(DATENAME(mm,RemediationDate),3) as Mo,
RankByInvoice = DENSE_RANK() OVER (PARTITION BY YEAR(RemediationDate), LEFT(DATENAME(mm,RemediationDate),3) ORDER BY Invoice)
FROM dbo.Remediation
)p
Another link for solving COUNT DISTINCT with OVER clause is here
http://stackoverflow.com/questions/13480880/window-functions-to-count-distinct-records.
This was a tricky problem for me to solve and I didn't give it a good title or explain the end use of my query which had become a Grouping Set problem after Partition by wasn't working. It would have helped, I think, if I'd explained background and also destination (Excel) or result set!
I do not need to use GROUPING SETS and can go back to using Windowing functions for getting Count Distinct. Thanks to Jeff and Matak for feedback.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply