August 29, 2016 at 2:37 pm
Hi Everyone,
I am trying to achieve a 3 month rolling average calculation with the first two rows (or two years) of each company's calculated average column returning a NULL.
I have the basic SQL working for a small subset of data for the first company, but when I add a second company's worth of data into the table I'm unsure of why the first two rows for that company do not return NULL like they do for company ABC.
CompnayRevenue YearRevenue AmountThreeMonthRunningAvg
ABC201215654125.71NULL
ABC201316336060.44NULL
ABC201416316640.4916102275.5466
ABC20158286560.4613646420.4633
ABC20166599912.4610401037.8033
ABC20176603495.80977163322.9099
ABC20183618616.98665607341.7521
ABC20192232067.744151393.5121
ABC2020372011.292074232.0055
ABC20210.00868026.3433
DEF20122452324.12941445.1366
DEF20133897922.142116748.7533
DEF20144342007.983564084.7466
DEF201513284278.397174736.17
DEF201615043320.556110889868.9753
DEF201714835750.305914387783.084
DEF201816343303.652115407458.1713
DEF201916209570.902115796208.2867
DEF202011115708.557614556194.3706
DEF20219023715.854612116331.7714
Here is the SQL that I'm using to generate the results above:
SELECT Company
,[Revenue Year]
,[Revenue Amount]
,CASE
WHEN COUNT(*) OVER (
ORDER BY Company
,[Revenue Year] ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) > 2
THEN AVG([Revenue Amount]) OVER (
ORDER BY Company
,[Revenue Year] ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW
)
ELSE NULL
END ThreeMonthRunningAvg
FROM YHL.NAICS_Company_Revenue
ORDER BY 1
,2
I've changed the window of the query to just the company and then just the revenue amount, but I still must be missing something.
When this table is fully loaded there will be about 10 or so company's worth of data ranging from 2012-2021. What I'm trying to achieve is a 3 year running average by company with the first two rows for each company to come back as a NULL (since they're isn't 3 years available for the avg calculation).
Any tips or suggestions are much appreciated.
Thanks for your time.
August 29, 2016 at 2:55 pm
Where's the PARTITION BY Company
part of your query? PARTITION BY is analogous to GROUP BY... and it seems to be missing, so the totals/counts won't "restart" when the Company changes.
Should it not be... Note the "PARTITION BY Company" in both calculations
SELECT Company
,[Revenue Year]
,[Revenue Amount]
,CASE
WHEN COUNT(*) OVER (
PARTITION BY Company
ORDER BY Company
,[Revenue Year] ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) > 2
THEN AVG([Revenue Amount]) OVER (
PARTITION BY Company
ORDER BY Company
,[Revenue Year] ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW
)
ELSE NULL
END ThreeMonthRunningAvg
FROM YHL.NAICS_Company_Revenue
ORDER BY 1
,2
August 30, 2016 at 8:58 am
Thanks for the reply. I missed the partition by clause and that has definitely helped -- appreciate you pointing that out.
I thought I had the 3 month rolling average calculation correct, but it looks like I need to shift the frame of the window function by one row and I'm unsure of exactly how to do this.
Here is the logic that I'm currently using:
SELECT Company
,[Revenue Year]
,[Revenue Amount]
,CASE
WHEN COUNT(*) OVER (
PARTITION BY Company ORDER BY Company
,[Revenue Year]
) > 3
THEN AVG([Revenue Amount]) OVER (
PARTITION BY Company ORDER BY Company
,[Revenue Year] ROWS BETWEEN 3 PRECEDING
AND CURRENT ROW
)
ELSE NULL
END ThreeMonthRunningAvg
FROM myTable
ORDER BY 1
,2
This is producing these results:
CompanyRevenue YearRevenue AmountThreeMonthRunningAvg
ABC2012 15654125.71NULL
ABC2013 16336060.44NULL
ABC2014 16316640.49NULL
ABC2015 8286560.46 14148346.775
ABC2016 6599912.46 11884793.4625
What our folks are wanting to see is that when the year is 2015 they want a three year running average of 2012,2013,2014 so it is almost like i'd need to specify something like (ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -1). I know that syntax is invalid, but is there a way to have the avg calculation only include the 3 prior years and not the current row like I have now?
Thanks so much!
August 30, 2016 at 10:00 am
A couple things.
First, the order by company in your window clause does not make any sense, I think you want it to be sorting by a date. Second, you don't need to include "ELSE NULL" in a CASE statement, the default behavior is to return a NULL. Third, to shift the rolling average by one month you would simply change "ROWS BETWEEN 3 PRECEDING AND CURRENT ROW" to "ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING". Note the sample data below.
-- sample data
DECLARE @table TABLE
(
revid int identity not null,
company varchar(10) not null,
revdate date not null,
revAmt int
);
INSERT @table (company, revdate, revAmt)
SELECT TOP (6) 'A', getdate()+checksum(newid())%100, abs(checksum(newid())%100)+1
FROM sys.all_columns
UNION ALL
SELECT TOP (6) 'B', getdate()+checksum(newid())%100, abs(checksum(newid())%100)+1
FROM sys.all_columns;
-- solutions
SELECT
company,
revdate,
revAmt,
CASE
WHEN COUNT(*) OVER (PARTITION BY company ORDER BY revdate) >= 3
THEN
AVG(revAmt) OVER
(
PARTITION BY company
ORDER BY revdate
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
)
END, -- this month and the previous 2
CASE
WHEN COUNT(*) OVER (PARTITION BY company ORDER BY revdate) > 3
THEN
AVG(revAmt) OVER
(
PARTITION BY company
ORDER BY revdate
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
)
END -- shift the rows 1 back, note the different result
FROM @table
ORDER BY company, revdate; -- not required, using for display only. Remove this.
-- Itzik Ben-Gan 2001
August 30, 2016 at 10:54 am
Just looking at this again, it appears (based on your sample data) that you are looking to do a rolling 3 year average. In that case, you want something like this:
-- (1) sample data
IF OBJECT_ID('tempdb..#YearlySalesByCo') IS NOT NULL DROP TABLE #YearlySalesByCo;
CREATE TABLE #YearlySalesByCo
(
company varchar(10) not null,
[revenue year] smallint not null,
revenue int not null
);
WITH x AS (SELECT yr FROM (VALUES (2009),(2010),(2011),(2012),(2013),(2014),(2015)) x(yr))
INSERT #YearlySalesByCo (company, [revenue year], revenue)
SELECT 'A', yr, abs(checksum(newid())%100000)+10000 FROM x
UNION ALL
SELECT 'B', yr, abs(checksum(newid())%100000)+10000 FROM x;
-- (2) You want this POC index
CREATE UNIQUE NONCLUSTERED INDEX poc_YearlySalesByCo ON #YearlySalesByCo(company, [revenue year])
INCLUDE (revenue);
-- (3) Solution
SELECT
company,
[revenue year],
revenue,
[3YearAverage] =
CASE
WHEN COUNT(*) OVER (PARTITION BY company ORDER BY [revenue year]) >= 3
THEN AVG(revenue) OVER (PARTITION BY company ORDER BY [revenue year])
END
FROM
(
SELECT
company,
YrNbr = DENSE_RANK() OVER (PARTITION BY company ORDER BY [revenue year]),
[revenue year],
revenue = SUM(revenue)
FROM #YearlySalesByCo
GROUP BY Company, [revenue year]
) AS SalesByCoByYrByMo;
GO
-- Itzik Ben-Gan 2001
August 30, 2016 at 11:18 am
Alan.B thank you very much for the two replies. I have learned a lot from the SQL that you posted and I'm grateful for your time.
The first reply gave me what I needed after I adjusted the frame from
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
to ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
Again, thank you very much!!
August 30, 2016 at 11:23 am
np!
-- Itzik Ben-Gan 2001
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply