November 16, 2021 at 1:26 am
Hi Everyone
I have an Access query that computes the percentile for a given set of values. The key is that the calculation is based on a rolling window of the past X days. For example, if calculated today, it will look at the last 30 periods (including today's date) and then determine the percentile of today's value. The rolling 30 periods window is applied to each day. The query I have uses a lot of nested selects etc because Access has no native function to calculate percentile. The problem is that my implementation is not remotely efficient. I want to use SQL Server's native function PERCENT_RANK instead. The part I am struggling with is how to implement the rolling part. Sample data is as follows:
Stk Dt RN VL
A 2020-01-03 1 -0.0341460056
A 2020-01-06 2 0.0211422434
A 2020-01-07 3 -0.0040987371
A 2020-01-08 4 0.0089279974
A 2020-01-09 5 0.1026746903
A 2020-01-10 6 -0.1027788847
A 2020-01-13 7 0.0764304778
RN is row number. The DT has gaps in them so to address this I added a row number (RN) field. This way the last 30 periods is always going to be treated consistently (ie, it will always have 30 data points) vs using last 30 days (which may not have 30 days b/c there are holidays).
I did some research on the PERCENT_RANK function. Looks like I can get it to work as I have described by using the PARTITION clause. However, I don't know how to partition my data as described (ie look at the last 30 periods). I know I have to use RN but not really sure how.
Is someone able to help guide me on this? How can I partition the data so each day it only looks at the last 30 periods (not days).
Thank you
November 16, 2021 at 4:24 am
a standard rolling window would be something like
SUM([Qty]) OVER (PARTITION BY <columnName> ORDER BY <date> ROWS BETWEEN PREVIOUS 30 ROWS AND CURRENT ROW)
and then you would divide by that.
November 16, 2021 at 6:59 pm
I am getting an error message.
SELECT *,
PERCENT_RANK() OVER (PARTITION BY UNDERLYING_SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN PREVIOUS 12 ROWS AND CURRENT ROW) AS PCT_RANK
FROM DBO.Stock_Log_Price_Changes
error message is
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'PREVIOUS'.
Completion time: 2021-11-16T10:51:42.1779430-08:00
How do i fix this?
November 16, 2021 at 7:08 pm
Got some data for us to play with? That's not data.
November 16, 2021 at 8:05 pm
If a parent record doesn't have 30 previous periods/records, then you want to exclude it?
Any chance you could translate your question into a Sales kind of question, so I can use something with a lot of data already, ... like everybody's favorite, AdventureWorks database?
November 16, 2021 at 8:18 pm
November 16, 2021 at 9:40 pm
which ever is easier ... exclude or mark as NULL. I can work with NULLs in subsequent queries.
I don't have AW DB downloaded. Are you able to work with the Excel I provided?
November 16, 2021 at 9:41 pm
Thank you for the feedback. I will definitely review it.
I am not strong with SQL so I am learning as I go.
November 16, 2021 at 10:04 pm
Are you able to work with the Excel I provided?
Copy-paste from your spreadsheet into "Insert/Edit Code Sample"
November 16, 2021 at 10:11 pm
I didn't know about that feature!! the table look great. Thank you for bringing this to my attention. I used 5 days look back period to keep the table small and manageable. If the period has less than 5 days then return NULL. I can deal with the NULL in future queries. Thank you so much !!
SYMBOLQUOTE_DATEROWNUMBERVALUE% Rank
ABC2020-01-031-0.0041460060
ABC2020-01-0620.0011422430.5
ABC2020-01-073-9.87371E-050.25
ABC2020-01-0840.0069279971
ABC2020-01-0950.002674691
ABC2020-01-106-0.0027788850.5
ABC2020-01-22134.81656E-050.5
ABC2020-01-23140.0007011490.75
ABC2020-01-2415-0.0108732420.25
ABC2020-01-2716-0.0125097950.25
ABC2020-01-28170.0103269451
ABC2020-01-2918-0.0004359170.5
ABC2020-01-3019-0.0006679610.25
ABC2020-01-3120-0.016488927NULL
ABC2020-02-03210.008613701NULL
ABC2020-02-04220.014874236NULL
ABC2020-02-05230.010221595NULL
DEF2020-02-06240.003132020.25
DEF2020-02-0725-0.0042392560
DEF2020-02-10260.0047115010.75
DEF2020-02-11270.0041211720.75
DEF2020-02-12280.0056146081
DEF2020-02-1329-0.0018423410.25
DEF2020-02-14300.0005986890.75
DEF2020-02-1831-0.0010909490.75
DEF2020-02-19320.0047611881
DEF2020-02-2033-0.0043578730.75
DEF2020-02-2134-0.0122458240.75
DEF2020-02-2435-0.029824950.75
DEF2020-02-2536-0.0327209690.5
DEF2020-02-2637-0.0013588890.75
DEF2020-02-2738-0.037744758NULL
DEF2020-02-2839-0.04355777NULL
DEF2020-03-02400.055845488NULL
DEF2020-03-0341-0.007660045NULL
November 16, 2021 at 10:13 pm
wait a minute...it changed format. isn't there a way to add a table so all the values are easily viewable?
November 16, 2021 at 10:38 pm
I got this working with a SUM
SELECT *,
SUM(PRICES) OVER (PARTITION BY SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS CURRENTANDPREV3
FROM DBO.Stock
It produces the sum as noted. I checked in Excel and the calculation is correct. This means that the syntax is correct. Next, I changed SUM(PRICE_CHANGE) to PERCENT_RANK():
SELECT *,
PERCENT_RANK() OVER (PARTITION BY SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS CURRENTANDPREV3
FROM DBO.Stock
Msg 10752, Level 15, State 3, Line 2
The function 'PERCENT_RANK' may not have a window frame.
Completion time: 2021-11-16T14:35:52.3292022-08:00
Looks like using ROWS command doesn't work with PERCENT_RANK().
Any suggestions on how to make it work?
November 20, 2021 at 12:21 am
Don't forget that if you are looking for 30 days rolling you will need to inject missing dates so that they are contiguous as the BETWEEN is looking at row counts not date offsets. If there is no data for (lets say...) Chrismas, Boxing day and New Year, then your rolling 30 rows would actually pick up 33 days of data. You would also then need to consider how to treat these NULL days in the average.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply