September 26, 2016 at 6:47 pm
Hi There, need some help.
I have a list of prices and the dates they are valid from and to. For example:
Date From     Date To       Price
20/06/2013    26/08/2013    199.99
27/08/2013    21/01/2014    199.99
03/10/2013    31/10/2013    149.99
02/11/2013    07/11/2013    149.99
25/11/2013    26/11/2013    149.99
27/11/2013    29/11/2013    140.00
30/11/2013    05/12/2013    149.99
06/12/2013    06/12/2013    140.00
07/12/2013    25/12/2013    149.99
14/01/2014    14/01/2014    119.99
Some prices overlap; where they do, the valid price is the lowest one. For example, although the 199.99 entry runs from 27/08/2013 to 21/01/2014, when the lower price of 149.99 comes into effect on 03/10/2013, 149.99 becomes the actual price.
What I want to do is break this down into the actual active periods and, if possible, combine adjacent periods where the price is the same. For the example above, the result would be:
Date From     Date To       Price
20/06/2013    02/10/2013    199.99
03/10/2013    31/10/2013    149.99
01/11/2013    01/11/2013    199.99
02/11/2013    07/11/2013    149.99
08/11/2013    24/11/2013    199.99
25/11/2013    26/11/2013    149.99
27/11/2013    29/11/2013    140.00
30/11/2013    05/12/2013    149.99
06/12/2013    06/12/2013    140.00
07/12/2013    25/12/2013    149.99
26/12/2013    13/01/2014    199.99
14/01/2014    14/01/2014    119.99
15/01/2014    21/01/2014    199.99
I am struggling to come up with a solution after trying for a number of days.
Any suggestions welcome.
Thanks
September 26, 2016 at 9:31 pm
Too close to bedtime to code and test, but you should just be able to use the LAG and LEAD functions in a CTE to look at the previous [Price] and the next [Date From] values. Use OVER (ORDER BY [Date From]).
Filter out rows where the price equals the lagged price; they don't change the price, so they are irrelevant.
[Date To] is recalculated by subtracting one day from LEAD([Date From]). That way there are no gaps in the date ranges.
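For what it's worth, here's a rough, untested sketch of that idea, assuming a #t(DateFrom, DateTo, Price) table in which overlaps have already been resolved down to one candidate row per start date:
WITH keep AS
(
SELECT DateFrom, Price
FROM
(
SELECT DateFrom, Price
,LAG(Price) OVER (ORDER BY DateFrom) AS PrevPrice
FROM #t
) x
-- rows where the price does not change are irrelevant
WHERE PrevPrice IS NULL OR PrevPrice <> Price
)
SELECT DateFrom
-- [Date To] = the day before the next [Date From], so there are no gaps
,ISNULL(DATEADD(day, -1, LEAD(DateFrom) OVER (ORDER BY DateFrom))
,(SELECT MAX(DateTo) FROM #t)) AS DateTo
,Price
FROM keep;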
Hope this helps.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 27, 2016 at 9:10 am
Will you ever have the case where there are "gaps" in your date ranges?
I.e., dates where you have no price?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 27, 2016 at 10:38 am
One way is to expand the dates with a number/tally table and then roll them up again:
-- *** Consumable Test Data ***
-- Please supply future test data with dates in ISO format.
CREATE TABLE #t
(
DateFrom date NOT NULL
,DateTo date NOT NULL
,Price money NOT NULL
);
INSERT INTO #t
VALUES ('20130620', '20130826', 199.99)
,('20130827', '20140121', 199.99)
,('20131003', '20131031', 149.99)
,('20131102', '20131107', 149.99)
,('20131125', '20131126', 149.99)
,('20131127', '20131129', 140.00)
,('20131130', '20131205', 149.99)
,('20131206', '20131206', 140.00)
,('20131207', '20131225', 149.99)
,('20140114', '20140114', 119.99);
-- *** End Test Data ***
WITH Numbers(n)
AS
(
-- 10000 rows
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,MinPrices
AS
(
-- expand each range into one row per day; MIN takes the lowest price where ranges overlap
SELECT A.PriceDate, MIN(Price) AS Price
FROM #t T
JOIN Numbers N
ON N.n <= DATEDIFF(day, T.DateFrom, T.DateTo) + 1
CROSS APPLY (VALUES( DATEADD(day, N.n - 1, T.DateFrom) )) A(PriceDate)
GROUP BY A.PriceDate
)
,Grps
AS
(
-- gaps-and-islands: the difference between the two row numbers is constant
-- within each run of consecutive days at the same price
SELECT PriceDate, Price
,ROW_NUMBER() OVER (ORDER BY PriceDate)
- ROW_NUMBER() OVER (PARTITION BY Price ORDER BY PriceDate) AS Grp
FROM MinPrices
)
,Results
AS
(
SELECT MIN(PriceDate) AS DateFrom
,MAX(PriceDate) AS DateTo
,Price
,Grp
FROM Grps
GROUP BY Price, Grp
)
SELECT DateFrom, DateTo, Price
FROM Results
ORDER BY DateFrom;
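Against the consumable test data above, this should return exactly the thirteen rows the OP listed as the desired result.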
September 27, 2016 at 1:26 pm
Hi There
No, there won't be any gaps; they must be continuous even if they overlap.
Thanks
September 27, 2016 at 1:32 pm
This works great, thank you so much
Ken McKelvey (9/27/2016)
One way is to expand the dates with a number/tally table and then roll them up again: ...
September 27, 2016 at 3:16 pm
Ken McKelvey (9/27/2016)
One way is to expand the dates with a number/tally table and then roll them up again:
The problem with this approach is that it can be very expensive, because you're creating a lot of unnecessary records. The following gives the same results, but is much more efficient.
;
WITH packed_dates AS (
-- assumes #prices holds the same data as #t above, as columns date_from, date_to, price;
-- unpivot each row into two events: the range start (is_start = 1)
-- and the day after the range end (is_start = 0)
SELECT p.price, d.dt, d.is_start,
CASE
-- flag events that coincide with an adjacent event for the same price,
-- so back-to-back ranges at one price don't create a false boundary
WHEN LEAD(d.dt, 1, '9999') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1
WHEN LAG(d.dt, 1, '1900') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1
END AS exclusion_flag
FROM #prices p
CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)
)
, date_ranges AS (
-- the surviving event dates slice the timeline into candidate intervals
SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to
FROM packed_dates pd
WHERE pd.exclusion_flag IS NULL
)
-- for each candidate interval, the effective price is the lowest of all ranges covering it
SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price
FROM date_ranges dr
INNER JOIN #prices p
ON p.date_from < dr.date_to
AND p.date_to >= dr.date_from
WHERE dr.date_from < dr.date_to
GROUP BY dr.date_from, dr.date_to
;
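The figures below look like SET STATISTICS IO, TIME output; my assumption is that they were captured by wrapping each query like this:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- ... query under test ...
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;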
Here is a comparison of times:
----- Expand the dates -----
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t__________________________________________________________________________________________________________________0000000E5AA8'. Scan count 1, logical reads 10000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 53 ms.
----- "Unpivot"/Lead/Lag -----
Table '#prices_____________________________________________________________________________________________________________0000000E56A2'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
As you can see, the "expand the dates" approach has only one scan of the table, but 10000 logical reads, whereas mine has two scans, but only 14 logical reads. This manifests in the large difference in the CPU and elapsed times. And that's on a very small subset of data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 29, 2016 at 5:39 am
No, there won't be any gaps; they must be continuous even if they overlap.
Instead of trying to repair the damage after it has been done, you can add constraints and prevent it from happening in the first place. Read this article:
https://www.simple-talk.com/sql/t-sql-programming/contiguous-time-periods/
After this, download a PDF of the Rick Snodgrass book on temporal queries in SQL. It is free from the University of Arizona.
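For reference, a minimal sketch of the pattern that article describes (table and constraint names here are my own, not from the article): each row stores the end date of the preceding period, and a self-referencing foreign key plus CHECK constraints make gaps impossible.
CREATE TABLE PriceHistory
(
ProductID int NOT NULL
,DateFrom date NOT NULL
,DateTo date NOT NULL
,PrevDateTo date NULL -- DateTo of the preceding period; NULL only for the first period
,Price money NOT NULL
,CONSTRAINT PK_PriceHistory PRIMARY KEY (ProductID, DateFrom)
,CONSTRAINT UQ_PriceHistory_DateTo UNIQUE (ProductID, DateTo)
,CONSTRAINT CK_PriceHistory_ValidRange CHECK (DateFrom <= DateTo)
-- no gaps: each period must start the day after the previous one ends
,CONSTRAINT CK_PriceHistory_NoGap CHECK (PrevDateTo IS NULL OR DateFrom = DATEADD(day, 1, PrevDateTo))
-- the preceding period must actually exist
,CONSTRAINT FK_PriceHistory_Prev FOREIGN KEY (ProductID, PrevDateTo) REFERENCES PriceHistory (ProductID, DateTo)
);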
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
November 28, 2016 at 7:30 pm
Thank you so much for the code; it is a lot faster, but it has some issues that I am struggling to figure out.
With the following data the faster version gets it wrong, but the slower version gets it right. Any idea why, or how to fix it? I have come to rely on the performance:
DateFrom      DateTo        Price
2016-06-27    2016-11-28    49.99
2016-06-27    9999-12-30    49.99
2016-11-29    9999-12-30    39.99
The valid From dates are the same, and for some reason both of the 49.99 prices have the exclusion flag set.
I would appreciate any help
Thanks
November 29, 2016 at 12:42 pm
wjh_uk (11/28/2016)
With the following data the faster version gets it wrong, but the slower version gets it right ... The valid From dates are the same, and for some reason both of the 49.99 prices have the exclusion flag set.
The algorithm assumes that the combination of Price, dt, and is_start is unique. You need to make sure that this is the case before applying the exclusions. Given that, see if you can modify the query to produce unique combinations of those fields.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 1, 2016 at 7:32 am
This isn't going to be as fast as Drew's code, but it should avoid the problems and be slightly faster than Ken's once you get a volume of data considerably larger than the test data provided:
CREATE TABLE #PRICE_VALID_DATES
(
DateFrom date NOT NULL
,DateTo date NOT NULL
,Price money NOT NULL
);
CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateFrom_INCLUDES_REMAINDER ON #PRICE_VALID_DATES
(
DateFrom ASC
)
INCLUDE
(
DateTo, Price
);
CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateTo_INCLUDES_REMAINDER ON #PRICE_VALID_DATES
(
DateTo ASC
)
INCLUDE
(
DateFrom, Price
);
CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_Price_INCLUDES_REMAINDER ON #PRICE_VALID_DATES
(
Price ASC
)
INCLUDE
(
DateFrom, DateTo
);
SET NOCOUNT ON;
INSERT INTO #PRICE_VALID_DATES
VALUES ('20130620', '20130826', 199.99)
,('20130827', '20140121', 199.99)
,('20131003', '20131031', 149.99)
,('20131102', '20131107', 149.99)
,('20131125', '20131126', 149.99)
,('20131127', '20131129', 140.00)
,('20131130', '20131205', 149.99)
,('20131206', '20131206', 140.00)
,('20131207', '20131225', 149.99)
,('20140114', '20140114', 119.99);
DECLARE @MIN_DATE AS date = (SELECT MIN(DateFrom) FROM #PRICE_VALID_DATES);
DECLARE @MAX_DATE AS date = (SELECT MAX(DateTo) FROM #PRICE_VALID_DATES);
DECLARE @NUM_DAYS AS bigint = DATEDIFF(day, @MIN_DATE, @MAX_DATE) + 1;
WITH E1 AS (
SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
ALL_DATES AS (
-- ONLY GET AS MANY DATES AS YOU NEED TO COVER THE ENTIRE POSSIBLE RANGE OF VALID DATES
SELECT TOP (@NUM_DAYS) DATEADD(day, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @MIN_DATE) AS DAY_DATE
FROM E1 AS a, E1 AS b, E1 AS c, E1 AS d
),
DAILY_PRICES AS (
SELECT D.DAY_DATE, MIN(P.Price) AS Price,
CASE
--DETECT WHEN THE PRICE CHANGES
WHEN LAG(MIN(P.Price), 1, -0.01) OVER(ORDER BY DAY_DATE) <> MIN(P.Price) THEN 1
ELSE 0
END AS IS_CHANGED
FROM #PRICE_VALID_DATES AS P
INNER JOIN ALL_DATES AS D -- THE JOIN CONDITION APPLIES THE PRICE TO ALL THE DATES IN THE GIVEN RANGE
ON D.DAY_DATE BETWEEN P.DateFrom AND P.DateTo
GROUP BY D.DAY_DATE
)
SELECT PCD.DAY_DATE AS DATE_FROM,
--AS RANGES ARE GUARANTEED TO BE CONTINUOUS, YOU JUST NEED THE DAY BEFORE THE NEXT PRICE CHANGE, OR THE MAX DATE VALUE, AS THE END OF A GIVEN RANGE
ISNULL(DATEADD(day, -1, LEAD(PCD.DAY_DATE, 1) OVER(ORDER BY PCD.DAY_DATE)), @MAX_DATE) AS DATE_TO,
PCD.Price
FROM DAILY_PRICES AS PCD
WHERE IS_CHANGED = 1
OR PCD.DAY_DATE = @MIN_DATE
ORDER BY DAY_DATE;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 1, 2016 at 9:16 am
Here is the updated code:
;
WITH unique_dates AS (
SELECT DISTINCT p.Price, d.dt, d.is_start
FROM #prices p
CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)
)
, packed_dates AS (
SELECT ud.price, ud.dt, ud.is_start,
CASE
WHEN LEAD(ud.dt, 1, '9999') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1
WHEN LAG(ud.dt, 1, '1900') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1
END AS exclusion_flag
FROM unique_dates ud
)
, date_ranges AS (
SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to
FROM packed_dates pd
WHERE pd.exclusion_flag IS NULL
)
SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price
FROM date_ranges dr
INNER JOIN #prices p
ON p.date_from < dr.date_to
AND p.date_to >= dr.date_from
WHERE dr.date_from < dr.date_to
GROUP BY dr.date_from, dr.date_to
;
I've split the first CTE into two parts. The first one finds the unique values that I mentioned, the second one sets the exclusion flags.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA