February 10, 2015 at 5:07 am
Hi,
I have few intervals with status.
They change in Time (order indicated by OrderId).
OrderId Status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000
2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000
3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000
4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000
In the End I would Like to have flattened (in order of appearance, next OrderId has priority) result like this :
OrderId Status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 A 2015-01-01 00:00:00.000 2015-01-14 00:00:00.000
4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000
1 A 2015-03-16 00:00:00.000 2015-03-31 00:00:00.000
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000
as you can see:
1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges
2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.
code for source table:
select *
FROM
(VALUES
(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),
(2, 'B','20150115','20150215'),
(3, 'C','20150215','20150315'),
(4, 'D','20150115','20150315'),
(5, 'E','20150401','20150415')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
Hope you can help.
Tomek
February 10, 2015 at 6:25 am
You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.
February 10, 2015 at 2:23 pm
There's a solution to the gaps and islands problem in Itzik Ben-Gan's book on windowing functions, too. I know I've seen at least one article on the web. Make sure you are searching for the right version of SQL Server, because some solutions are specific to later versions (2012, 2014).
February 10, 2015 at 4:11 pm
This works with your data, not very efficient though
WITH Source AS (
SELECT *
FROM
(VALUES
(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),
(2, 'B','20150115','20150215'),
(3, 'C','20150215','20150315'),
(4, 'D','20150115','20150315'),
(5, 'E','20150401','20150415')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
),
Starts AS (
SELECT a.DateStart AS dt
FROM Source a
WHERE NOT EXISTS(SELECT * FROM Source b
WHERE b.OrderId > a.OrderId
AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),
Ends AS (
SELECT a.DateEnd AS dt
FROM Source a
WHERE NOT EXISTS(SELECT * FROM Source b
WHERE b.OrderId > a.OrderId
AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),
Combined AS (
SELECT dt,'S' AS StartEnd
FROM Starts
UNION ALL
SELECT dt,'E' AS StartEnd
FROM Ends
),
Ordered AS (
SELECT dt,StartEnd,
ROW_NUMBER() OVER(ORDER BY dt) AS rn
FROM Combined)
SELECT ca.OrderId,
ca.status,
CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END AS DateStart,
CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END AS DateEnd
FROM Ordered s
INNER JOIN Ordered e ON e.rn = s.rn + 1
CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status
FROM Source ca
WHERE s.dt >= ca.DateStart
AND e.dt <= ca.DateEnd
ORDER BY ca.OrderId DESC) ca
ORDER BY s.dt;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 10, 2015 at 5:50 pm
venoym (2/10/2015)
You may want to take a look at the below. This is a "Gaps and Islands" problem. The below is designed around a numeric gap, but can be worked around to handle dates/times. I've done this a while back (broke it down into finding 1 minute gaps), but this is a good starting point.
I differ with your opinion here. This is similar to a gaps and islands problem but is not solved the same way. This is the overlapping intervals problem.
The original link to Itzik Ben-Gan's article on SolidQ had gone dead the last time I checked, but I used his methods in discussing various efficient approaches to this problem here:
Calculating Gaps Between Overlapping Time Intervals in SQL [/url]
The link to IBG's article is in that one, and it is too bad his article is gone because it explains it quite well. Note that this article takes the normalized overlapping intervals and converts them to the gaps (so the second step you don't need).
If you find that a bit daunting to understand (I confess that I did at first), let me know and I'll make a solution happen based on your specific test data.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 11, 2015 at 3:33 am
Hi,
First of all, thank you all for your input. It is invaluable.
your solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case
I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.
@mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...
@Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).
I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.
OH, and important info.I didn't know it before, but solution will be meant for SQL 2012.
I would rather know a way for both 2008r2 and 2012, but I don't want to waste your time...
Many Thanks.
Tomek
February 11, 2015 at 3:42 am
tomek tomek (2/11/2015)
Hi,First of all, thank you all for your input. It is invaluable.
a) you solutions with numbers looks very interesting, but as Dwain suggested I had problems finding out how it would be relevant to my case
b) I started reading Itzik Ben Gan's 'Window Functions in SQL Server 2012", but I'm not as far yet.
@mark-3 Cowne - Your solution seems to work fine. Thanks! I need to do some testing. As you mentioned - not very perfomarnt though...
@Dawin - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).
I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.
Many Thanks.
Tomek
Just be aware that I've changed the query I originally posted, it should work correctly now although sadly no quicker.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 11, 2015 at 4:53 am
It was getting the wrong values for ordereid and status in some circumstances.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 11, 2015 at 6:16 am
it seems there's something wrong with your new code (I don't see same problem with previous solution).
for following dataset:
SELECT *
FROM
(VALUES
(1, '1',convert(datetime,'20121001'),convert(datetime,'20141031')),
(2, '7','20140701','20140731'),
(3, '10','20140801','20141031')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
I receive this :
OrderId status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000
2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000
1 1 2014-08-01 00:00:00.000 2014-07-31 00:00:00.000
3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000
Correct Result would be:
OrderId status DateStart DateEnd
----------- ------ ----------------------- -----------------------
1 1 2012-10-01 00:00:00.000 2014-06-30 00:00:00.000
2 7 2014-07-01 00:00:00.000 2014-07-31 00:00:00.000
3 10 2014-08-01 00:00:00.000 2014-10-31 00:00:00.000
1. Can you check this?
2. Can you show me example where your second approach fixes problems found with the first one?
For me, first solution seems to be working fine.
regards
Tomek
February 11, 2015 at 6:33 am
The problem is due to two intervals having a one day gap between the end of one and the start of the next, I think these can simply be excluded.
Change
CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status
FROM Source ca
WHERE s.dt >= ca.DateStart
AND e.dt <= ca.DateEnd
ORDER BY ca.OrderId DESC) ca
ORDER BY s.dt;
to
CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status
FROM Source ca
WHERE s.dt >= ca.DateStart
AND e.dt <= ca.DateEnd
ORDER BY ca.OrderId DESC) ca
-- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval
WHERE CASE WHEN s.StartEnd = 'E' THEN DATEADD(day,1,s.dt) ELSE s.dt END <= CASE WHEN e.StartEnd = 'S' THEN DATEADD(day,-1,e.dt) ELSE e.dt END
ORDER BY s.dt;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 11, 2015 at 9:12 am
Well, this was a fun one to do.
I decided to try it without reading Dwain's article from earlier so this may not be the best way.
Basically I am joining to a calendar table and grouping to resolve the overlap, which then gives me an islands problem, which I solve in IBG fashion
For the calendar table, I borrowed Dwain's inline function from here[/url]
CREATE TABLE #SourceData
(
OrderID int,
OrderStatus char(1),
StartDatedatetime,
EndDate datetime
)
Insert #SourceData (OrderID,OrderStatus,StartDate,EndDate)
select *
FROM
(VALUES
(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),
(2, 'B','20150115','20150215'),
(3, 'C','20150215','20150315'),
(4, 'D','20150115','20150315'),
(5, 'E','20150401','20150415')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
;
WITH CrossToCalendar AS (
SELECT c.[Date], MAX(OrderID) OrderId FROM #SourceData s
CROSS APPLY (SELECT g.[Date] FROM GenerateCalendar('2015-01-01',365) g
WHERE g.[Date] BETWEEN s.StartDate AND s.EndDate) c
GROUP BY c.[Date]
),
IslandBuilder AS (
SELECT DATEADD(day,(-1*DENSE_RANK() OVER (PARTITION BY OrderID ORDER BY [date])),[date]) dr,date,OrderId FROM CrossToCalendar
),
Islands AS (
SELECT MIN(date) StartDate,MAX(date) EndDate,dr,OrderId
FROM IslandBuilder
GROUP BY dr,OrderId
)
SELECT i.OrderId,s.orderstatus,i.StartDate,i.EndDate
FROM Islands i
JOIN #SourceData s
ON i.OrderId = s.OrderID
ORDER BY StartDate
DROP TABLE #SourceData
Not sure if it would be better to use ROW_NUMBER() in cross to calendar instead of grouping so I could carry the status down instead of rejoining at the end, but this is the path I had started down so I stuck with it.
EDIT: Just skimmed through Dwain's article, and nope, he doesn't bother with a calendar table. But he is also packing overlapping intervals, and this problem splits them with Order ID as a tiebreaker instead.
February 11, 2015 at 9:32 am
Here's another version, this one using SQL Server 2012 LEAD
WITH Source AS (
SELECT *
FROM
(VALUES
(1, 'A',convert(datetime,'20150101'),convert(datetime,'20150331')),
(2, 'B','20150115','20150215'),
(3, 'C','20150215','20150315'),
(4, 'D','20150115','20150315'),
(5, 'E','20150401','20150415')
) AS TimeIntervals(OrderId, status, DateStart, DateEnd)
),
Starts AS (
SELECT a.DateStart AS dt
FROM Source a
WHERE NOT EXISTS(SELECT * FROM Source b
WHERE b.OrderId > a.OrderId
AND a.DateStart BETWEEN b.DateStart AND b.DateEnd)),
Ends AS (
SELECT a.DateEnd AS dt
FROM Source a
WHERE NOT EXISTS(SELECT * FROM Source b
WHERE b.OrderId > a.OrderId
AND a.DateEnd BETWEEN b.DateStart AND b.DateEnd)),
Combined AS (
SELECT dt,'S' AS StartEnd
FROM Starts
UNION ALL
SELECT dt,'E' AS StartEnd
FROM Ends
),
Ordered AS (
SELECT dt AS DateStart,
StartEnd AS S_StartEnd,
LEAD(dt) OVER(ORDER BY dt) AS DateEnd,
LEAD(StartEnd) OVER(ORDER BY dt) AS E_StartEnd
FROM Combined)
SELECT ca.OrderId,
ca.status,
CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END AS DateStart,
CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END AS DateEnd
FROM Ordered s
CROSS APPLY (SELECT TOP 1 ca.OrderId,ca.status
FROM Source ca
WHERE s.DateStart >= ca.DateStart
AND s.DateEnd <= ca.DateEnd
ORDER BY ca.OrderId DESC) ca
-- Exclude the case where two consecutive intervals have a gap of one day - this will generate a negative interval
WHERE CASE WHEN s.S_StartEnd = 'E' THEN DATEADD(day,1,s.DateStart) ELSE s.DateStart END <= CASE WHEN s.E_StartEnd = 'S' THEN DATEADD(day,-1,s.DateEnd) ELSE s.DateEnd END
ORDER BY s.DateStart;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 11, 2015 at 5:23 pm
tomek tomek (2/11/2015)
@Dwain - Thanks for commenting on my problems again (I received some useful hints from your sometime ago - also on Time Intervals).I would be very, very greteful if you would find some time, to show me how to implement your logic to my case.
Hmmm. Upon taking a closer look at your expected results, I'm not sure if I understood the problem correctly, so let me ask a couple of questions here (refer to comments).
OrderID status DateStart DateEnd
1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000 -- Starts with a 3 month interval
2 B 2015-01-15 00:00:00.000 2015-02-15 00:00:00.000 -- Starts/ends within OrderID=1
3 C 2015-02-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1
4 D 2015-01-15 00:00:00.000 2015-03-15 00:00:00.000 -- Starts/ends within OrderID=1
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000 -- Starts and ends after OrderID=1
Since OrderIDs 2, 3 and 4 all start and end within OrderID=1, why are they not eliminated such that your final result is (these would be the non-overlapping time periods during which you have an order):
OrderID status DateStart DateEnd
1 A 2015-01-01 00:00:00.000 2015-03-31 00:00:00.000
5 E 2015-04-01 00:00:00.000 2015-04-15 00:00:00.000
Perhaps my morning coffee has not yet kicked in... Either that or it is not an overlapping intervals problem as I first thought it was.
You also said originally:
tomek tomek (2/10/2015)
as you can see:1) OrderId=1 was split into 2 rows, as it was partially overlapped by next ranges
2) orderdId 2 and 3 don't exist anymore as they've been overlapped by OrderId=5.
1. What is the relationship between OrderID=1 and OrderIDs IN (2,3) such that they would cause OrderID=1 to be split?
2. Since OrderID=4 is also within OrderID=1, why doesn't it cause some effect to OrderID=1?
3. On your 2), shouldn't that be OrderID=1 (instead of OrderID=5 because 5 overlaps with neither)?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 11, 2015 at 9:15 pm
Basically if two intervals would be active he considers the one with the highest order id active. 2 never shows because it perfectly overlaps with 4, which is higher.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply