September 13, 2013 at 8:40 am
Here's one that's been nagging me.
I've got two tables, with date "ranges." Table A is an "order" with a start / end range, the related table B is an exclusion date range table, which could have many exclusion ranges per "order."
Table A
OrderID, StartDate, EndDate
14,'2013-09-13','2013-10-01'
Table B
OrderID, ExclusionStart, ExclusionEnd
14, '2013-09-14','2013-09-15'
14, '2013-09-21','2013-09-22'
14, '2013-09-28','2013-09-29'
14, '2013-09-25','2013-09-25'
Sample setup:
CREATE TABLE [dbo].[OrderSample](
[OrderID] [int] ,
[OrderStartDate] [datetime] ,
[OrderEndDate] [datetime]
)
--## Create sample 'order'
INSERT INTO OrderSample VALUES(14,'2013-09-01','2013-09-30')
CREATE TABLE [dbo].[ExclusionSample](
[OrderID] [int] ,
[ExclusionStartDate] [datetime] ,
[ExclusionEndDate] [datetime]
)
--## Create sample 'exclusion ranges' - my product does not work these days
INSERT INTO ExclusionSample VALUES(14, '2013-09-14','2013-09-15')
INSERT INTO ExclusionSample VALUES(14, '2013-09-21','2013-09-22')
INSERT INTO ExclusionSample VALUES(14, '2013-09-28','2013-09-29')
INSERT INTO ExclusionSample VALUES(14, '2013-09-25','2013-09-25')
What's the best way to get the MAX() consecutive valid days (uninterrupted by exclusion dates) per order?
Desired Output:
Table Output
OrderID, MaxConsecValidDates
My current solution is to use a UDF to explode both ranges and get a full list of dates per OrderID in separate "staging" tables, then filter out the excluded dates from a "ValidOrderDates" table... And run a CTE over the top of that to return the consecutive days.
This is slow at the outset to set up, even with relatively low (300/400k rows in each table) but manageable incrementally going forward. Is there a bigger/better/faster approach? Am I overlooking a way to do this in a set-based manner?
Thank you for looking it over!
September 13, 2013 at 9:21 am
I totally don't understand what the output you expect for this. Please provide the ACTUAL output you expect for your given sample. You may need to set up additional sample data (with expected outputs for each set) to fully demonstrate your requirements.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 13, 2013 at 9:39 am
Apologies for the confusion, Kevin - thanks for having a look! In a hurry this morning, so I sketched it out pretty fast.
I'll set up some supporting scripts for setup as soon as I have a minute, but the output I currently get looks like:
OrderID | MaxConsecutiveDays
14, 5
Meaning that my required output returns, for each order, the maximum number of consecutive days which are uninterrupted by exclusion dates. In the quick sketch, I believe that would be five days for order 14, as the exclusion periods are weekends, other than one random Wednesday.
September 13, 2013 at 4:29 pm
Given the data you posted, the expected output is 13 I hope, as there is no exclusion before the 14th.
Anyway here is a query. I'm fairly sure it can be made more efficient:
CREATE TABLE [dbo].[OrderSample](
[OrderID] [int] ,
[OrderStartDate] [datetime] ,
[OrderEndDate] [datetime]
)
--## Create sample 'order'
INSERT INTO OrderSample VALUES(14,'2013-09-01','2013-09-30')
INSERT INTO OrderSample VALUES(16,'2013-08-01','2013-09-15')
CREATE TABLE [dbo].[ExclusionSample](
[OrderID] [int] ,
[ExclusionStartDate] [datetime] ,
[ExclusionEndDate] [datetime]
)
--## Create sample 'exclusion ranges' - my product does not work these days
INSERT INTO ExclusionSample VALUES(14, '2013-09-14','2013-09-15')
INSERT INTO ExclusionSample VALUES(14, '2013-09-21','2013-09-22')
INSERT INTO ExclusionSample VALUES(14, '2013-09-28','2013-09-29')
INSERT INTO ExclusionSample VALUES(14, '2013-09-25','2013-09-25')
INSERT INTO ExclusionSample VALUES(16, '2013-08-14','2013-08-15')
INSERT INTO ExclusionSample VALUES(16, '2013-08-21','2013-08-22')
INSERT INTO ExclusionSample VALUES(16, '2013-08-28','2013-08-29')
INSERT INTO ExclusionSample VALUES(16, '2013-09-14','2013-09-14')
go
WITH CTE1 AS (
SELECT OS.OrderID, d.thedate,
active = IIF(EXISTS (SELECT *
FROM ExclusionSample ES
WHERE OS.OrderID = ES.OrderID
AND d.thedate BETWEEN ES.ExclusionStartDate AND ExclusionEndDate), 0, 1)
FROM dates d
JOIN OrderSample OS ON d.thedate BETWEEN OS.OrderStartDate AND OrderEndDate
), CTE2 AS (
SELECT OrderID, active,
SUM(active) OVER (PARTITION BY OrderID ORDER BY thedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumactive,
row_number() OVER (PARTITION BY OrderID ORDER BY thedate) AS rowno
FROM CTE1
), CTE3 AS (
SELECT OrderID, active, cumactive, rowno,
denserank = dense_rank () OVER (PARTITION BY OrderID ORDER BY cumactive)
FROM CTE2
), CTE4 AS (
SELECT OrderID, rowno - denserank AS grp, COUNT(*) AS cnt
FROM CTE3
WHERE active = 1
GROUP BY OrderID, rowno - denserank
)
SELECT OrderID, MAX(cnt)
FROM CTE4
GROUP BY OrderID
go
DROP TABLE OrderSample
DROP TABLE ExclusionSample
The query includes the table dates, here is a script (not this year's model) to populated it:
-- Make sure it's empty.
TRUNCATE TABLE dates
go
-- Get a temptable with numbers. This is a cheap, but not 100% reliable.
-- Whence the query hint and all the checks.
SELECT TOP 80001 n = IDENTITY(int, 0, 1)
INTO #numbers
FROM sysobjects o1
CROSS JOIN sysobjects o2
CROSS JOIN sysobjects o3
CROSS JOIN sysobjects o4
OPTION (MAXDOP 1)
go
-- Make sure we have unique numbers.
CREATE UNIQUE CLUSTERED INDEX num_ix ON #numbers (n)
go
-- Verify that table does not have gaps.
IF (SELECT COUNT(*) FROM #numbers) = 80001 AND
(SELECT MIN(n) FROM #numbers) = 0 AND
(SELECT MAX(n) FROM #numbers) = 80000
BEGIN
DECLARE @msg varchar(255)
-- Insert the dates:
INSERT dates (thedate)
SELECT dateadd(DAY, n, '19800101')
FROM #numbers
WHERE dateadd(DAY, n, '19800101') < '21500101'
SELECT @msg = 'Inserted ' + ltrim(str(@@rowcount)) + ' rows into #numbers'
PRINT @msg
END
ELSE
RAISERROR('#numbers is not contiguos from 0 to 80001!', 16, -1)
go
DROP TABLE #numbers
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
September 13, 2013 at 5:06 pm
That's correct - I'd expect 13 as the result from the sample data.
This looks like a more distilled, set based version of what I'm doing - exactly what I was after. I'm going to go through this and really check it out, see how it performs up against the larger sample set I have. Very much appreciated - thank you!!
Erland Sommarskog (9/13/2013)
Given the data you posted, the expected output is 13 I hope, as there is no exclusion before the 14th.Anyway here is a query. I'm fairly sure it can be made more efficient:
CREATE TABLE [dbo].[OrderSample](
[OrderID] [int] ,
[OrderStartDate] [datetime] ,
[OrderEndDate] [datetime]
)
--## Create sample 'order'
INSERT INTO OrderSample VALUES(14,'2013-09-01','2013-09-30')
INSERT INTO OrderSample VALUES(16,'2013-08-01','2013-09-15')
CREATE TABLE [dbo].[ExclusionSample](
[OrderID] [int] ,
[ExclusionStartDate] [datetime] ,
[ExclusionEndDate] [datetime]
)
--## Create sample 'exclusion ranges' - my product does not work these days
INSERT INTO ExclusionSample VALUES(14, '2013-09-14','2013-09-15')
INSERT INTO ExclusionSample VALUES(14, '2013-09-21','2013-09-22')
INSERT INTO ExclusionSample VALUES(14, '2013-09-28','2013-09-29')
INSERT INTO ExclusionSample VALUES(14, '2013-09-25','2013-09-25')
INSERT INTO ExclusionSample VALUES(16, '2013-08-14','2013-08-15')
INSERT INTO ExclusionSample VALUES(16, '2013-08-21','2013-08-22')
INSERT INTO ExclusionSample VALUES(16, '2013-08-28','2013-08-29')
INSERT INTO ExclusionSample VALUES(16, '2013-09-14','2013-09-14')
go
WITH CTE1 AS (
SELECT OS.OrderID, d.thedate,
active = IIF(EXISTS (SELECT *
FROM ExclusionSample ES
WHERE OS.OrderID = ES.OrderID
AND d.thedate BETWEEN ES.ExclusionStartDate AND ExclusionEndDate), 0, 1)
FROM dates d
JOIN OrderSample OS ON d.thedate BETWEEN OS.OrderStartDate AND OrderEndDate
), CTE2 AS (
SELECT OrderID, active,
SUM(active) OVER (PARTITION BY OrderID ORDER BY thedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumactive,
row_number() OVER (PARTITION BY OrderID ORDER BY thedate) AS rowno
FROM CTE1
), CTE3 AS (
SELECT OrderID, active, cumactive, rowno,
denserank = dense_rank () OVER (PARTITION BY OrderID ORDER BY cumactive)
FROM CTE2
), CTE4 AS (
SELECT OrderID, rowno - denserank AS grp, COUNT(*) AS cnt
FROM CTE3
WHERE active = 1
GROUP BY OrderID, rowno - denserank
)
SELECT OrderID, MAX(cnt)
FROM CTE4
GROUP BY OrderID
go
DROP TABLE OrderSample
DROP TABLE ExclusionSample
The query includes the table dates, here is a script (not this year's model) to populated it:
-- Make sure it's empty.
TRUNCATE TABLE dates
go
-- Get a temptable with numbers. This is a cheap, but not 100% reliable.
-- Whence the query hint and all the checks.
SELECT TOP 80001 n = IDENTITY(int, 0, 1)
INTO #numbers
FROM sysobjects o1
CROSS JOIN sysobjects o2
CROSS JOIN sysobjects o3
CROSS JOIN sysobjects o4
OPTION (MAXDOP 1)
go
-- Make sure we have unique numbers.
CREATE UNIQUE CLUSTERED INDEX num_ix ON #numbers (n)
go
-- Verify that table does not have gaps.
IF (SELECT COUNT(*) FROM #numbers) = 80001 AND
(SELECT MIN(n) FROM #numbers) = 0 AND
(SELECT MAX(n) FROM #numbers) = 80000
BEGIN
DECLARE @msg varchar(255)
-- Insert the dates:
INSERT dates (thedate)
SELECT dateadd(DAY, n, '19800101')
FROM #numbers
WHERE dateadd(DAY, n, '19800101') < '21500101'
SELECT @msg = 'Inserted ' + ltrim(str(@@rowcount)) + ' rows into #numbers'
PRINT @msg
END
ELSE
RAISERROR('#numbers is not contiguos from 0 to 80001!', 16, -1)
go
DROP TABLE #numbers
September 14, 2013 at 2:50 am
I realised that CTE3 is not necessary, so the query can be simplified to:
WITH CTE1 AS (
SELECT OS.OrderID, d.thedate,
active = IIF(EXISTS (SELECT *
FROM ExclusionSample ES
WHERE OS.OrderID = ES.OrderID
AND d.thedate BETWEEN ES.ExclusionStartDate AND ExclusionEndDate), 0, 1)
FROM bos_sommar..dates d
JOIN OrderSample OS ON d.thedate BETWEEN OS.OrderStartDate AND OrderEndDate
), CTE2 AS (
SELECT OrderID, active,
SUM(active) OVER (PARTITION BY OrderID ORDER BY thedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumactive,
row_number() OVER (PARTITION BY OrderID ORDER BY thedate) AS rowno
FROM CTE1
), CTE3 AS (
SELECT OrderID, rowno - cumactive AS grp, COUNT(*) AS cnt
FROM CTE2
WHERE active = 1
GROUP BY OrderID, rowno - cumactive
)
SELECT OrderID, MAX(cnt)
FROM CTE3
GROUP BY OrderID
This also removes a sort operation from the query plan.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
September 14, 2013 at 4:00 pm
Since you are using SS2012, here is a possible solution using the offset function LEAD.
Supposing all exclusions are inside the order interval, the idea is to generate the intervals between exclusions, which is calculated as the [ExclusionEndDate] plus 1 as the next valid [OrderStartDate] and the next [ExclusionStartDate] minus one as the valid [OrderEndDate]. We do not have more exclusions after the last one so we will use the [OrderEndDate] from the [OrderSample] table.
There is still pending how to generate the first valid range since we are looking forward on the exclusions so we are missing the range from the start date of the order till the start date of the first exclusion. Well, I tried combining the table [OrderSample] with the join of this table and the exclusions, then I use the LEAD to find next exclusion.
Excuse me for the formatting because I do not know how to post T-SQL code. I'll ask Erland for help 🙂
SET NOCOUNT ON;
USE tempdb;
GO
CREATE TABLE [dbo].[OrderSample](
[OrderID] [int] ,
[OrderStartDate] [datetime] ,
[OrderEndDate] [datetime]
)
--## Create sample 'order'
INSERT INTO OrderSample VALUES(14,'2013-09-01','2013-09-30')
INSERT INTO OrderSample VALUES(16,'2013-08-01','2013-09-15')
CREATE TABLE [dbo].[ExclusionSample](
[OrderID] [int] ,
[ExclusionStartDate] [datetime] ,
[ExclusionEndDate] [datetime]
)
--## Create sample 'exclusion ranges' - my product does not work these days
INSERT INTO ExclusionSample VALUES(14, '2013-09-14','2013-09-15')
INSERT INTO ExclusionSample VALUES(14, '2013-09-21','2013-09-22')
INSERT INTO ExclusionSample VALUES(14, '2013-09-28','2013-09-29')
INSERT INTO ExclusionSample VALUES(14, '2013-09-25','2013-09-25')
INSERT INTO ExclusionSample VALUES(16, '2013-08-14','2013-08-15')
INSERT INTO ExclusionSample VALUES(16, '2013-08-21','2013-08-22')
INSERT INTO ExclusionSample VALUES(16, '2013-08-28','2013-08-29')
INSERT INTO ExclusionSample VALUES(16, '2013-09-14','2013-09-14')
GO
WITH c1 AS (
SELECT
OrderID,
OrderStartDate,
OrderEndDate,
NULL AS ExclusionStartDate,
NULL AS ExclusionEndDate
FROM
dbo.OrderSample
UNION ALL
SELECT
OS.OrderID,
OS.OrderStartDate,
OS.OrderEndDate,
ES.ExclusionStartDate,
ES.ExclusionEndDate
FROM
dbo.OrderSample AS OS
INNER JOIN
dbo.ExclusionSample AS ES
ON ES.OrderID = OS.OrderID
),
C2 AS (
SELECT
OrderID,
DATEDIFF([day],
ISNULL(DATEADD([day], 1, ExclusionEndDate), OrderStartDate),
ISNULL(DATEADD([day], -1, LEAD(ExclusionStartDate) OVER(PARTITION BY OrderID ORDER BY ExclusionStartDate)), OrderEndDate)
) + 1 AS ConsecValidDates
FROM
C1
)
SELECT
OrderID,
MAX(ConsecValidDates) AS MaxConsecValidDates
FROM
C2
GROUP BY
OrderID;
GO
DROP TABLE dbo.OrderSample, dbo.ExclusionSample;
GO
September 18, 2013 at 7:59 pm
I believe that if you're exclusion date ranges do not overlap, you may also be able to do it like this:
SELECT OrderID,[MaxNonExclusionDays]=MAX([NonExclusionDays])
FROM (
SELECT OrderID, GapStart=MIN(GapDates), GapEnd=MAX(GapDates)
,[NonExclusionDays]=1+DATEDIFF(day, MIN(GapDates), MAX(GapDates))
FROM (
SELECT OrderID, GapDates
,rn=(ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY GapDates)-1)/2
FROM (
SELECT OrderID, GapDates
FROM OrderSample a
CROSS APPLY (VALUES (OrderStartDate),(OrderEndDate)) b (GapDates)
UNION ALL
SELECT OrderID, GapDates
FROM ExclusionSample b
CROSS APPLY (
VALUES (ExclusionStartDate-1), (ExclusionEndDate+1)) c(GapDates)
) a
) a
GROUP BY OrderID, rn
) a
GROUP BY OrderID;
I've treated the exclusion days (+ order start/end dates) like a set of islands of date ranges, and then converted them to the gaps, ultimately calculating the MAX gap. Similar to (although not exactly like) what I did in this article: The SQL of Gaps and Islands in Sequences[/url]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply