February 21, 2012 at 5:07 am
I've got a problem that I will have to solve in the next few weeks. I have a working solution, but and I'd appreciate views as to the 'ideal' approach.
Consider a situation where we have a customer as a parent record with multiple, potentially overlapping services of different types as child records. The service records might look something like this:
CustomerID, ServiceType, StartDate, EndDate
1234, ServiceA, 2010-05-25, 2011-07-15
1234, ServiceA, 2011-07-16, 2011-09-18
1234, ServiceB, 2011-09-19, NULL
2568, ServiceB, 2011-02-28, 2011-04-06
2568, ServiceB, 2011-04-07, NULL
5876, ServiceA, 2009-05-26, 2011-10-26
5876, ServiceA, 2012-01-07, NULL
8989, ServiceB, 2010-06-01, 2011-04-05
8989, ServiceB, 2011-04-06, 2011-04-18
8989, ServiceB, 2011-05-12, 2012-02-05
8989, ServiceB, 2012-02-06, NULL
What I need to do is 'compress' the consecutive and overlapping services of the same type, whilst maintaining different records for different services or where there is a break in the service (of one or more days). Therefore, after this 'compression', the results would look like this:
CustomerID, ServiceType, StartDate, EndDate
1234, ServiceA, 2010-05-25, 2011-09-18
1234, ServiceB, 2011-09-19, NULL
2568, ServiceB, 2011-02-28, NULL
5876, ServiceA, 2009-05-26, 2011-10-26
5876, ServiceA, 2012-01-07, NULL
8989, ServiceB, 2010-06-01, 2011-04-08
8989, ServiceB, 2011-05-12, NULL
I can't simply do a min-start-date and max-end-date for each service, because this will not capture the breaks in service (customers 5876 & 8989). I could do a loop, but this could be very slow because of the quantity of records. I've had success with similar problems by doing cross joins of the table with itself, but this gets complicated quickly.
What approach might others take, and is there some magical approach other than those mentioned above? Note that I'm not asking someone to do the detailed work, just ideas I can explore.
Stuart
P.S. I recognise that given the reporting I need to do, this is an imperfect structure, but there's nothing I can do about it (it's an off-the-shelf package and the report I'm producing is a statutory government return that is rigidly defined).
February 21, 2012 at 5:13 am
Could you please follow the link at the bottom of my signature to find out how to present this sort of question to the forum. I can promise you: you will get relevant help much faster...
February 21, 2012 at 5:19 am
feeling helpful today
CREATE TABLE #TEMP
(CustomerID INT,
ServiceType VARCHAR(20),
StartDate DATE,
EndDate DATE)
INSERT INTO #TEMP VALUES (1234, 'ServiceA', '2010-05-25', '2011-07-15')
INSERT INTO #TEMP VALUES (1234, 'ServiceA', '2011-07-16', '2011-09-18')
INSERT INTO #TEMP VALUES (1234, 'ServiceB', '2011-09-19', NULL)
INSERT INTO #TEMP VALUES (2568, 'ServiceB', '2011-02-28', '2011-04-06')
INSERT INTO #TEMP VALUES (2568, 'ServiceB', '2011-04-07', NULL)
INSERT INTO #TEMP VALUES (5876, 'ServiceA', '2009-05-26', '2011-10-26')
INSERT INTO #TEMP VALUES (5876, 'ServiceA', '2012-01-07', NULL)
INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2010-06-01', '2011-04-05')
INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2011-04-06', '2011-04-18')
INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2011-05-12', '2012-02-05')
INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2012-02-06', NULL)
select * from #TEMP
February 21, 2012 at 5:37 am
Using my query from here
http://www.sqlservercentral.com/Forums/Topic1125847-392-1.aspx
WITH StartsAndEnds(StartEnd,StartDate,EndDate,CustomerID,ServiceType) AS (
SELECT 'S' AS StartEnd,
StartDate,
DATEADD(day,-1,StartDate),
CustomerID,
ServiceType
FROM #TEMP
UNION ALL
SELECT 'E' AS StartEnd,
ISNULL(DATEADD(day,1,EndDate),'2099-12-31'),
ISNULL(EndDate,'2099-12-31'),
CustomerID,
ServiceType
FROM #TEMP),
OrderedStarts AS (
SELECT StartDate,
CustomerID,
ServiceType,
ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY StartDate,StartEnd DESC) AS rnBoth,
2*(ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType,StartEnd ORDER BY StartDate))-1 AS rnStartEnd
FROM StartsAndEnds),
OrderedEnds AS (
SELECT EndDate,
CustomerID,
ServiceType,
ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY EndDate DESC,StartEnd) AS rnBothRev,
2*(ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev
FROM StartsAndEnds),
Starts AS (
SELECT StartDate,
CustomerID,
ServiceType,
ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY StartDate) AS rn
FROM OrderedStarts
WHERE rnBoth=rnStartEnd),
Ends AS (
SELECT EndDate,
CustomerID,
ServiceType,
ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY EndDate) AS rn
FROM OrderedEnds
WHERE rnBothRev=rnStartEndRev)
SELECT s.CustomerID,s.ServiceType,s.StartDate,
NULLIF(e.EndDate,'2099-12-31') AS EndDate
FROM Starts s
INNER JOIN Ends e ON e.CustomerID=s.CustomerID AND e.ServiceType=s.ServiceType AND e.rn=s.rn AND s.StartDate<=e.EndDate
ORDER BY s.CustomerID,s.ServiceType,s.StartDate,e.EndDate;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 22, 2012 at 2:03 am
@Eugene - point taken, link bookmarked.
@anthony-2 - thank you very mcuh.
Thanks for this - it certainly gives me something to work with. I also followed some links from the thread you originally posted in leading to this article on packed intervals (which I now know is the name of what I'm trying to achieve):
http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
Other work has grabbed my attention for the moment (in that way it does), but I appreciate the responses and will investigate and report back when time permits.
Thanks
Stuart
March 20, 2012 at 11:07 am
As indicated, here is my solution. I've changed the sample data and references from the original post, simply because it's what I ended up working with.
Sample data:
CREATE TABLE Contracts (
seplGuid nvarchar(100),
refeGuid nvarchar(100),
ContractType nvarchar(100),
S_ActualStartDate datetime,
S_ActualEndDate datetime,S_Status nvarchar(50))
INSERT INTO Contracts
VALUES
('1', 'RefA', 'Con 1', '2011-03-26 00:00:00', '2011-04-09 00:00:00', 'Authorised'),
('2', 'RefA', 'Con 1', '2011-04-10 00:00:00', '2011-04-15 00:00:00', 'Authorised'),
('3', 'RefA', 'Con 5', '2011-04-16 00:00:00', NULL, 'Authorised'),
('4', 'RefB', 'Con 6', '2009-01-29 00:00:00', '2012-01-26 00:00:00', 'Authorised'),
('5', 'RefB', 'Con 8', '2012-01-27 00:00:00', '2012-03-30 00:00:00', 'Authorised'),
('6', 'RefC', 'Con 2', '2011-05-02 00:00:00', '2011-05-09 00:00:00', 'Authorised'),
('7', 'RefC', 'Con 2', '2011-08-12 00:00:00', '2011-08-17 00:00:00', 'Authorised'),
('8', 'RefC', 'Con 6', '2012-01-08 00:00:00', '2012-02-09 00:00:00', 'Authorised'),
('9', 'RefC', 'Con 6', '2012-02-10 00:00:00', NULL, 'Draft'),
('10', 'RefD', 'Con 5', '2011-06-06 00:00:00', '2011-11-15 00:00:00', 'Authorised'),
('11', 'RefD', 'Con 8', '2011-12-02 00:00:00', NULL, 'Authorised'),
('12', 'RefE', 'Con 5', '2010-05-12 00:00:00', '2011-05-12 00:00:00', 'Authorised'),
('13', 'RefE', 'Con 6', '2011-05-13 00:00:00', '2012-03-19 00:00:00', 'Authorised'),
('14', 'RefF', 'Con 6', '2011-05-15 00:00:00', '2012-01-23 00:00:00', 'Authorised'),
('15', 'RefF', 'Con 8', '2012-01-24 00:00:00', '2012-03-26 00:00:00', 'Authorised'),
('16', 'RefG', 'Con 6', '2011-06-25 00:00:00', '2011-06-27 00:00:00', 'Authorised'),
('17', 'RefG', 'Con 6', '2011-06-30 00:00:00', '2011-07-02 00:00:00', 'Authorised'),
('18', 'RefG', 'Con 6', '2011-07-05 00:00:00', '2011-07-15 00:00:00', 'Authorised'),
('19', 'RefG', 'Con 6', '2011-07-16 00:00:00', '2012-02-01 00:00:00', 'Authorised'),
('20', 'RefG', 'Con 6', '2012-02-14 00:00:00', NULL, 'Authorised')
Query to process data - I've included some manipulation for NULLs and to allow for a 1-day gap between change in contracts:
/*
This codes actions the 'packed interval' principal of compressing services spred accross consecutive records into a single record. More details about this principal can be found in the supporting documentation.
*/
-- Perp the service data to replace null values & add 1 day to the end dates (to ensure that gaps of up to one day between services are ignored - this will have to be removed at the end to return the original date).
WITH PackedContrats0 AS
(
SELECT
seplGuid,
refeGuid,
ContractType,
ISNULL(S_ActualStartDate, CONVERT(DATETIME, '2099-12-31 00:00:00', 102)) AS S_ActualStartDate,
CASE WHEN S_ActualEndDate IS NULL THEN CONVERT(DATETIME, '2099-12-31 00:00:00', 102) ELSE DATEADD(DAY, + 1, S_ActualEndDate) END AS S_ActualEndDate
FROM Coventry.Contracts
WHERES_Status = N'Authorised'
),
-- Put the start and end dates into the same table, flagging them as either start or end dates with an 'AddSub' value of +1 or -1 - these will be used for summing up the ordinals later.
PackedContrats1 AS
(
-- Get start dates
SELECT
seplGuid,
refeGuid,
ContractType,
S_ActualStartDate AS ServiceDate,
+1 AS AddSub,
ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY S_ActualStartDate, seplGuid) AS Start,
NULL AS [End]
FROMPackedContrats0
UNION ALL
-- Get end dates
SELECT
seplGuid,
refeGuid,
ContractType,
S_ActualEndDate AS ServiceDate,
-1 AS AddSub,
NULL AS Start,
ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY S_ActualEndDate, seplGuid) AS [End]
FROMPackedContrats0
),
-- Create a running total for the start and end ordinals in the field 'StartEnd'.
PackedContrats2 AS
(
SELECT
PackedContrats1.*,
ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY ServiceDate, AddSub DESC, seplGuid) AS StartEnd
FROM PackedContrats1
),
-- This next step does the hard work.
/*
For start events, the expression 'Start - (StartEnd - Start) - 1' represents how many sessions were active just before the current (hence - 1).
For end events, the expression "(StartEnd - End) - End" represents how many sessions are active right after this one.
Filter only events when a group of packed intervals either starts or ends (i.e. = 0).
Group each pair of adjacent start/end events by creating row number, dividing by two and rounding down.
*/
PackedContrats3 AS
(
SELECT
refeGuid,
ContractType,
ServiceDate,
FLOOR((ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY ServiceDate) -1) / 2+1) AS PairID
FROM PackedContrats2
WHERE (Start - (StartEnd - Start) -1 = 0) OR ((StartEnd - [End]) - [End] = 0)
),
-- Turn the result back into a single row
PackedContrats4 AS
(
SELECT
refeGuid,
ContractType,
MIN(ServiceDate) AS StartDate,
MAX(ServiceDate) AS EndDate
FROM PackedContrats3
GROUP BY refeGuid, ContractType, PairID
)
-- Fix the dates back to their original state (i.e. remove +1 day from start and re-ntroduce NULL values).
SELECT
refeGuid,
ContractType,
StartDate AS SP_ActualStartDate,
CASE WHEN EndDate = CONVERT(DATETIME, '2099-12-31 00:00:00', 102) THEN NULL ELSE DATEADD(DAY, -1, EndDate) END AS SP_ActualEndDate
FROM PackedContrats4
Result:
CREATE TABLE Result (
refeGuid nvarchar(100),
ContractType nvarchar(100),
SP_ActualStartDate datetime,
SP_ActualEndDate datetime)
INSERT INTO Result
VALUES
('RefA', 'Con 1', '2011-03-26 00:00:00', '2011-04-15 00:00:00'),
('RefA', 'Con 5', '2011-04-16 00:00:00', NULL),
('RefB', 'Con 6', '2009-01-29 00:00:00', '2012-01-26 00:00:00'),
('RefB', 'Con 8', '2012-01-27 00:00:00', '2012-03-30 00:00:00'),
('RefC', 'Con 2', '2011-05-02 00:00:00', '2011-05-09 00:00:00'),
('RefC', 'Con 2', '2011-08-12 00:00:00', '2011-08-17 00:00:00'),
('RefC', 'Con 6', '2012-01-08 00:00:00', '2012-02-09 00:00:00'),
('RefD', 'Con 5', '2011-06-06 00:00:00', '2011-11-15 00:00:00'),
('RefD', 'Con 8', '2011-12-02 00:00:00', NULL),
('RefE', 'Con 5', '2010-05-12 00:00:00', '2011-05-12 00:00:00'),
('RefE', 'Con 6', '2011-05-13 00:00:00', '2012-03-19 00:00:00'),
('RefF', 'Con 6', '2011-05-15 00:00:00', '2012-01-23 00:00:00'),
('RefF', 'Con 8', '2012-01-24 00:00:00', '2012-03-26 00:00:00'),
('RefG', 'Con 6', '2011-06-25 00:00:00', '2011-06-27 00:00:00'),
('RefG', 'Con 6', '2011-06-30 00:00:00', '2011-07-02 00:00:00'),
('RefG', 'Con 6', '2011-07-05 00:00:00', '2012-02-01 00:00:00'),
('RefG', 'Con 6', '2012-02-14 00:00:00', NULL)
Although this has got me some way towards my goal, I have further yet to travel. As it's a different issue, I'll raise it in a separate thread.
Thanks for responses - they really helped.
Stuart
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply