Best approach for record comparison

  • I've got a problem that I will have to solve in the next few weeks. I have a working solution, but and I'd appreciate views as to the 'ideal' approach.

    Consider a situation where we have a customer as a parent record with multiple, potentially overlapping services of different types as child records. The service records might look something like this:

    CustomerID, ServiceType, StartDate, EndDate

    1234, ServiceA, 2010-05-25, 2011-07-15

    1234, ServiceA, 2011-07-16, 2011-09-18

    1234, ServiceB, 2011-09-19, NULL

    2568, ServiceB, 2011-02-28, 2011-04-06

    2568, ServiceB, 2011-04-07, NULL

    5876, ServiceA, 2009-05-26, 2011-10-26

    5876, ServiceA, 2012-01-07, NULL

    8989, ServiceB, 2010-06-01, 2011-04-05

    8989, ServiceB, 2011-04-06, 2011-04-18

    8989, ServiceB, 2011-05-12, 2012-02-05

    8989, ServiceB, 2012-02-06, NULL

    What I need to do is 'compress' the consecutive and overlapping services of the same type, whilst maintaining different records for different services or where there is a break in the service (of one or more days). Therefore, after this 'compression', the results would look like this:

    CustomerID, ServiceType, StartDate, EndDate

    1234, ServiceA, 2010-05-25, 2011-09-18

    1234, ServiceB, 2011-09-19, NULL

    2568, ServiceB, 2011-02-28, NULL

    5876, ServiceA, 2009-05-26, 2011-10-26

    5876, ServiceA, 2012-01-07, NULL

    8989, ServiceB, 2010-06-01, 2011-04-08

    8989, ServiceB, 2011-05-12, NULL

    I can't simply do a min-start-date and max-end-date for each service, because this will not capture the breaks in service (customers 5876 & 8989). I could do a loop, but this could be very slow because of the quantity of records. I've had success with similar problems by doing cross joins of the table with itself, but this gets complicated quickly.

    What approach might others take, and is there some magical approach other than those mentioned above? Note that I'm not asking someone to do the detailed work, just ideas I can explore.

    Stuart

    P.S. I recognise that given the reporting I need to do, this is an imperfect structure, but there's nothing I can do about it (it's an off-the-shelf package and the report I'm producing is a statutory government return that is rigidly defined).

  • Could you please follow the link at the bottom of my signature to find out how to present this sort of question to the forum. I can promise you: you will get relevant help much faster...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • feeling helpful today

    CREATE TABLE #TEMP

    (CustomerID INT,

    ServiceType VARCHAR(20),

    StartDate DATE,

    EndDate DATE)

    INSERT INTO #TEMP VALUES (1234, 'ServiceA', '2010-05-25', '2011-07-15')

    INSERT INTO #TEMP VALUES (1234, 'ServiceA', '2011-07-16', '2011-09-18')

    INSERT INTO #TEMP VALUES (1234, 'ServiceB', '2011-09-19', NULL)

    INSERT INTO #TEMP VALUES (2568, 'ServiceB', '2011-02-28', '2011-04-06')

    INSERT INTO #TEMP VALUES (2568, 'ServiceB', '2011-04-07', NULL)

    INSERT INTO #TEMP VALUES (5876, 'ServiceA', '2009-05-26', '2011-10-26')

    INSERT INTO #TEMP VALUES (5876, 'ServiceA', '2012-01-07', NULL)

    INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2010-06-01', '2011-04-05')

    INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2011-04-06', '2011-04-18')

    INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2011-05-12', '2012-02-05')

    INSERT INTO #TEMP VALUES (8989, 'ServiceB', '2012-02-06', NULL)

    select * from #TEMP

  • Using my query from here

    http://www.sqlservercentral.com/Forums/Topic1125847-392-1.aspx

    WITH StartsAndEnds(StartEnd,StartDate,EndDate,CustomerID,ServiceType) AS (

    SELECT 'S' AS StartEnd,

    StartDate,

    DATEADD(day,-1,StartDate),

    CustomerID,

    ServiceType

    FROM #TEMP

    UNION ALL

    SELECT 'E' AS StartEnd,

    ISNULL(DATEADD(day,1,EndDate),'2099-12-31'),

    ISNULL(EndDate,'2099-12-31'),

    CustomerID,

    ServiceType

    FROM #TEMP),

    OrderedStarts AS (

    SELECT StartDate,

    CustomerID,

    ServiceType,

    ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY StartDate,StartEnd DESC) AS rnBoth,

    2*(ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType,StartEnd ORDER BY StartDate))-1 AS rnStartEnd

    FROM StartsAndEnds),

    OrderedEnds AS (

    SELECT EndDate,

    CustomerID,

    ServiceType,

    ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY EndDate DESC,StartEnd) AS rnBothRev,

    2*(ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev

    FROM StartsAndEnds),

    Starts AS (

    SELECT StartDate,

    CustomerID,

    ServiceType,

    ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY StartDate) AS rn

    FROM OrderedStarts

    WHERE rnBoth=rnStartEnd),

    Ends AS (

    SELECT EndDate,

    CustomerID,

    ServiceType,

    ROW_NUMBER() OVER(PARTITION BY CustomerID,ServiceType ORDER BY EndDate) AS rn

    FROM OrderedEnds

    WHERE rnBothRev=rnStartEndRev)

    SELECT s.CustomerID,s.ServiceType,s.StartDate,

    NULLIF(e.EndDate,'2099-12-31') AS EndDate

    FROM Starts s

    INNER JOIN Ends e ON e.CustomerID=s.CustomerID AND e.ServiceType=s.ServiceType AND e.rn=s.rn AND s.StartDate<=e.EndDate

    ORDER BY s.CustomerID,s.ServiceType,s.StartDate,e.EndDate;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • @Eugene - point taken, link bookmarked.

    @anthony-2 - thank you very mcuh.

    @mark-3

    Thanks for this - it certainly gives me something to work with. I also followed some links from the thread you originally posted in leading to this article on packed intervals (which I now know is the name of what I'm trying to achieve):

    http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

    Other work has grabbed my attention for the moment (in that way it does), but I appreciate the responses and will investigate and report back when time permits.

    Thanks

    Stuart

  • As indicated, here is my solution. I've changed the sample data and references from the original post, simply because it's what I ended up working with.

    Sample data:

    CREATE TABLE Contracts (

    seplGuid nvarchar(100),

    refeGuid nvarchar(100),

    ContractType nvarchar(100),

    S_ActualStartDate datetime,

    S_ActualEndDate datetime,S_Status nvarchar(50))

    INSERT INTO Contracts

    VALUES

    ('1', 'RefA', 'Con 1', '2011-03-26 00:00:00', '2011-04-09 00:00:00', 'Authorised'),

    ('2', 'RefA', 'Con 1', '2011-04-10 00:00:00', '2011-04-15 00:00:00', 'Authorised'),

    ('3', 'RefA', 'Con 5', '2011-04-16 00:00:00', NULL, 'Authorised'),

    ('4', 'RefB', 'Con 6', '2009-01-29 00:00:00', '2012-01-26 00:00:00', 'Authorised'),

    ('5', 'RefB', 'Con 8', '2012-01-27 00:00:00', '2012-03-30 00:00:00', 'Authorised'),

    ('6', 'RefC', 'Con 2', '2011-05-02 00:00:00', '2011-05-09 00:00:00', 'Authorised'),

    ('7', 'RefC', 'Con 2', '2011-08-12 00:00:00', '2011-08-17 00:00:00', 'Authorised'),

    ('8', 'RefC', 'Con 6', '2012-01-08 00:00:00', '2012-02-09 00:00:00', 'Authorised'),

    ('9', 'RefC', 'Con 6', '2012-02-10 00:00:00', NULL, 'Draft'),

    ('10', 'RefD', 'Con 5', '2011-06-06 00:00:00', '2011-11-15 00:00:00', 'Authorised'),

    ('11', 'RefD', 'Con 8', '2011-12-02 00:00:00', NULL, 'Authorised'),

    ('12', 'RefE', 'Con 5', '2010-05-12 00:00:00', '2011-05-12 00:00:00', 'Authorised'),

    ('13', 'RefE', 'Con 6', '2011-05-13 00:00:00', '2012-03-19 00:00:00', 'Authorised'),

    ('14', 'RefF', 'Con 6', '2011-05-15 00:00:00', '2012-01-23 00:00:00', 'Authorised'),

    ('15', 'RefF', 'Con 8', '2012-01-24 00:00:00', '2012-03-26 00:00:00', 'Authorised'),

    ('16', 'RefG', 'Con 6', '2011-06-25 00:00:00', '2011-06-27 00:00:00', 'Authorised'),

    ('17', 'RefG', 'Con 6', '2011-06-30 00:00:00', '2011-07-02 00:00:00', 'Authorised'),

    ('18', 'RefG', 'Con 6', '2011-07-05 00:00:00', '2011-07-15 00:00:00', 'Authorised'),

    ('19', 'RefG', 'Con 6', '2011-07-16 00:00:00', '2012-02-01 00:00:00', 'Authorised'),

    ('20', 'RefG', 'Con 6', '2012-02-14 00:00:00', NULL, 'Authorised')

    Query to process data - I've included some manipulation for NULLs and to allow for a 1-day gap between change in contracts:

    /*

    This codes actions the 'packed interval' principal of compressing services spred accross consecutive records into a single record. More details about this principal can be found in the supporting documentation.

    */

    -- Perp the service data to replace null values & add 1 day to the end dates (to ensure that gaps of up to one day between services are ignored - this will have to be removed at the end to return the original date).

    WITH PackedContrats0 AS

    (

    SELECT

    seplGuid,

    refeGuid,

    ContractType,

    ISNULL(S_ActualStartDate, CONVERT(DATETIME, '2099-12-31 00:00:00', 102)) AS S_ActualStartDate,

    CASE WHEN S_ActualEndDate IS NULL THEN CONVERT(DATETIME, '2099-12-31 00:00:00', 102) ELSE DATEADD(DAY, + 1, S_ActualEndDate) END AS S_ActualEndDate

    FROM Coventry.Contracts

    WHERES_Status = N'Authorised'

    ),

    -- Put the start and end dates into the same table, flagging them as either start or end dates with an 'AddSub' value of +1 or -1 - these will be used for summing up the ordinals later.

    PackedContrats1 AS

    (

    -- Get start dates

    SELECT

    seplGuid,

    refeGuid,

    ContractType,

    S_ActualStartDate AS ServiceDate,

    +1 AS AddSub,

    ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY S_ActualStartDate, seplGuid) AS Start,

    NULL AS [End]

    FROMPackedContrats0

    UNION ALL

    -- Get end dates

    SELECT

    seplGuid,

    refeGuid,

    ContractType,

    S_ActualEndDate AS ServiceDate,

    -1 AS AddSub,

    NULL AS Start,

    ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY S_ActualEndDate, seplGuid) AS [End]

    FROMPackedContrats0

    ),

    -- Create a running total for the start and end ordinals in the field 'StartEnd'.

    PackedContrats2 AS

    (

    SELECT

    PackedContrats1.*,

    ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY ServiceDate, AddSub DESC, seplGuid) AS StartEnd

    FROM PackedContrats1

    ),

    -- This next step does the hard work.

    /*

    For start events, the expression 'Start - (StartEnd - Start) - 1' represents how many sessions were active just before the current (hence - 1).

    For end events, the expression "(StartEnd - End) - End" represents how many sessions are active right after this one.

    Filter only events when a group of packed intervals either starts or ends (i.e. = 0).

    Group each pair of adjacent start/end events by creating row number, dividing by two and rounding down.

    */

    PackedContrats3 AS

    (

    SELECT

    refeGuid,

    ContractType,

    ServiceDate,

    FLOOR((ROW_NUMBER() OVER(PARTITION BY refeGuid, ContractType ORDER BY ServiceDate) -1) / 2+1) AS PairID

    FROM PackedContrats2

    WHERE (Start - (StartEnd - Start) -1 = 0) OR ((StartEnd - [End]) - [End] = 0)

    ),

    -- Turn the result back into a single row

    PackedContrats4 AS

    (

    SELECT

    refeGuid,

    ContractType,

    MIN(ServiceDate) AS StartDate,

    MAX(ServiceDate) AS EndDate

    FROM PackedContrats3

    GROUP BY refeGuid, ContractType, PairID

    )

    -- Fix the dates back to their original state (i.e. remove +1 day from start and re-ntroduce NULL values).

    SELECT

    refeGuid,

    ContractType,

    StartDate AS SP_ActualStartDate,

    CASE WHEN EndDate = CONVERT(DATETIME, '2099-12-31 00:00:00', 102) THEN NULL ELSE DATEADD(DAY, -1, EndDate) END AS SP_ActualEndDate

    FROM PackedContrats4

    Result:

    CREATE TABLE Result (

    refeGuid nvarchar(100),

    ContractType nvarchar(100),

    SP_ActualStartDate datetime,

    SP_ActualEndDate datetime)

    INSERT INTO Result

    VALUES

    ('RefA', 'Con 1', '2011-03-26 00:00:00', '2011-04-15 00:00:00'),

    ('RefA', 'Con 5', '2011-04-16 00:00:00', NULL),

    ('RefB', 'Con 6', '2009-01-29 00:00:00', '2012-01-26 00:00:00'),

    ('RefB', 'Con 8', '2012-01-27 00:00:00', '2012-03-30 00:00:00'),

    ('RefC', 'Con 2', '2011-05-02 00:00:00', '2011-05-09 00:00:00'),

    ('RefC', 'Con 2', '2011-08-12 00:00:00', '2011-08-17 00:00:00'),

    ('RefC', 'Con 6', '2012-01-08 00:00:00', '2012-02-09 00:00:00'),

    ('RefD', 'Con 5', '2011-06-06 00:00:00', '2011-11-15 00:00:00'),

    ('RefD', 'Con 8', '2011-12-02 00:00:00', NULL),

    ('RefE', 'Con 5', '2010-05-12 00:00:00', '2011-05-12 00:00:00'),

    ('RefE', 'Con 6', '2011-05-13 00:00:00', '2012-03-19 00:00:00'),

    ('RefF', 'Con 6', '2011-05-15 00:00:00', '2012-01-23 00:00:00'),

    ('RefF', 'Con 8', '2012-01-24 00:00:00', '2012-03-26 00:00:00'),

    ('RefG', 'Con 6', '2011-06-25 00:00:00', '2011-06-27 00:00:00'),

    ('RefG', 'Con 6', '2011-06-30 00:00:00', '2011-07-02 00:00:00'),

    ('RefG', 'Con 6', '2011-07-05 00:00:00', '2012-02-01 00:00:00'),

    ('RefG', 'Con 6', '2012-02-14 00:00:00', NULL)

    Although this has got me some way towards my goal, I have further yet to travel. As it's a different issue, I'll raise it in a separate thread.

    Thanks for responses - they really helped.

    Stuart

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply