Summarize data over contiguous date ranges

  • I could use some help to achieve the below results

    Suppose I have a table of Start/End dates and values as below:

    IF OBJECT_ID('tempdb..#_tempData', 'U') IS NOT NULL

    DROP TABLE #_tempData

    CREATE TABLE #_tempData (

    ID INT

    ,StartDate DATE

    ,EndDate DATE

    ,Value1 INT

    ,Value2 INT

    );

    INSERT INTO #_tempData

    VALUES

    (1,'2010-10-01','2010-10-31',0,1000)

    ,(1,'2010-10-01','2011-09-30',1000,0)

    ,(1,'2011-05-01','9999-12-31',100,0)

    ,(1,'2011-10-01','9999-12-31',2000,0)

    ,(1,'2011-12-01','2011-12-31',0,1000)

    ,(2,'2012-01-01','2012-01-31',0,2000)

    ,(2,'2012-01-01','2012-12-31',2000,0)

    ,(2,'2013-01-01','9999-12-31',3000,0);

    SELECT * FROM #_tempData;

    And I want to consolidate the data in the table so that any overlapping date periods are consolidated into one row and the value columns are aggregated. My desired results:

    SELECT ID

    ,StartDate

    ,EndDate

    ,Value1

    ,Value2

    FROM (VALUES (1,'2010-10-01', '2010-10-31', 1000, 1000)

    ,(1,'2010-11-01', '2011-04-30', 1000, 0)

    ,(1,'2011-05-01', '2011-09-30', 1100, 0)

    ,(1,'2011-10-01', '2011-11-30', 2100, 0)

    ,(1,'2011-12-01', '2011-12-31', 2100, 1000)

    ,(1,'2012-01-01', '9999-12-31', 2100, 0)

    ,(2,'2012-01-01', '2012-01-31', 2000, 2000)

    ,(2,'2012-02-01', '2012-12-31', 2000, 0)

    ,(2,'2013-01-01', '9999-12-31', 3000, 0)

    ) DesiredResults (ID, StartDate, EndDate, Value1, Value2)

    Can anyone write a nice query to do it?

    Notes

    1) Using SQL Server 2012

    2) Non-overlapping date periods should not be consolidated.

    3) The Value columns need to be aggregated at ID level

    Thanks very much!

  • Perhaps you can explain how you expect to "condense" overlapping periods in ID=1 from 5 rows to 6 rows (sample data vs. expected results)?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I guess "condensing" is probably not the right term. I guess resolving overlapping dates while accurately summarizing the metrics (value1, value2) is probably a better description.

    The problem is I have multiple number of overlapping dates for an ID which needs to be resolved. So depending on the data we could end up with more rows than the input.

    Hope that helps!!

  • Then how specifically does 2011-04-30 come to be an end date in the expected results?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Having looked at your question again, I think I see what you're trying to do.

    Not sure how well this will perform but for what it's worth, you can try this:

    DECLARE @Days INT

    ,@MinDate DATE

    ,@MaxDate DATE;

    SELECT @Days=DATEDIFF(day, MIN([Date]), MAX([Date]))

    ,@MinDate=MIN([Date])

    ,@MaxDate=DATEADD(day, 1, MAX([Date]))

    FROM #_tempData a

    CROSS APPLY

    (

    VALUES (StartDate), (EndDate)

    ) b ([Date])

    WHERE [Date] <> '9999-12-31';

    WITH CalendarTable (d) AS

    (

    SELECT DATEADD(day, n, @MinDate)

    FROM

    (

    SELECT 0 UNION ALL

    SELECT TOP (@Days) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    ) a (n)

    )

    SELECT ID, StartDate=MIN(d)

    ,EndDate=CASE MAX(d) WHEN DATEADD(day, -1, @MaxDate) THEN '9999-12-31' ELSE MAX(d) END

    ,Value1=MIN(Value1), Value2=MAX(Value2)

    FROM

    (

    SELECT ID, d, Value1, Value2

    ,x=SUM(x) OVER (PARTITION BY ID ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

    FROM

    (

    SELECT ID, d

    ,Value1, Value2

    ,x=CASE WHEN Value1=LAG(Value1, 1, Value1) OVER (PARTITION BY ID ORDER BY d) AND

    Value2=LAG(Value2, 1, Value2) OVER (PARTITION BY ID ORDER BY d) THEN 0

    ELSE 1 END

    FROM

    (

    SELECT ID, Value1=SUM(Value1), Value2=SUM(Value2), d

    FROM #_tempData a

    CROSS APPLY

    (

    SELECT d

    FROM CalendarTable

    WHERE d BETWEEN StartDate AND CASE EndDate WHEN '9999-12-31' THEN @MaxDate ELSE EndDate END

    ) b

    GROUP BY ID, d

    ) a

    ) a

    ) a

    GROUP BY ID, x

    ORDER BY ID, StartDate;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply