Merging date ranges

  • Hi

    This may be a version of a gaps-and-islands type of problem...

    I would like to merge records for each user based on start and end dates, and sum a column (amount) for that user.

    The rules for merging rows are:

    1) Merge where the start and end dates of a row are between the start and end dates of another row

    2) Merge where the start date of one row is the day after the end date of another row

    e.g. the end date of the first row for user 1111 is 2010-08-07 and the start date of the following row is 2010-08-08

    Here is the test data:

    IF OBJECT_ID('tempdb..#user') IS NOT NULL
        DROP TABLE #user;

    CREATE TABLE #user
    (
        user_id      INT
        , start_date DATETIME
        , end_date   DATETIME
        , amount     DECIMAL(10, 2)
    );

    --TEST DATA
    INSERT INTO #user SELECT 1111, '2010-08-06', '2010-08-07', 0.84;
    INSERT INTO #user SELECT 1111, '2010-08-08', '2010-08-08', 4.73;
    INSERT INTO #user SELECT 1111, '2010-08-09', '2010-08-09', 0.93;
    INSERT INTO #user SELECT 1111, '2010-08-27', '2010-08-27', 2.40;
    INSERT INTO #user SELECT 2222, '2010-08-09', '2010-08-09', 0.33;
    INSERT INTO #user SELECT 2222, '2010-08-29', '2010-08-30', 2.10;
    INSERT INTO #user SELECT 3333, '2010-08-08', '2010-08-08', 1.13;
    INSERT INTO #user SELECT 3333, '2010-08-09', '2010-08-09', 0.73;

    Here is the desired output for this dataset:

    --user_id  start_date  end_date    amount
    --1111     2010-08-06  2010-08-09  6.50
    --1111     2010-08-27  2010-08-27  2.40
    --2222     2010-08-09  2010-08-09  0.33
    --2222     2010-08-29  2010-08-30  2.10
    --3333     2010-08-08  2010-08-09  1.86

    I would appreciate any assistance in terms of code and/or links...

    Cheers

  • OK....

    I managed to find a solution... however, it doesn't seem very elegant to me...

    I would welcome any input to make the query more concise or more efficient.

    WITH cteContiguous AS
    (
        -- define the date range for contiguous candidates: pair each row
        -- with the row that starts the day after it ends
        SELECT
            t.user_id,
            MIN(t.start_date) AS start_date,
            MAX(t2.end_date)  AS end_date
        FROM #user t
        JOIN #user t2
            ON t.user_id = t2.user_id
            AND t.end_date + 1 = t2.start_date
        WHERE t2.user_id IS NOT NULL -- redundant given the inner join
        GROUP BY t.user_id
    ), cteBase AS
    (
        SELECT
            t.user_id,
            t.start_date,
            t.end_date
        FROM #user t
        -- exclude contiguous rows returned in the first CTE
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM cteContiguous
            WHERE cteContiguous.user_id = t.user_id
                AND t.start_date BETWEEN cteContiguous.start_date AND cteContiguous.end_date
        )
    )
    -- merged contiguous ranges, summing the amounts of the rows they cover
    SELECT
        cteContiguous.user_id,
        cteContiguous.start_date,
        cteContiguous.end_date,
        SUM(t.amount) AS amount
    FROM cteContiguous
    JOIN #user t
        ON t.user_id = cteContiguous.user_id
        AND t.start_date BETWEEN cteContiguous.start_date AND cteContiguous.end_date
    GROUP BY
        cteContiguous.user_id,
        cteContiguous.start_date,
        cteContiguous.end_date
    UNION ALL
    -- remaining stand-alone ranges
    SELECT
        cteBase.user_id,
        cteBase.start_date,
        cteBase.end_date,
        SUM(t.amount) AS amount
    FROM cteBase
    JOIN #user t
        ON t.user_id = cteBase.user_id
        AND t.start_date BETWEEN cteBase.start_date AND cteBase.end_date
    GROUP BY
        cteBase.user_id,
        cteBase.start_date,
        cteBase.end_date
    ORDER BY
        user_id,
        start_date,
        end_date;

  • You might want to have a look at this blog. It demonstrates an alternative solution that will most probably perform better than your current approach.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Go here and drop down the Publish Date list and get a copy of the March SolidQ Journal. Read Itzik Ben-Gan's Packing Intervals article. STUNNINGLY efficient solution for what you need. He was able to collapse 5M rows of hot data in just THREE SECONDS!

    http://www.solidq.com/sqj/Pages/Home.aspx
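
    For a taste of the technique, here is a minimal sketch of the interval-packing idea (my wording, not Ben-Gan's exact code; the framed MAX window it leans on assumes SQL Server 2012 or later). A row starts a new "island" when it begins more than one day after everything seen so far for that user; a running sum of those start flags numbers the islands, which can then be grouped and summed:

    WITH flagged AS
    (
        -- flag a row as the start of a new island when its start_date is
        -- more than one day past the latest end_date seen so far per user
        SELECT user_id, start_date, end_date, amount,
            CASE WHEN start_date <= DATEADD(DAY, 1,
                     MAX(end_date) OVER (PARTITION BY user_id
                                         ORDER BY start_date, end_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING
                                                  AND 1 PRECEDING))
                 THEN 0 ELSE 1 END AS island_start
        FROM #user
    ), islands AS
    (
        -- a running sum of the start flags numbers the islands
        SELECT *,
            SUM(island_start) OVER (PARTITION BY user_id
                                    ORDER BY start_date, end_date
                                    ROWS UNBOUNDED PRECEDING) AS island_id
        FROM flagged
    )
    SELECT user_id,
        MIN(start_date) AS start_date,
        MAX(end_date)   AS end_date,
        SUM(amount)     AS amount
    FROM islands
    GROUP BY user_id, island_id
    ORDER BY user_id, start_date;

    Against the test data above this should produce the five desired rows. The running MAX handles overlapping and day-adjacent ranges in a single pass over the table, which is where the speed comes from.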

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Lutz and Kevin...

    This is a double WIN for me... not only did you point me to the exact solution I was looking for,

    but you also introduced me to SolidQ. Their Journal is going to be another invaluable SQL Server resource

    and will provide me with a lot of interesting reading for this weekend.

    Cheers
