Merging date ranges

  • Hi

    This may be a version of a gaps-and-islands type of problem...

    I would like to merge records for each user based on start and end dates, and sum a column (amount) for that user.

    The rules for merging rows are:

    1) Merge where the start and end dates of a row are between the start and end dates of another row

    2) Merge where the start date of one row is the day after the end date of another row

    e.g. the end date of the first row for user 1111 is 2010-08-07 and the start date of the following row is 2010-08-08

    Here is the test data:

    IF OBJECT_ID('tempdb..#user') IS NOT NULL
        DROP TABLE #user;

    CREATE TABLE #user
    (
        user_id      INT
        , start_date DATETIME
        , end_date   DATETIME
        , amount     DECIMAL(10, 2)
    );

    --TEST DATA
    INSERT INTO #user SELECT 1111, '2010-08-06', '2010-08-07', 0.84;
    INSERT INTO #user SELECT 1111, '2010-08-08', '2010-08-08', 4.73;
    INSERT INTO #user SELECT 1111, '2010-08-09', '2010-08-09', 0.93;
    INSERT INTO #user SELECT 1111, '2010-08-27', '2010-08-27', 2.40;
    INSERT INTO #user SELECT 2222, '2010-08-09', '2010-08-09', 0.33;
    INSERT INTO #user SELECT 2222, '2010-08-29', '2010-08-30', 2.10;
    INSERT INTO #user SELECT 3333, '2010-08-08', '2010-08-08', 1.13;
    INSERT INTO #user SELECT 3333, '2010-08-09', '2010-08-09', 0.73;

    Here is the desired output for this dataset:

    --user_id  start_date  end_date    amount
    --1111     2010-08-06  2010-08-09  6.50
    --1111     2010-08-27  2010-08-27  2.40
    --2222     2010-08-09  2010-08-09  0.33
    --2222     2010-08-29  2010-08-30  2.10
    --3333     2010-08-08  2010-08-09  1.86

    I would appreciate any assistance in terms of code and/or links...

    Cheers

  • OK....

    I managed to find a solution... however, it doesn't seem very elegant to me...

    I would welcome any input to make the query more concise or more efficient.

    WITH cteContiguous AS
    (
        -- define the date range for contiguous candidates: pair each row
        -- with the row that starts the day after it ends
        SELECT
            t.user_id,
            MIN(t.start_date) AS start_date,
            MAX(t2.end_date)  AS end_date
        FROM #user t
        JOIN #user t2
            ON t.user_id = t2.user_id
            AND t.end_date + 1 = t2.start_date
        WHERE t2.user_id IS NOT NULL -- redundant given the inner join
        GROUP BY t.user_id
    ), cteBase AS
    (
        SELECT
            t.user_id,
            t.start_date,
            t.end_date
        FROM #user t
        -- exclude contiguous rows returned in the first CTE
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM cteContiguous
            WHERE cteContiguous.user_id = t.user_id
                AND t.start_date BETWEEN cteContiguous.start_date AND cteContiguous.end_date
        )
    )
    -- merged contiguous ranges, summing the amounts of the rows they cover
    SELECT
        cteContiguous.user_id,
        cteContiguous.start_date,
        cteContiguous.end_date,
        SUM(t.amount) AS amount
    FROM cteContiguous
    JOIN #user t
        ON t.user_id = cteContiguous.user_id
        AND t.start_date BETWEEN cteContiguous.start_date AND cteContiguous.end_date
    GROUP BY
        cteContiguous.user_id,
        cteContiguous.start_date,
        cteContiguous.end_date
    UNION ALL
    -- remaining stand-alone ranges
    SELECT
        cteBase.user_id,
        cteBase.start_date,
        cteBase.end_date,
        SUM(t.amount) AS amount
    FROM cteBase
    JOIN #user t
        ON t.user_id = cteBase.user_id
        AND t.start_date BETWEEN cteBase.start_date AND cteBase.end_date
    GROUP BY
        cteBase.user_id,
        cteBase.start_date,
        cteBase.end_date
    ORDER BY
        user_id,
        start_date,
        end_date;

  • You might want to have a look at this blog. It demonstrates an alternative solution that will most probably perform better than your current approach.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Go here and drop down the Publish Date list and get a copy of the March SolidQ Journal. Read Itzik Ben-Gan's Packing Intervals article. STUNNINGLY efficient solution for what you need. He was able to collapse 5M rows of hot data in just THREE SECONDS!

    http://www.solidq.com/sqj/Pages/Home.aspx
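
    For a taste of the technique, here is a minimal sketch of the interval-packing idea (my wording, not Ben-Gan's exact code; the framed MAX window it leans on assumes SQL Server 2012 or later). A row starts a new "island" when it begins more than one day after everything seen so far for that user; a running sum of those start flags numbers the islands, which can then be grouped and summed:

    WITH flagged AS
    (
        -- flag a row as the start of a new island when its start_date is
        -- more than one day past the latest end_date seen so far per user
        SELECT user_id, start_date, end_date, amount,
            CASE WHEN start_date <= DATEADD(DAY, 1,
                     MAX(end_date) OVER (PARTITION BY user_id
                                         ORDER BY start_date, end_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING
                                                  AND 1 PRECEDING))
                 THEN 0 ELSE 1 END AS island_start
        FROM #user
    ), islands AS
    (
        -- a running sum of the start flags numbers the islands
        SELECT *,
            SUM(island_start) OVER (PARTITION BY user_id
                                    ORDER BY start_date, end_date
                                    ROWS UNBOUNDED PRECEDING) AS island_id
        FROM flagged
    )
    SELECT user_id,
        MIN(start_date) AS start_date,
        MAX(end_date)   AS end_date,
        SUM(amount)     AS amount
    FROM islands
    GROUP BY user_id, island_id
    ORDER BY user_id, start_date;

    Against the test data above this should produce the five desired rows. The running MAX handles overlapping and day-adjacent ranges in a single pass over the table, which is where the speed comes from.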

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Lutz and Kevin...

    This is a double WIN for me... not only did you point me to the exact solution I was looking for,

    but you also introduced me to SolidQ. Their Journal is going to be another invaluable SQL Server resource

    and will provide me with a lot of interesting reading for this weekend.

    Cheers
