Grouping based on contiguous dates

Question

Grouping based on contiguous dates

philippe-546556

SSC Enthusiast

Points: 127
More actions
March 27, 2012 at 3:54 am

#255399

Hello All
Not sure this doesn't qualify as a 'newbie' question, but it certainly is beyond my capabilities!
I have an inventory-type problem where I need to track start and end dates for the state of particular items.
For example I get monthy updates of products:
IF OBJECT_ID('TempDB..#inventory_tbl','U') IS NOT NULL
DROP TABLE #inventory_tbl
CREATE TABLE #inventory_tbl(
[Month] [smalldatetime] NOT NULL,
[ItemID] [int] NOT NULL,
[count A] [int] NULL,
[count B] [int] NULL
)
insert into #inventory_tbl
([Month],[ItemID],[count A],[count B])
SELECT 'Jan 1 2011 12:00AM','1','10','20' UNION ALL
SELECT 'Jan 1 2011 12:00AM','2','100','200' UNION ALL
SELECT 'Feb 1 2011 12:00AM','1','10','20' UNION ALL
SELECT 'Feb 1 2011 12:00AM','2','100','200' UNION ALL
SELECT 'Mar 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'Mar 1 2011 12:00AM','2','100','250' UNION ALL
SELECT 'Apr 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'Apr 1 2011 12:00AM','2','200','250' UNION ALL
SELECT 'May 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'May 1 2011 12:00AM','2','200','250'
so we have
MonthItemIDCount ACount B
2011-01-01 00:00:0011020
2011-01-01 00:00:002100200
2011-02-01 00:00:0011020
2011-02-01 00:00:002100200
2011-03-01 00:00:0011525
2011-03-01 00:00:002100250
2011-04-01 00:00:0011525
2011-04-01 00:00:002200250
2011-05-01 00:00:0011525
2011-05-01 00:00:002200250
So the result I am after would be:
ItemID[Count A][Count B]Start Month End Month
1102001/01/2011 00:0001/03/2011 00:00
1152501/03/2011 00:00NULL
210020001/01/2011 00:0001/03/2011 00:00
210025001/03/2011 00:0001/04/2011 00:00
220025001/04/2011 00:00NULL
I have found some neat solutions here but 1) I don't know how to generalise to the case with several products and 2) I have more than 100 months of data so recursion seems not to be appropriate in this case (?).
If anyone has some pointers as to where to go from here I would be most grateful!
Philippe

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

so we have
MonthItemIDCount ACount B
2011-01-01 00:00:0011020
2011-01-01 00:00:002100200
2011-02-01 00:00:0011020
2011-02-01 00:00:002100200
2011-03-01 00:00:0011525
2011-03-01 00:00:002100250
2011-04-01 00:00:0011525
2011-04-01 00:00:002200250
2011-05-01 00:00:0011525
2011-05-01 00:00:002200250
So the result I am after would be:
ItemID[Count A][Count B]Start MonthEnd Month
1102001/01/201101/03/2011
1152501/03/2011NULL
210015001/01/201101/04/2011
220015001/04/2011NULL

Could you please check and correct your setup, and provide a bit more details:

1. Your sample data setup contains three entries for Id: 1, Count A: 15, Count b: 25, Why [End Month] expected as NULL?

2. There is data in setup for for Id: 2, Count A: 100, Count b: 200 and

2, Count A: 200, Count b: 250.

But in your expected results you don't have anything for them at all.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 2

Apologies for that, I've edited the post with what the correct results should be.

The start month is the first month that an item as a particular state (values for count A and Count B)

The end month is the 1st day of the month AFTER the last appearance of the state. For example item 2 has values 100 & 200 up to and including February, so the end month is March. If there is no 'next month' the the value is NULL (indicating the current state for the item).

Hope that makes more sense

Philippe

Mark Cowne One Orange Chip Points: 26953 More actions · Answer 3

See the link here

http://www.sqlservercentral.com/articles/T-SQL/71550/

WITH CTE AS (

SELECT [Month],[ItemID],[count A],[count B],

ROW_NUMBER() OVER(PARTITION BY [ItemID] ORDER BY [Month]) -

ROW_NUMBER() OVER(PARTITION BY [ItemID],[count A],[count B] ORDER BY [Month]) AS rnDiff

FROM #inventory_tbl),

Results AS (

SELECT [ItemID],

[count A],

[count B],

MIN([Month]) AS [Start Month],

ROW_NUMBER() OVER(PARTITION BY [ItemID] ORDER BY MIN([Month])) AS rn

FROM CTE

GROUP BY [ItemID],[count A],[count B],rnDiff)

SELECT r1.[ItemID],

r1.[count A],

r1.[count B],

r1.[Start Month],

r2.[Start Month] AS [End Month]

FROM Results r1

LEFT OUTER JOIN Results r2 ON r2.[ItemID]=r1.[ItemID]

AND r2.rn=r1.rn+1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 4

If there is no 'next month' the the value is NULL (indicating the current state for the item).

Which combination of id's in a sample data you gave has no "next month"?

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 5

Brilliant. I don't understand it yet, but still brilliant. Many thanks for your help!

Philippe

Mark Cowne One Orange Chip Points: 26953 More actions · Answer 6

philippe-546556 (3/27/2012)
Brilliant. I don't understand it yet, but still brilliant. Many thanks for your help!
Philippe

The "Group Islands of Contiguous Dates" article I referenced explains how this works.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 7

Hi Eugene, to answer your question when records are in chronological order, if there are no more recent records with the same ID then that defines the current state of that ID and that will be used to define the state of that ID in the result set with 'next month' as NULL.

Thanks Mark, it's a little above what I'm used to but I'll work through it thoroughly.

Philippe

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 8

Your sample data 1:

MonthItemIDCount ACount B

2011-01-01 00:00:0011020

2011-02-01 00:00:0011020

Expected:

ItemID[Count A][Count B]Start Month End Month

1102001/01/2011 00:0001/03/2011 00:00

Your sample data 2:

MonthItemIDCount ACount B

2011-04-01 00:00:002200250

2011-05-01 00:00:002200250

But Expected:

ItemID[Count A][Count B]Start Month End Month

220025001/04/2011 00:00NULL

Why the second sample should show NULL as End Month?

Both have two records per Id, Count A and Count B, both have "sequencial" dates! How the second sample is a different to the first one?

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 9

Hi Eugene

For item 1, there are records after February, so there needs to be an end month. For item 2 there are no more records after May so may is the current record and in the result set we set the last Month to NULL.

does that make sense?

Philippe

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 10

Hi Eugene

For item 1, there are records after February, so there needs to be an end month. For item 2 there are no more records after May so may is the current record and in the result set we set the last Month to NULL.

does that make sense?

Philippe

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 11

I still couldn't get it from your explanation, but looks like I do see now what you want ...

Lets create some more test data (around 1.3 millions records)

declare @i int

set @i= 2

while @i < 262144

begin

insert #inventory_tbl select [Month],[ItemID]+@i,[count A],[count B] from #inventory_tbl

set @i = @i*2

end

Now, SQL windowed functions are great, but can be slower sometimes, for larger datasets, than other solutions...

The following code does the same, without use of windowed functions. It does run almost twice as faster:

;WITH CTE

AS

(

SELECT [ItemID],[count A],[count B], MIN([Month]) [Start Month]

FROM #inventory_tbl

Group BY [ItemID],[count A],[count B]

)

SELECT r1.*, MIN(r2.[Start Month]) [End Date]

FROM CTE r1

LEFT JOIN CTE r2

ON r2.ItemID = r1.ItemID AND r2.[Start Month] > r1.[Start Month]

Group BY r1.[ItemID],r1.[count A],r1.[count B],r1.[Start Month]

order by r1.[ItemID],r1.[count A],r1.[count B]

There are couple things about the way with windowed function which do make it quite slower:

1. To many of windowed functions used

2. Use of r2.rn=r1.rn+1 in LEFT JOIN.

Also, with a bit of tuning (eg. adding some indexes) you can get much better performance from both solutions...

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Mark Cowne One Orange Chip Points: 26953 More actions · Answer 12

Eugene Elutin (3/27/2012)
I still couldn't get it from your explanation, but looks like I do see now what you want ...
Lets create some more test data (around 1.3 millions records)
declare @i int
set @i= 2
while @i < 262144
begin
insert #inventory_tbl select [Month],[ItemID]+@i,[count A],[count B] from #inventory_tbl
set @i = @i*2
end
Now, SQL windowed functions are great, but can be slower sometimes, for larger datasets, than other solutions...
The following code does the same, without use of windowed functions. It does run almost twice as faster:
;WITH CTE
AS
(
SELECT [ItemID],[count A],[count B], MIN([Month]) [Start Month]
FROM #inventory_tbl
Group BY [ItemID],[count A],[count B]
)
SELECT r1.*, MIN(r2.[Start Month]) [End Date]
FROM CTE r1
LEFT JOIN CTE r2
ON r2.ItemID = r1.ItemID AND r2.[Start Month] > r1.[Start Month]
Group BY r1.[ItemID],r1.[count A],r1.[count B],r1.[Start Month]
order by r1.[ItemID],r1.[count A],r1.[count B]
There are couple things about the way with windowed function which do make it quite slower:
1. To many of windowed functions used
2. Use of r2.rn=r1.rn+1 in LEFT JOIN.
Also, with a bit of tuning (eg. adding some indexes) you can get much better performance from both solutions...

Try adding this into the original sample data

SELECT 'Jun 1 2011 12:00AM','1','10','20' UNION ALL

The results of the two queries are different.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 13

...
Try adding this into the original sample data
SELECT 'Jun 1 2011 12:00AM','1','10','20' UNION ALL
The results of the two queries are different.

My query will only work if status never returns to something it was before (as per OP sample data).

If it does, I would try to use "quirky update" method.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

philippe-546556 SSC Enthusiast Points: 127 More actions · Answer 14

Cheers Eugene, yes I'd figured that one out at least, but that was precisely the problem; a simple grouping won't work because the months have to be contiguous, otherwise you can get groupings over non-contiguous periods if states can revert to previous states. I guess the 'quirky update' should be called the 'right solution' instead.

Can I ask what you call 'windowed functions'?

Philippe