March 27, 2012 at 3:54 am
Hello All
Not sure this doesn't qualify as a 'newbie' question, but it certainly is beyond my capabilities!
I have an inventory-type problem where I need to track start and end dates for the state of particular items.
For example I get monthy updates of products:
IF OBJECT_ID('TempDB..#inventory_tbl','U') IS NOT NULL
DROP TABLE #inventory_tbl
CREATE TABLE #inventory_tbl(
[Month] [smalldatetime] NOT NULL,
[ItemID] [int] NOT NULL,
[count A] [int] NULL,
[count B] [int] NULL
)
insert into #inventory_tbl
([Month],[ItemID],[count A],[count B])
SELECT 'Jan 1 2011 12:00AM','1','10','20' UNION ALL
SELECT 'Jan 1 2011 12:00AM','2','100','200' UNION ALL
SELECT 'Feb 1 2011 12:00AM','1','10','20' UNION ALL
SELECT 'Feb 1 2011 12:00AM','2','100','200' UNION ALL
SELECT 'Mar 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'Mar 1 2011 12:00AM','2','100','250' UNION ALL
SELECT 'Apr 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'Apr 1 2011 12:00AM','2','200','250' UNION ALL
SELECT 'May 1 2011 12:00AM','1','15','25' UNION ALL
SELECT 'May 1 2011 12:00AM','2','200','250'
so we have
MonthItemIDCount ACount B
2011-01-01 00:00:0011020
2011-01-01 00:00:002100200
2011-02-01 00:00:0011020
2011-02-01 00:00:002100200
2011-03-01 00:00:0011525
2011-03-01 00:00:002100250
2011-04-01 00:00:0011525
2011-04-01 00:00:002200250
2011-05-01 00:00:0011525
2011-05-01 00:00:002200250
So the result I am after would be:
ItemID[Count A][Count B]Start Month End Month
1102001/01/2011 00:0001/03/2011 00:00
1152501/03/2011 00:00NULL
210020001/01/2011 00:0001/03/2011 00:00
210025001/03/2011 00:0001/04/2011 00:00
220025001/04/2011 00:00NULL
I have found some neat solutions here but 1) I don't know how to generalise to the case with several products and 2) I have more than 100 months of data so recursion seems not to be appropriate in this case (?).
If anyone has some pointers as to where to go from here I would be most grateful!
Philippe
March 27, 2012 at 4:19 am
so we have
MonthItemIDCount ACount B
2011-01-01 00:00:0011020
2011-01-01 00:00:002100200
2011-02-01 00:00:0011020
2011-02-01 00:00:002100200
2011-03-01 00:00:0011525
2011-03-01 00:00:002100250
2011-04-01 00:00:0011525
2011-04-01 00:00:002200250
2011-05-01 00:00:0011525
2011-05-01 00:00:002200250
So the result I am after would be:
ItemID[Count A][Count B]Start MonthEnd Month
1102001/01/201101/03/2011
1152501/03/2011NULL
210015001/01/201101/04/2011
220015001/04/2011NULL
Could you please check and correct your setup, and provide a bit more details:
1. Your sample data setup contains three entries for Id: 1, Count A: 15, Count b: 25, Why [End Month] expected as NULL?
2. There is data in setup for for Id: 2, Count A: 100, Count b: 200 and
2, Count A: 200, Count b: 250.
But in your expected results you don't have anything for them at all.
March 27, 2012 at 4:35 am
Apologies for that, I've edited the post with what the correct results should be.
The start month is the first month that an item as a particular state (values for count A and Count B)
The end month is the 1st day of the month AFTER the last appearance of the state. For example item 2 has values 100 & 200 up to and including February, so the end month is March. If there is no 'next month' the the value is NULL (indicating the current state for the item).
Hope that makes more sense
Philippe
March 27, 2012 at 4:41 am
See the link here
http://www.sqlservercentral.com/articles/T-SQL/71550/
WITH CTE AS (
SELECT [Month],[ItemID],[count A],[count B],
ROW_NUMBER() OVER(PARTITION BY [ItemID] ORDER BY [Month]) -
ROW_NUMBER() OVER(PARTITION BY [ItemID],[count A],[count B] ORDER BY [Month]) AS rnDiff
FROM #inventory_tbl),
Results AS (
SELECT [ItemID],
[count A],
[count B],
MIN([Month]) AS [Start Month],
ROW_NUMBER() OVER(PARTITION BY [ItemID] ORDER BY MIN([Month])) AS rn
FROM CTE
GROUP BY [ItemID],[count A],[count B],rnDiff)
SELECT r1.[ItemID],
r1.[count A],
r1.[count B],
r1.[Start Month],
r2.[Start Month] AS [End Month]
FROM Results r1
LEFT OUTER JOIN Results r2 ON r2.[ItemID]=r1.[ItemID]
AND r2.rn=r1.rn+1;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 27, 2012 at 4:51 am
If there is no 'next month' the the value is NULL (indicating the current state for the item).
Which combination of id's in a sample data you gave has no "next month"?
March 27, 2012 at 4:52 am
Brilliant. I don't understand it yet, but still brilliant. Many thanks for your help!
Philippe
March 27, 2012 at 4:58 am
philippe-546556 (3/27/2012)
Brilliant. I don't understand it yet, but still brilliant. Many thanks for your help!Philippe
The "Group Islands of Contiguous Dates" article I referenced explains how this works.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 27, 2012 at 5:00 am
Hi Eugene, to answer your question when records are in chronological order, if there are no more recent records with the same ID then that defines the current state of that ID and that will be used to define the state of that ID in the result set with 'next month' as NULL.
Thanks Mark, it's a little above what I'm used to but I'll work through it thoroughly.
Philippe
March 27, 2012 at 5:31 am
Your sample data 1:
MonthItemIDCount ACount B
2011-01-01 00:00:0011020
2011-02-01 00:00:0011020
Expected:
ItemID[Count A][Count B]Start Month End Month
1102001/01/2011 00:0001/03/2011 00:00
Your sample data 2:
MonthItemIDCount ACount B
2011-04-01 00:00:002200250
2011-05-01 00:00:002200250
But Expected:
ItemID[Count A][Count B]Start Month End Month
220025001/04/2011 00:00NULL
Why the second sample should show NULL as End Month?
Both have two records per Id, Count A and Count B, both have "sequencial" dates! How the second sample is a different to the first one?
March 27, 2012 at 6:13 am
Hi Eugene
For item 1, there are records after February, so there needs to be an end month. For item 2 there are no more records after May so may is the current record and in the result set we set the last Month to NULL.
does that make sense?
Philippe
March 27, 2012 at 6:46 am
Hi Eugene
For item 1, there are records after February, so there needs to be an end month. For item 2 there are no more records after May so may is the current record and in the result set we set the last Month to NULL.
does that make sense?
Philippe
March 27, 2012 at 7:21 am
I still couldn't get it from your explanation, but looks like I do see now what you want ...
Lets create some more test data (around 1.3 millions records)
declare @i int
set @i= 2
while @i < 262144
begin
insert #inventory_tbl select [Month],[ItemID]+@i,[count A],[count B] from #inventory_tbl
set @i = @i*2
end
Now, SQL windowed functions are great, but can be slower sometimes, for larger datasets, than other solutions...
The following code does the same, without use of windowed functions. It does run almost twice as faster:
;WITH CTE
AS
(
SELECT [ItemID],[count A],[count B], MIN([Month]) [Start Month]
FROM #inventory_tbl
Group BY [ItemID],[count A],[count B]
)
SELECT r1.*, MIN(r2.[Start Month]) [End Date]
FROM CTE r1
LEFT JOIN CTE r2
ON r2.ItemID = r1.ItemID AND r2.[Start Month] > r1.[Start Month]
Group BY r1.[ItemID],r1.[count A],r1.[count B],r1.[Start Month]
order by r1.[ItemID],r1.[count A],r1.[count B]
There are couple things about the way with windowed function which do make it quite slower:
1. To many of windowed functions used
2. Use of r2.rn=r1.rn+1 in LEFT JOIN.
Also, with a bit of tuning (eg. adding some indexes) you can get much better performance from both solutions...
March 27, 2012 at 7:27 am
Eugene Elutin (3/27/2012)
I still couldn't get it from your explanation, but looks like I do see now what you want ...Lets create some more test data (around 1.3 millions records)
declare @i int
set @i= 2
while @i < 262144
begin
insert #inventory_tbl select [Month],[ItemID]+@i,[count A],[count B] from #inventory_tbl
set @i = @i*2
end
Now, SQL windowed functions are great, but can be slower sometimes, for larger datasets, than other solutions...
The following code does the same, without use of windowed functions. It does run almost twice as faster:
;WITH CTE
AS
(
SELECT [ItemID],[count A],[count B], MIN([Month]) [Start Month]
FROM #inventory_tbl
Group BY [ItemID],[count A],[count B]
)
SELECT r1.*, MIN(r2.[Start Month]) [End Date]
FROM CTE r1
LEFT JOIN CTE r2
ON r2.ItemID = r1.ItemID AND r2.[Start Month] > r1.[Start Month]
Group BY r1.[ItemID],r1.[count A],r1.[count B],r1.[Start Month]
order by r1.[ItemID],r1.[count A],r1.[count B]
There are couple things about the way with windowed function which do make it quite slower:
1. To many of windowed functions used
2. Use of r2.rn=r1.rn+1 in LEFT JOIN.
Also, with a bit of tuning (eg. adding some indexes) you can get much better performance from both solutions...
Try adding this into the original sample data
SELECT 'Jun 1 2011 12:00AM','1','10','20' UNION ALL
The results of the two queries are different.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 27, 2012 at 8:24 am
...
Try adding this into the original sample data
SELECT 'Jun 1 2011 12:00AM','1','10','20' UNION ALL
The results of the two queries are different.
My query will only work if status never returns to something it was before (as per OP sample data).
If it does, I would try to use "quirky update" method.
March 27, 2012 at 3:50 pm
Cheers Eugene, yes I'd figured that one out at least, but that was precisely the problem; a simple grouping won't work because the months have to be contiguous, otherwise you can get groupings over non-contiguous periods if states can revert to previous states. I guess the 'quirky update' should be called the 'right solution' instead.
Can I ask what you call 'windowed functions'?
Philippe
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply