July 24, 2008 at 7:33 am
I have a series of dates with values assigned to them. I want to pull only the records that have the same value for 3 consecutive days.
CREATE TABLE [dbo].[TestTable](
[keyfield] [int] IDENTITY(1,1) NOT NULL,
[datefield] [datetime] NOT NULL,
[valuefield] [int] NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[keyfield] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
SET IDENTITY_INSERT [dbo].[TestTable] ON
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (1, '20080713 00:00:00.000', 1)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (2, '20080714 00:00:00.000', 4)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (3, '20080715 00:00:00.000', 6)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (4, '20080720 00:00:00.000', 6)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (5, '20080717 00:00:00.000', 6)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (6, '20080718 00:00:00.000', 4)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (7, '20080721 00:00:00.000', 2)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (8, '20080716 00:00:00.000', 6)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (9, '20080801 00:00:00.000', 7)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (10, '20080805 00:00:00.000', 8)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (11, '20080806 00:00:00.000', 8)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (12, '20080807 00:00:00.000', 8)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (13, '20080808 00:00:00.000', 4)
SET IDENTITY_INSERT [dbo].[TestTable] OFF
I should be returning the following records
Date Value
7/15/2008 6
7/16/2008 6
7/17/2008 6
8/05/2008 8
8/06/2008 8
8/07/2008 8
My thought was that if I could figure out the start and end date for each range I would be good. But I can't seem to get there.
This is what I have so far. I'm trying not to use any loops or cursors.
;with TableWRowNum
as
(
select row_number() over ( Order By Datefield,valuefield) as row,datefield,valuefield
from testtable
)
select TableWRowNum.row,TableWRowNum.datefield curRecDay,
PrevRow.datefield prevRecDay,NextRow.datefield nextRecDay,
TableWRowNum.valuefield,
datediff(d,PrevRow.datefield,TableWRowNum.datefield) daysBefore,
datediff(d,TableWRowNum.datefield,NextRow.datefield) daysAfter,
case when datediff(d,PrevRow.datefield,TableWRowNum.datefield)=1 or datediff(d,TableWRowNum.datefield,NextRow.datefield) = 1 then 1 else 0 end sequential
from TableWRowNum left join TableWRowNum PrevRow on TableWRowNum.row=PrevRow.row+1
left join TableWRowNum NextRow on TableWRowNum.row = NextRow.row-1
Any help would be appreciated.
July 24, 2008 at 7:56 am
(Your sample data doesn't seem to match your requirements. Take a look at the dates where the value is 6, they go 15, 17, 20, but you have a 4 assigned on the 18th.)
Here's how I'd do it, at least as a rough draft:
;with
CTE1 (Row, Date, Val) as -- Date sequence
(select row_number() over (order by datefield),
datefield, valuefield
from dbo.TestTable),
CTE2 (SDate, EDate) as -- Same value 3 in a row
(select c1.date, c3.date
from cte1 c1
inner join cte1 c2
on c1.val = c2.val
and c1.row = c2.row-1
inner join cte1 c3
on c1.val = c3.val
and c1.row = c3.row-2)
select datefield, valuefield -- Final select
from dbo.TestTable
inner join cte2
on datefield between sdate and edate;
Now, that doesn't quite handle situations like four (or more) in a row properly. Is that a possibility? If so, how do you want it to work? Give all four row, give overlapping sets of 3, something else?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 24, 2008 at 8:55 am
The data is correct. On the 15,16,17,20 there is a 6. I don't want to pull the 20th because it is not sequential. The 18th is a 4 because I wanted there to be a gap so the 20th won't be part of the result.
I can't base it on always pulling where there are 3 sequential dates with the same results. That number will be variable. It would be 4...20...or even 100.
Your SQL did give me the correct results. But I will need to now make the "3" a variable number.
I don't know if this simplifies or complicates things more.
Date Value Start End Days
7/13/2008 1 7/13/2008 7/13/2008 1
7/14/2008 4 7/14/2008 7/14/2008 1
7/15/2008 6 7/15/2008 7/17/2008 3
7/16/2008 6 7/15/2008 7/17/2008 3
7/17/2008 6 7/15/2008 7/17/2008 3
7/18/2008 4 7/18/2008 7/18/2008 1
7/20/2008 6 7/20/2008 7/20/2008 1
7/21/2008 1 7/21/2008 7/21/2008 1
8/01/2008 7 8/01/2008 8/01/2008 1
8/05/2008 8 8/05/2008 8/07/2008 3
8/06/2008 8 8/05/2008 8/07/2008 3
8/07/2008 8 8/05/2008 8/07/2008 3
8/08/2008 4 8/08/2008 8/08/2008 1
I need all records with a 3...but 3 could be a variable. I could be looking for 2 or 5 or 12 etc.
I hope that explains it better.
July 24, 2008 at 12:17 pm
I didn't have time to test this, but give it a shot and tell me what's broken, and I'll have time to get back to it in a little while.
;with
CTE1 (Row, Date, Val) as -- Date sequence
(select row_number() over (order by datefield),
datefield, valuefield
from dbo.TestTable),
CTE2 (Row, Val, Date, ValSet) as -- Sets
(select c1.row, c1.val, c1.date,
row_number() over (order by c1.row)
from cte1 c1
inner join cte1 c2
on c1.val != c2.val
and c1.row = c2.row+1
union all
select c3.row, c3.val, c3.valset, cte2.valset
from cte1 c3
inner join cte2
on c3.val = cte2.val
and c3.row = cte2.row+1),
CTE3 (SDate, EDate, Dates) as -- Ranges and Qty
(select min(date), max(date), count(*)
from cte2
group by valset)
select datefield, valuefield, dates -- Final select
from dbo.TestTable
inner join cte3
on datefield between sdate and edate;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 24, 2008 at 12:32 pm
Msg 207, Level 16, State 1, Line 38
Invalid column name 'valset'.
You are calling cte2 from withing cte2. I'm trying to figure out what you are trying to do.
July 24, 2008 at 1:57 pm
You can do that with CTEs (Common Table Expression). They can be recursive.
Not sure why it would give you an invalid column name. I'll test it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 24, 2008 at 2:03 pm
In the second part of CTE2 (after Union All), change "c3.valset" to "c3.date". That should give you what you need, regardless of the number of dates in a row. I tested it on your test data and it seems to do what's needed.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 24, 2008 at 2:48 pm
I finally got it. Yours worked but when I added some more data it broke it.
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (14, '20080902 00:00:00.000', 4)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (15, '20080903 00:00:00.000', 4)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (16, '20080904 00:00:00.000', 4)
INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (17, '20080910 00:00:00.000', 4)
But this works!
;with TableWRowNum
as
(
select row_number() over ( Order By Datefield,valuefield) as row,datefield,valuefield
from testtable
)
,
CTE2 as (
select TableWRowNum.row,TableWRowNum.datefield curRecDay,
PrevRow.datefield prevRecDay,NextRow.datefield nextRecDay,
TableWRowNum.valuefield,
case when PrevRow.ValueField != TableWRowNum.ValueField or PrevRow.ValueField is null then TableWRowNum.datefield
when datediff(d,PrevRow.datefield,TableWRowNum.datefield) > 1 then TableWRowNum.datefield
else ''
end as StartGroup,
case when NextRow.ValueField != TableWRowNum.ValueField or NextRow.ValueField is null then TableWRowNum.datefield
when datediff(d,TableWRowNum.datefield,NextRow.datefield) > 1 then TableWRowNum.datefield
else ''
end as EndGroup
from TableWRowNum left join TableWRowNum PrevRow on TableWRowNum.row=PrevRow.row+1
left join TableWRowNum NextRow on TableWRowNum.row = NextRow.row-1
)
,
CTE3 as (
select row,curRecDay,valuefield,startgroup,endgroup,
case startgroup when '1900-01-01 00:00:00.000' then (select max(startgroup) from cte2 where row < c2.row) else startgroup end as StartDate,
case endgroup when '1900-01-01 00:00:00.000' then (select min(endgroup) from cte2 where row > c2.row and endgroup != '1900-01-01 00:00:00.000') else endgroup end as EndDate
from CTE2 c2
)
select row,curRecDay,valuefield,startdate,enddate,datediff(d,startdate,enddate)+1 as Days
from cte3
Returns:
row curRecDay valuefield startdate enddate Days
--------------- ------------------------ -------- ----------------------- ------------------------ ----------
1 2008-07-13 00:00:00.000 1 2008-07-13 00:00:00.000 2008-07-13 00:00:00.000 1
2 2008-07-14 00:00:00.000 4 2008-07-14 00:00:00.000 2008-07-14 00:00:00.000 1
3 2008-07-15 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3
4 2008-07-16 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3
5 2008-07-17 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3
6 2008-07-18 00:00:00.000 4 2008-07-18 00:00:00.000 2008-07-18 00:00:00.000 1
7 2008-07-20 00:00:00.000 6 2008-07-20 00:00:00.000 2008-07-20 00:00:00.000 1
8 2008-07-21 00:00:00.000 2 2008-07-21 00:00:00.000 2008-07-21 00:00:00.000 1
9 2008-08-01 00:00:00.000 7 2008-08-01 00:00:00.000 2008-08-01 00:00:00.000 1
10 2008-08-05 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3
11 2008-08-06 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3
12 2008-08-07 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3
13 2008-08-08 00:00:00.000 4 2008-08-08 00:00:00.000 2008-08-08 00:00:00.000 1
14 2008-09-02 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3
15 2008-09-03 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3
16 2008-09-04 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3
17 2008-09-10 00:00:00.000 4 2008-09-10 00:00:00.000 2008-09-10 00:00:00.000 1
Thanks for helping today.
July 24, 2008 at 2:51 pm
Glad I could help.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply