April 13, 2015 at 1:44 am
Hi, thanks for reading my topic.
I have a query question.
Consider a table with the following structure:
RecordID (PK - int) - RecordDate (DateTime)
I need to find all records that fall within a 7 day period slot based on the first RecordDate of a specific slot.
Example, consider the following records:
RecordID - RecordDate
1 - 2015-04-01 14:00
2 - 2015-04-03 15:00
3 - 2015-04-03 16:05
4 - 2015-04-03 19:23
5 - 2015-04-06 09:15
6 - 2015-04-06 11:30
7 - 2015-04-07 12:00
8 - 2015-04-09 15:15
The result of the query I'd like should look something like this
1
2
5
7
8
So basically I'd like to leave record 3 and 4 out because they fall within 24 hours of record 2 and I'd like to leave record 6 out because it falls within 24 hours of record 5.
I'd tried working with a CTE and set a dateadd(d, 1, recorddate), join it on itself and use a between From / To filter on the join but that didn't work. I don't think NTILE will work with this?
I'd love to hear some suggestion on how to approach this problem.
Thanks and have a great day.
April 13, 2015 at 2:11 am
Try this:
if object_id('tempdb..#dates', 'U') is not null
drop table #dates;
create table #dates
(
RecordId int primary key clustered
,RecordDate datetime
);
insert #dates
(RecordId, RecordDate)
values (1, '2015-04-01 14:00'),
(2, '2015-04-03 15:00'),
(3, '2015-04-03 16:05'),
(4, '2015-04-03 19:23'),
(5, '2015-04-06 09:15'),
(6, '2015-04-06 11:30'),
(7, '2015-04-07 12:00'),
(8, '2015-04-09 15:15');
with recs
as (select d.*
,date1 = lag(RecordDate, 1, '19000101') over (order by d.RecordId)
from #dates d
)
select recs.RecordId
from recs
where datediff(hour, recs.date1, recs.RecordDate) > 24
And please post consumable DDL next time.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 13, 2015 at 4:07 am
Thank you very much Phil for point out the LAG function.
I will post proper DDL next time.
Have a nice day.
April 13, 2015 at 8:31 am
I'm not sure if this more simple option would fit you.
What happens if a row falls into the following 24 hours but it's on a different date? This option would include row 9 but Phil's would exclude it.
if object_id('tempdb..#dates', 'U') is not null
drop table #dates;
create table #dates
(
RecordId int primary key clustered
,RecordDate datetime
);
insert #dates
(RecordId, RecordDate)
values (1, '2015-04-01 14:00'),
(2, '2015-04-03 15:00'),
(3, '2015-04-03 16:05'),
(4, '2015-04-03 19:23'),
(5, '2015-04-06 09:15'),
(6, '2015-04-06 11:30'),
(7, '2015-04-07 12:00'),
(8, '2015-04-09 15:15'),
(9, '2015-04-10 09:15');
SELECT MIN(RecordId) AS RecordId
FROM #dates
GROUP BY CAST( RecordDate AS date);
April 14, 2015 at 2:35 am
So I'd like to take this a little further, hopefully you can help me with this issue again. So basically I solved the problem of marking records with fall within 7*24 hours of the previous record. I've marked those records with 1 in the column IsDoubleLag.
Unfornately this solution is only based on the previous record. When you have a number of records that all fall within 7*24 hour of the previous records, all records would be marked as IsDoubleLag = 1.
Would it be possible somehow, without the use of a cursor, to have a 7*24 hour break mark.
Consider the following table:
CREATE TABLE #TEMP (ID int, DateCreated datetime, IsDoubleLag bit)
INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')
INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')
INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')
INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')
INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')
INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')
INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')
INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')
INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')
INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')
INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')
INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')
INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')
INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')
INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')
INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1')
In this scenario it starts with RecordID 1 with a date of 2015-01-01 08:11 .. Then RecordID 2, 3 and 4 fall within 7x24 hours. Unfortunately RecordID 5 with a date of 2015-01-11 00:46:37 also falls within the range of the previous record. Which is logical, because this is how it's set up.
But I'd like to have RecordID 5 be marked as a new range record. So basically I'd like it to have the first marked IsDoubleLag record to be leading in marking the subsequent records. When a new break has been marked, it should restart the lag marking.
I know how to fix it through cursor and use a running variable, but I'd like to solve it set based.
Thanks!
April 14, 2015 at 9:36 am
Would you be comfortable using the Quirky Update? http://www.sqlservercentral.com/articles/T-SQL/68467/
April 14, 2015 at 11:16 am
This is a possibility using the Quirky Update.
CREATE TABLE #TEMP (ID int PRIMARY KEY, DateCreated datetime, IsDoubleLag bit)
INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')
INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')
INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')
INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')
INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')
INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')
INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')
INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')
INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')
INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')
INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')
INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')
INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')
INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')
INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')
INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1');
SELECT *
FROM #TEMP
DECLARE @LastDateCreated datetime = '2000', @IsDoubleLag bit, @ID int
UPDATE T SET
@IsDoubleLag = IsDoubleLag = CASE WHEN DATEDIFF( MI, @LastDateCreated, DateCreated) > 24*7*60
THEN 0 ELSE 1 END,
@LastDateCreated = CASE WHEN DATEDIFF( MI, @LastDateCreated, DateCreated) > 24*7*60
THEN DateCreated ELSE @LastDateCreated END,
@ID = ID
FROM #TEMP T WITH( TABLOCKX)
OPTION( MAXDOP 1);
SELECT *
FROM #TEMP
GO
DROP TABLE #TEMP
April 14, 2015 at 1:42 pm
Recursive CTE will do the job and it can be faster then cursor but hardly outperforms quirk update.
CREATE TABLE #TEMP (ID int, DateCreated datetime, IsDoubleLag bit)
INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')
INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')
INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')
INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')
INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')
INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')
INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')
INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')
INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')
INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')
INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')
INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')
INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')
INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')
INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')
INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1');
with steps as (
select t1.id as root, t1.id, t1.IsDoubleLag, nId
from #TEMP t1
outer apply (select top(1) id as nId
from #TEMP t2
where t2.DateCreated>dateadd(day,7, t1.DateCreated)
order by t2.DateCreated) t2
), path as (
--choose roots
select * from steps
where IsDoubleLag = 0
--stepping down the paths
union all
select t1.root, t2.id, t2.IsDoubleLag, t2.nid
from path t1
join steps t2 on t2.id = t1.nId
)
select root, id as fromid, nid-1 as toid, IsDoubleLag
from path
order by root, id
And well, it leaves #temp intact for further investigations. 🙂
April 15, 2015 at 12:53 am
Interesting solution, though this only works if the ID's are sequential. I've tested this solution when ID's are not sequential (+1) .. Then then nid-1 doesn't work obviously.
Maybe I should replace the nid-1 with a LAG() function of some kind.
April 15, 2015 at 2:39 am
It may be you needn't. Toid is meant to be used in a expression kinda select.. where .. fromid <x.id<=toid. And yes, LAG() or something is needed to get exactly the last row within (fromid,toid) when ids aren't sequential.
P.S. And if the oder of ids doesn't follow the order of DateCreated,
then query should use those dates, fromdate, todate instead of ids.
April 15, 2015 at 2:43 am
Great, thanks so much for all you guys' help! Much appreciated. I'll try to give some karma back.
April 24, 2015 at 4:18 am
This works with sql server 2008 too.
create table #dates
(
RecordId int primary key clustered
,RecordDate datetime
);
insert #dates
(RecordId, RecordDate)
values (1, '2015-04-01 14:00'),
(2, '2015-04-03 15:00'),
(3, '2015-04-03 16:05'),
(4, '2015-04-03 19:23'),
(5, '2015-04-06 09:15'),
(6, '2015-04-06 11:30'),
(7, '2015-04-07 12:00'),
(8, '2015-04-09 15:15');
with S as (select *,row_number() over (partition by convert(date,RecordDate) order by RecordId ) as Num from #dates)
select RecordId, RecordDate from S where num=1
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply