Find double records within DateTime range

  • Hi, thanks for reading my topic.

    I have a query question.

    Consider a table with the following structure:

    RecordID (PK - int) - RecordDate (DateTime)

    I need to find all records that fall within a 7 day period slot based on the first RecordDate of a specific slot.

    Example, consider the following records:

    RecordID - RecordDate

    1 - 2015-04-01 14:00

    2 - 2015-04-03 15:00

    3 - 2015-04-03 16:05

    4 - 2015-04-03 19:23

    5 - 2015-04-06 09:15

    6 - 2015-04-06 11:30

    7 - 2015-04-07 12:00

    8 - 2015-04-09 15:15

    The result of the query I'd like should look something like this

    1

    2

    5

    7

    8

    So basically I'd like to leave record 3 and 4 out because they fall within 24 hours of record 2 and I'd like to leave record 6 out because it falls within 24 hours of record 5.

    I'd tried working with a CTE and set a dateadd(d, 1, recorddate), join it on itself and use a between From / To filter on the join but that didn't work. I don't think NTILE will work with this?

    I'd love to hear some suggestion on how to approach this problem.

    Thanks and have a great day.

  • Try this:

    if object_id('tempdb..#dates', 'U') is not null

    drop table #dates;

    create table #dates

    (

    RecordId int primary key clustered

    ,RecordDate datetime

    );

    insert #dates

    (RecordId, RecordDate)

    values (1, '2015-04-01 14:00'),

    (2, '2015-04-03 15:00'),

    (3, '2015-04-03 16:05'),

    (4, '2015-04-03 19:23'),

    (5, '2015-04-06 09:15'),

    (6, '2015-04-06 11:30'),

    (7, '2015-04-07 12:00'),

    (8, '2015-04-09 15:15');

    with recs

    as (select d.*

    ,date1 = lag(RecordDate, 1, '19000101') over (order by d.RecordId)

    from #dates d

    )

    select recs.RecordId

    from recs

    where datediff(hour, recs.date1, recs.RecordDate) > 24

    And please post consumable DDL next time.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you very much Phil for point out the LAG function.

    I will post proper DDL next time.

    Have a nice day.

  • I'm not sure if this more simple option would fit you.

    What happens if a row falls into the following 24 hours but it's on a different date? This option would include row 9 but Phil's would exclude it.

    if object_id('tempdb..#dates', 'U') is not null

    drop table #dates;

    create table #dates

    (

    RecordId int primary key clustered

    ,RecordDate datetime

    );

    insert #dates

    (RecordId, RecordDate)

    values (1, '2015-04-01 14:00'),

    (2, '2015-04-03 15:00'),

    (3, '2015-04-03 16:05'),

    (4, '2015-04-03 19:23'),

    (5, '2015-04-06 09:15'),

    (6, '2015-04-06 11:30'),

    (7, '2015-04-07 12:00'),

    (8, '2015-04-09 15:15'),

    (9, '2015-04-10 09:15');

    SELECT MIN(RecordId) AS RecordId

    FROM #dates

    GROUP BY CAST( RecordDate AS date);

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • So I'd like to take this a little further, hopefully you can help me with this issue again. So basically I solved the problem of marking records with fall within 7*24 hours of the previous record. I've marked those records with 1 in the column IsDoubleLag.

    Unfornately this solution is only based on the previous record. When you have a number of records that all fall within 7*24 hour of the previous records, all records would be marked as IsDoubleLag = 1.

    Would it be possible somehow, without the use of a cursor, to have a 7*24 hour break mark.

    Consider the following table:

    CREATE TABLE #TEMP (ID int, DateCreated datetime, IsDoubleLag bit)

    INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')

    INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')

    INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')

    INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')

    INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')

    INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')

    INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')

    INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')

    INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')

    INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')

    INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')

    INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')

    INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')

    INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')

    INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')

    INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1')

    In this scenario it starts with RecordID 1 with a date of 2015-01-01 08:11 .. Then RecordID 2, 3 and 4 fall within 7x24 hours. Unfortunately RecordID 5 with a date of 2015-01-11 00:46:37 also falls within the range of the previous record. Which is logical, because this is how it's set up.

    But I'd like to have RecordID 5 be marked as a new range record. So basically I'd like it to have the first marked IsDoubleLag record to be leading in marking the subsequent records. When a new break has been marked, it should restart the lag marking.

    I know how to fix it through cursor and use a running variable, but I'd like to solve it set based.

    Thanks!

  • Would you be comfortable using the Quirky Update? http://www.sqlservercentral.com/articles/T-SQL/68467/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • This is a possibility using the Quirky Update.

    CREATE TABLE #TEMP (ID int PRIMARY KEY, DateCreated datetime, IsDoubleLag bit)

    INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')

    INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')

    INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')

    INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')

    INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')

    INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')

    INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')

    INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')

    INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')

    INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')

    INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')

    INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')

    INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')

    INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')

    INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')

    INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1');

    SELECT *

    FROM #TEMP

    DECLARE @LastDateCreated datetime = '2000', @IsDoubleLag bit, @ID int

    UPDATE T SET

    @IsDoubleLag = IsDoubleLag = CASE WHEN DATEDIFF( MI, @LastDateCreated, DateCreated) > 24*7*60

    THEN 0 ELSE 1 END,

    @LastDateCreated = CASE WHEN DATEDIFF( MI, @LastDateCreated, DateCreated) > 24*7*60

    THEN DateCreated ELSE @LastDateCreated END,

    @ID = ID

    FROM #TEMP T WITH( TABLOCKX)

    OPTION( MAXDOP 1);

    SELECT *

    FROM #TEMP

    GO

    DROP TABLE #TEMP

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Recursive CTE will do the job and it can be faster then cursor but hardly outperforms quirk update.

    CREATE TABLE #TEMP (ID int, DateCreated datetime, IsDoubleLag bit)

    INSERT INTO #TEMP VALUES (1,'2015-01-01 08:11:40.490','0')

    INSERT INTO #TEMP VALUES (2,'2015-01-04 02:29:47.777','1')

    INSERT INTO #TEMP VALUES (3,'2015-01-04 16:07:12.887','1')

    INSERT INTO #TEMP VALUES (4,'2015-01-06 07:26:52.377','1')

    INSERT INTO #TEMP VALUES (5,'2015-01-11 00:46:37.117','1')

    INSERT INTO #TEMP VALUES (6,'2015-01-11 09:58:24.640','1')

    INSERT INTO #TEMP VALUES (7,'2015-01-12 15:43:24.280','1')

    INSERT INTO #TEMP VALUES (8,'2015-01-12 19:41:46.213','1')

    INSERT INTO #TEMP VALUES (9,'2015-01-15 17:31:49.297','1')

    INSERT INTO #TEMP VALUES (10,'2015-01-19 01:20:35.487','1')

    INSERT INTO #TEMP VALUES (11,'2015-01-21 15:30:31.100','1')

    INSERT INTO #TEMP VALUES (12,'2015-01-26 15:27:54.880','1')

    INSERT INTO #TEMP VALUES (13,'2015-03-21 23:52:09.707','0')

    INSERT INTO #TEMP VALUES (14,'2015-03-23 13:07:29.353','1')

    INSERT INTO #TEMP VALUES (15,'2015-04-01 07:28:20.613','0')

    INSERT INTO #TEMP VALUES (16,'2015-04-05 04:56:28.927','1');

    with steps as (

    select t1.id as root, t1.id, t1.IsDoubleLag, nId

    from #TEMP t1

    outer apply (select top(1) id as nId

    from #TEMP t2

    where t2.DateCreated>dateadd(day,7, t1.DateCreated)

    order by t2.DateCreated) t2

    ), path as (

    --choose roots

    select * from steps

    where IsDoubleLag = 0

    --stepping down the paths

    union all

    select t1.root, t2.id, t2.IsDoubleLag, t2.nid

    from path t1

    join steps t2 on t2.id = t1.nId

    )

    select root, id as fromid, nid-1 as toid, IsDoubleLag

    from path

    order by root, id

    And well, it leaves #temp intact for further investigations. 🙂

  • Interesting solution, though this only works if the ID's are sequential. I've tested this solution when ID's are not sequential (+1) .. Then then nid-1 doesn't work obviously.

    Maybe I should replace the nid-1 with a LAG() function of some kind.

  • It may be you needn't. Toid is meant to be used in a expression kinda select.. where .. fromid <x.id<=toid. And yes, LAG() or something is needed to get exactly the last row within (fromid,toid) when ids aren't sequential.

    P.S. And if the oder of ids doesn't follow the order of DateCreated,

    then query should use those dates, fromdate, todate instead of ids.

  • Great, thanks so much for all you guys' help! Much appreciated. I'll try to give some karma back.

  • This works with sql server 2008 too.

    create table #dates

    (

    RecordId int primary key clustered

    ,RecordDate datetime

    );

    insert #dates

    (RecordId, RecordDate)

    values (1, '2015-04-01 14:00'),

    (2, '2015-04-03 15:00'),

    (3, '2015-04-03 16:05'),

    (4, '2015-04-03 19:23'),

    (5, '2015-04-06 09:15'),

    (6, '2015-04-06 11:30'),

    (7, '2015-04-07 12:00'),

    (8, '2015-04-09 15:15');

    with S as (select *,row_number() over (partition by convert(date,RecordDate) order by RecordId ) as Num from #dates)

    select RecordId, RecordDate from S where num=1

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply