Help in Date spans for continuoes and regular dates spans

Question

Post reply

Help in Date spans for continuoes and regular dates spans

dhanekulakalyan

Default port

Points: 1462

June 16, 2020 at 2:31 am

#3763345

Hello all,

I would like to take input scripts and generate output data as given below. output is expected as if member is continuously enrolled without any gap in enrollment get min and max and if there is enrollment gap leave as is. for any given member there are no multiple spans just leave as is. for id 333 there are no multiple spans just leave as is and for id 222 bring one final line as given in the output data. for id 111 merge one span but not other span because one span is continues where as other spans is not continous , please refer output table for reference in the below scripts.

Any help is appreciated.

create table #test1

(id int,startdate datetime ,enddate datetime)

insert into #test1

select 111,'2016-12-01 00:00:00','2020-03-31 00:00:00'

insert into #test1

select 111,'2020-05-01 00:00:00','2020-05-31 00:00:00'

insert into #test1

select 111,'2020-06-01 00:00:00','2078-12-31 00:00:00'

insert into #test1

select 222,'2020-01-01 00:00:00','2020-02-29 00:00:00'

insert into #test1

select 222,'2020-03-01 00:00:00','2078-12-31 00:00:00'

insert into #test1

select 333,'2015-01-01 00:00:00', '2078-12-31 00:00:00'

--input table

select * from #test1

create table #test2

(id int,startdate datetime ,enddate datetime)

insert into #test2

select 111,'2016-12-01 00:00:00','2020-03-31 00:00:00'

insert into #test2

select 111,'2020-05-01 00:00:00','2078-12-31 00:00:00'

insert into #test2

select 222,'2020-01-01 00:00:00','2078-12-31 00:00:00'

insert into #test2

select 333,'2015-01-01 00:00:00', '2078-12-31 00:00:00'

--output table

select * from #test2?

This topic was modified 4 years, 5 months ago by dhanekulakalyan.
This topic was modified 4 years, 5 months ago by Steve Jones - SSC Editor. Reason: Formatted code

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

ktflash Default port Points: 1400 More actions · Answer 1

The first thing i could think of seems to work but its not pretty i guess and im sure someone has a better solution, so lets see:

My Approach:

get the next startdate partitioned by ID
get difference between Enddate of current Row and Startdate of next Row, set differece = 1 if there is no next row so it counts as next day (continuesly)
Group the data by ID and Datedifference

select id,startdate=min(startdate),enddate=max(enddate)
from (select *,Diff=isnull(DATEDIFF(DD,enddate,Test),1)
from (
select *,Test=LEAD(startdate) OVER(PARTITION BY ID ORDER BY ID,startdate)
from #test1
 ) Previous_Date
  ) Date_Difference

group by id,Diff
order by 1,3

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

https://www.sqlservercentral.com/forums/topic/consolidate-overlapping-date-periods

SELECT ID, StartDate = MIN(ts), EndDate = MAX(ts)
FROM (
SELECT ID, ts, Snapper = (ROW_NUMBER() OVER(ORDER BY ID, ts)+1)/2
FROM (
SELECT *, se = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ts2, Type DESC)
FROM ( 
SELECT ID, ts = StartDate, ts2 = StartDate, Type = 1
,e = NULL
,s = (2*ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate))-1
FROM #test1
UNION ALL
SELECT ID, ts = EndDate, ts2 = DATEADD(DAY,1,EndDate), Type = -1
,e = (2*ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EndDate))
,s = NULL
FROM #test1
) d
) e
WHERE se IN (e,s)
) f
GROUP BY ID, Snapper
ORDER BY id, StartDate

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

dhanekulakalyan Default port Points: 1462 More actions · Answer 3

Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.

Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

dhanekulakalyan wrote:

Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.
Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

Try testing with say 100,000 rows first to get an idea of performance - select them out into a temp table ordered by ID so that you get all members of each ID partition except perhaps the last. A clustered index on ID will help a little with the query. I can't say how long it will take. The query is explained in the link I posted earlier, but the 2* for a and s columns is self explanatory if you execute the innermost query - I wrote it like this rather than as chained CTE's so you could do this. Any further questions, just holler.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Steve Collins SSCrazy Points: 2157 More actions · Answer 5

To resolve for arbitrary relationships in the data requires hierarchical recursion, no? The OP's question and Phil's too

drop table if exists dbo.test_1;
go
create table dbo.test_1(id int,startdate datetime ,enddate datetime)
go

insert dbo.test_1(id, startdate, enddate) values
(111,'2016-12-01 00:00:00','2020-03-31 00:00:00'),
(111,'2020-05-01 00:00:00','2020-05-31 00:00:00'),
(111,'2020-06-01 00:00:00','2078-12-31 00:00:00'),
(111,'2079-06-01 00:00:00','2080-12-31 00:00:00'),
(111,'2081-01-01 00:00:00','2083-11-21 00:00:00'),
(111,'2083-11-22 00:00:00','2086-12-31 00:00:00'),
(111,'2089-06-01 00:00:00','2090-12-31 00:00:00'),
(222,'2020-01-01 00:00:00','2020-02-29 00:00:00'),
(222,'2020-03-01 00:00:00','2078-12-31 00:00:00'),
(333,'2015-01-01 00:00:00', '2078-12-31 00:00:00');

drop function if exists dbo.test_downlines;
go
create function dbo.test_downlines(
  @idint,
  @startdate    datetime)
returns table as 
return
with 
base_cte(parent_id, match_dt, id, startdate, enddate) as (
   select
      lag(id) over (partition by id order by id, startdate),
      iif((lag(id) over (partition by id order by id, startdate) is null), null, dateadd(dd, -1, startdate)),
      t1.*
from
  dbo.test_1 t1),
recur_cte(parent_id, match_dt, id, startdate, enddate, h_level) as (
   select
    *,
      cast(0 as int) 
from
  base_cte t1
    where
      id=@id
      and startdate=@startdate
    union all
   select
    t1.*,
      rc.h_level+1
from
  base_cte t1
     join
      recur_cte rc on t1.parent_id=rc.id
                      and t1.match_dt=rc.enddate)
select
  max(rc.enddate) enddate
from
  recur_cte rc;
go

;with
base_cte(id, startdate, enddate) as (
   select t1.*
from
  dbo.test_1 t1
where not exists
  (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate)))
select
  bc.*, td.enddate range_end_dt
from
  base_cte bc
 cross apply
  dbo.test_downlines(bc.id, bc.startdate) td;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 6

This seems to be a typical gaps & islands issue...here is another method:

Declare @test1 Table (id int, startdate datetime, enddate datetime);

 Insert Into @test1 (id, startdate, enddate)
 Values (111, '2016-12-01 00:00:00', '2020-03-31 00:00:00')
      , (111,'2020-05-01 00:00:00','2020-05-31 00:00:00')
      , (111,'2020-06-01 00:00:00','2078-12-31 00:00:00')
      , (222,'2020-01-01 00:00:00','2020-02-29 00:00:00')
      , (222,'2020-03-01 00:00:00','2020-05-31 00:00:00')
      , (222,'2020-07-01 00:00:00','2078-12-31 00:00:00')
      , (333,'2015-01-01 00:00:00', '2015-05-31 00:00:00')
      , (333,'2015-06-01 00:00:00', '2015-12-31 00:00:00')
      , (333,'2016-01-01 00:00:00', '2019-10-31 00:00:00')
      , (333,'2020-01-01 00:00:00', '2078-12-31 00:00:00');

Declare @minStartDate datetime = (Select min(t.startdate) From @test1 t)
      , @maxEndDate datetime = (Select max(t.enddate) From @test1 t);

   With t (n)
     As (
 Select t.n
   From (Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , dates (FromDate)
     As (
 Select Top (datediff(day, @minStartDate, @maxEndDate) + 1)
        dateadd(day, checksum(row_number() over(Order By @@spid)) - 1, @minStartDate)
   From t t1, t t2, t t3, t t4
        )
      , dateGroups
     As (
 Select *
      , dateGroup = dateadd(day, -row_number() over(Partition By t1.id Order By d.FromDate), d.FromDate)
   From @test1          t1
  Inner Join dates       d On d.FromDate Between t1.startdate And t1.enddate 
        )
 Select dg.id
      , StartDate = min(dg.startdate)
      , EndDate = max(dg.enddate)
   From dateGroups      dg
  Group By
        dg.id
      , dg.dateGroup
  Order By
        dg.id
      , StartDate

I added a few additional rows for 222 and 333 - for 333 there are 3 rows that are continuous and the 4th row with a gap.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Here's a little sample data generator for anyone wishing to play along:

IF OBJECT_ID('TEMPDB..#test1') IS NOT NULL DROP TABLE #test1
SELECT RowID = ROW_NUMBER() OVER(ORDER BY ID, Startdate, Enddate), 
ID, Startdate, Enddate
INTO #test1
FROM (VALUES 
(3565, '20111101', '20130128'),(3565, '20120101', '20130128'),(3565, '20120405', '20120630'), 
(3565, '20120525', '20120630'),(3565, '20130125', '20130128'),(3565, '20140220', '20140313'), 
(3577, '20101202', '20150120'),(3612, '20111101', '20130206'),(3612, '20111101', '20130807'), 
(3612, '20111101', '20140731'),(3612, '20120823', '20120905'),(3612, '20120914', '20121031'), 
(3612, '20121101', '20121128'),(3612, '20121206', '20121224'),(3612, '20121226', '20130109'), 
(3612, '20130207', '20140731'),(3724, '20111101', '20130807'),(3724, '20120426', '20120508'), 
(3724, '20120515', '20120531'),(3724, '20120524', '20120531'),(3724, '20120607', '20120706'), 
(3724, '20120614', '20120713'),(3724, '20120802', '20120831'),(3724, '20120901', '20120930'), 
(3730, '20130516', '20130605'),(3730, '20130516', '20130807'),(3730, '20130516', '20131219'), 
(3730, '20130620', '20130704'),(3730, '20130711', '20130724'),(3730, '20130808', '20131219'), 
(3730, '20131212', '20140612'),(3730, '20140501', '20140528'),(3730, '20140619', '20141217'), 
(3730, '20140724', '20140820'),(3730, '20140814', '20140910'),(3730, '20140904', '20141001'), 
(3730, '20140911', '20141008'),(3730, '20141002', '20141029'),(3730, '20141106', '20141203'), 
(3730, '20141218', '20150617'),(3730, '20150129', '20150225'),(3730, '20150205', '20150304'), 
(3730, '20150305', '20150401'),(3730, '20150402', '20150429'),(3730, '20150430', '20150527'), 
(3730, '20150618', '20151223'),(3730, '20150723', '20150819'),(3730, '20150820', '20150916'), 
(3730, '20150917', '20151014'),(3730, '20151022', '20151118'),(3730, '20160107', '20160203'), 
(3730, '20160107', '20160629'),(3730, '20160204', '20160302'),(3730, '20160303', '20160316'), 
(3730, '20160303', '20160406'),(3730, '20160317', '20160330'),(3730, '20160414', '20160629'), 
(3730, '20160707', '20160803'),(3730, '20160804', '20160817'),(3730, '20160804', '20160831'), 
(3730, '20160804', '20160914'),(3730, '20160901', '20160914'),(3730, '20160901', '20160928'), 
(3730, '20160915', '20170201'),(3730, '20160915', '20170215'),(3730, '20161027', '20161109'), 
(3730, '20161027', '20161123'),(3730, '20161124', '20161207'),(3730, '20161124', '20161221'), 
(3730, '20170105', '20170201'),(3730, '20170119', '20170215'),(3730, '20170202', '20170215'), 
(3730, '20170216', '20170510'),(3730, '20170427', '20170524'),(3730, '20170511', '20170719'), 
(3730, '20170629', '20170630'),(3730, '20170720', '20170802'),(3730, '20170728', '20170729'), 
(3730, '20170803', '20171025'),(3730, '20170817', '20171025'),(3730, '20171026', '20171206'), 
(3730, '20171207', '20180103'),(3730, '20180104', '20180228'),(3730, '20180301', '20180523'), 
(3730, '20180524', '20180801'),(3730, '20180802', '20181024'),(3730, '20181025', '20190619')
) d (ID, Startdate, Enddate) 

CREATE CLUSTERED INDEX cx_Stuff ON #test1 (ID, StartDate)

SELECT * FROM #test1

My query generates the following output which looks correct:

IDStartDateEndDate
35652011-11-012013-01-28
35652014-02-202014-03-13
35772010-12-022015-01-20
36122011-11-012014-07-31
37242011-11-012013-08-07
37302013-05-162014-06-12
37302014-06-192015-12-23
37302016-01-072016-06-29
37302016-07-072019-06-19

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

dhanekulakalyan wrote:

Thank you for your reply and help, this is working but I cannot understand why we are multiplying with 2* to the row number for e and s columns and then again /2 row number for snapper columns, it is so confusing.
Also I have 3 miliion rows of actual data, do you think this approach is good to follow and will work fine with any issues. Thanks again

A sample data set of 4,000 rows completes in about 40ms using my query, on this arbitrary system.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Steve Collins SSCrazy Points: 2157 More actions · Answer 9

The OP wrote: "output is expected as if member is continuously enrolled without any gap in enrollment get min and max and if there is enrollment gap leave as is." Is the sample data above representative of the OP's question? None of the sample data provided by the OP has overlapping date ranges. Can a "membership" overlap itslef? The OP's definition of "continuous" appears to be the difference between startdate and enddate (between "memberships") is 1 day. If that condition isn't met then leave the row(s) as is. According to that definition my code appears to work and the result is:

idstartdateenddate        range_end_dt
35652011-11-012013-01-282013-01-28
35652012-01-012013-01-282013-01-28
35652012-04-052012-06-302012-06-30
35652012-05-252012-06-302012-06-30
35652013-01-252013-01-282013-01-28
35652014-02-202014-03-132014-03-13
35772010-12-022015-01-202015-01-20
36122011-11-012013-02-062014-07-31
36122011-11-012013-08-072014-07-31
36122011-11-012014-07-312014-07-31
36122012-08-232012-09-052012-09-05
36122012-09-142012-10-312012-11-28
36122012-12-062012-12-242012-12-24
36122012-12-262013-01-092013-01-09
37242011-11-012013-08-072013-08-07
37242012-04-262012-05-082012-05-08
37242012-05-152012-05-312012-05-31
37242012-05-242012-05-312012-05-31
37242012-06-072012-07-062012-07-06
37242012-06-142012-07-132012-07-13
37242012-08-022012-08-312012-09-30
37302013-05-162013-06-052013-12-19
37302013-05-162013-08-072013-12-19
37302013-05-162013-12-192013-12-19
37302013-06-202013-07-042013-07-04
37302013-07-112013-07-242013-07-24
37302013-12-122014-06-122014-06-12
37302014-05-012014-05-282014-05-28
37302014-06-192014-12-172015-12-23
37302014-07-242014-08-202014-08-20
37302014-08-142014-09-102014-10-08
37302014-09-042014-10-012014-10-29
37302014-11-062014-12-032014-12-03
37302015-01-292015-02-252015-02-25
37302015-02-052015-03-042015-05-27
37302015-07-232015-08-192015-10-14
37302015-10-222015-11-182015-11-18
37302016-01-072016-02-032016-06-29
37302016-01-072016-06-292016-06-29
37302016-04-142016-06-292016-06-29
37302016-07-072016-08-032019-06-19
37302016-10-272016-11-092016-12-21
37302016-10-272016-11-232016-12-21
37302017-01-052017-02-012019-06-19
37302017-01-192017-02-152019-06-19
37302017-04-272017-05-242017-05-24
37302017-06-292017-06-302017-06-30
37302017-07-282017-07-292017-07-29
37302017-08-172017-10-252019-06-19

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2157 More actions · Answer 10

The query could be without the cte

select
  t1.id, cast(t1.startdate as date) startdate, cast(t1.enddate as date) enddate, cast(td.enddate as date) range_end_dt
from
  dbo.test_1 t1
 cross apply
  dbo.test_downlines(t1.id, t1.startdate) td
where not exists
  (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate))
order by 1, 2;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

Steve Collins wrote:

The query could be without the cte

select
  t1.id, cast(t1.startdate as date) startdate, cast(t1.enddate as date) enddate, cast(td.enddate as date) range_end_dt
from
  dbo.test_1 t1
 cross apply
  dbo.test_downlines(t1.id, t1.startdate) td
where not exists
  (select 1 from dbo.test_1 in_t1 where t1.id=in_t1.id and t1.startdate=dateadd(dd, 1, in_t1.enddate))
order by 1, 2;

Can you write it without the function?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Steve Collins SSCrazy Points: 2157 More actions · Answer 12

ChrisM@Work wrote:

Can you write it without the function?

In theory it should be possible but maybe not easily. The tvf is parameterized so I think to remove it would require splitting it into two nested CROSS APPLY statements. Yes the OP asked to resolve the example in temp table and my code only works against a physical table. Fudged it because the OP also said "I have 3 miliion rows of actual data" which seems likely 🙂 to be in a physical table somewhere 🙂

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 13

Did you know that a table must have a key. by definition? What you posted can never have a key goes all the columns can be NULL! Furthermore, an identifier cannot be numeric because you don't do calculations on it and it is measured on a nominal scale. We also have a date data type now, so there's no need to use the old Sybase datetime from 30 years ago. Nor should you be using the old row at a time. Insertion syntax from Sybase. Finally, I'm going to assume that you don't want an event to end before it starts. So you need a constraint. Remember, SQL is a declarative language. Most of the work is done in the DDL.

In short, you just described a deck of old punchcards written in SQL. Let's fix that:

CREATE TABLE Events

(event_id CHAR(3) NOT NULL,

event_start_date DATE NOT NULL,

PRIMARY KEY (event_id, event_start_date), -- not an option!!

event_end_date DATE NOT NULL,

CHECK(event_start_date <= event_end_date));

INSERT INTO Events

VALUES

('111', '2016-12-01', '2020-03-31'),

('111', '2020-05-01', '2020-05-31'),

('111', '2020-06-01', '2078-12-31'),

('222', '2020-01-01', '2020-02-29'),

('222', '2020-03-01', '2078-12-31')'

('333', '2015-01-01', '2078-12-31'),

Your mindset is still thinking of punchcards or magnetic tape files, where records are inserted one at a time. But SQL is based on tables and rose which are completely different. The table constructor, values, puts them all in at once and the insertion can be optimized.

Please read some of the books by Chris Date. He discusses why you should never have two tables with the same structure in the same schema. What you are doing is mimicking the way we did magnetic tape files back the 1960s. In SQL we can do a view, or update a base table. You are using SQL as if you were hanging a new tape on a new tape drive and doing a file merge. Your mindset is still locked in physical models of data.

My guess is you don't want that second table, but would rather update the existing Events base table. Google around for "islands and gaps" for various solutions. The lead () and lag () functions will be very useful for you. You might want to write a stored procedure that will find an existing event duration and do an update to it. If it qualifies

Please post DDL and follow ANSI/ISO standards when asking for help.