Convert Date List to Date Range

Question

Post reply

Convert Date List to Date Range

Coconut

Ten Centuries

Points: 1087
More actions
June 30, 2016 at 11:13 pm

#323664

Hello,
I have a table that a record gets inserted into every time a member subscribes or unsubscribes a list.
Below is the sample data
create table ##TEST (
IDint,
FLAGchar(1),
DTdate)
insert into ##TEST select 1, 'S', '2013-01-01'-- SUBSCRIBED
insert into ##TEST select 1, 'U', '2013-06-01'-- UNSUBSCRIBED
insert into ##TEST select 1, 'S', '2014-07-01'-- SUBSCRIBED
insert into ##TEST select 1, 'U', '2014-12-01'-- UNSUBSCRIBED
insert into ##TEST select 1, 'S', '2016-01-01'-- SUBSCRIBED
It looks as below
ID FLAG DT
----------- ---- ----------
1 S 2013-01-01
1 U 2013-06-01
1 S 2014-07-01
1 U 2014-12-01
1 S 2016-01-01
I'd like to convert the data list into a range as below
IDEffectiveDateTermDate
-----------------------------------
12013-01-012013-06-01
12014-07-012014-12-01
12016-01-01null
Can someone please help me how to do this? I'd greatly appreciate your help.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Coconut Ten Centuries Points: 1087 More actions · Answer 1

This is what I currently have but it will have a performance issue as the source table might have millions records.

select

a.ID,

a.EfftDT,

isnull(a.TermDT,'9999-01-01')

from (

select

a.ID,

a.DT as EfftDT,

b.DT as TermDT,

row_number() over (partition by a.ID, a.DT order by datediff(day,a.DT,b.DT)) as RNK

from ##TEST a with (nolock)

left join ##TEST b with (nolock) on a.ID = b.ID and b.FLAG = 'U' and a.DT < b.DT

where

a.FLAG = 'S') a

where

a.RNK = 1

Coconut Ten Centuries Points: 1087 More actions · Answer 2

Another version. I think this will minimize the number of records returned from the join. I am wondering if there is a better way.

with CTE (ID, DT, RNK)

as (

select

a.ID,

a.DT,

row_number() over (partition by ID order by a.DT) as RNK

from ##TEST a

where

a.FLAG = 'U')

select

a.ID,

a.DT as CURR_DT,

isnull(b.DT,'1900-01-01') as PREV_DT

into ##TEMP

from CTE a

left join CTE b on a.ID = b.ID and a.RNK = b.RNK + 1

select

a.ID,

a.DT as EFFT_DT,

isnull(b.CURR_DT,'9999-01-01') as TERM_DT

from ##TEST a

left join ##TEMP b on a.ID = b.ID and a.DT between b.PREV_DT and b.CURR_DT

where

a.FLAG = 'S'

pietlinden SSC Guru Points: 63633 More actions · Answer 3

If you're using 2012 or later, you can use LAG.

SELECT ID

, FLAG

, LAG(DT, 1) OVER (PARTITION BY ID ORDER BY DT) AS EffectiveDate

, DT AS TermDate

FROM ##Test;

Note that I'm assuming (and that's a dangerous thing to do) that the Flag alternates between 'S' and 'U'. If that assumption can ever be false, then LAG won't work for you, and you'd probably have to use a CTE to get the latest previous record that was not equal to the current status/flag.

Zidar SSChasing Mays Points: 651 More actions · Answer 4

@Pietlinden: your query returns this:

ID EffectiveDate TermDate
----------- ------------- ----------
1 NULL 2013-01-01
1 2013-01-01 2013-06-01
1 2013-06-01 2014-07-01
1 2014-07-01 2014-12-01
1 2014-12-01 2016-01-01
(5 row(s) affected)

The first row, ( 1, NULL, 2013-01-01) it totally off. 2013-01-01 is the first S date, and it cannot be TerminationDate. Then second row says that ID=1 started subscription on 2013-06-01 as EffectiveDate, and ended it on 2013-06-01.

The third row is again incorrect - it seems that ID=1 subscribed on 2013-06-01, then unsubscribed on 2014-07-01. the opposite is true. In other words, every second row returned by your query is incorrect. To make it work, you should simply hide odd rows (1,3,5) and leave even rows (2,4,6). Still, subscription on 2016 will not be in the result set.

Instead of trying to fix the query, I suggest this solution. First, add some more rows, to be closer to reality:

insert into ##TEST select 2, 'S', '2013-02-01'-- SUBSCRIBED

insert into ##TEST select 2, 'U', '2013-08-01'-- UNSUBSCRIBED

insert into ##TEST select 2, 'S', '2014-09-01'-- SUBSCRIBED

insert into ##TEST select 2, 'U', '2014-12-01'-- UNSUBSCRIBED

insert into ##TEST select 2, 'S', '2016-01-01'-- SUBSCRIBED

insert into ##TEST select 2, 'U', '2016-01-29'-- SUBSCRIBED

ID = 2 has all subscriptions terminated, unlike ID 1 where the last subscription is still in effect.

Here is the query that does the work:

WITH SubscribeDates AS

(

SELECT

ID, FLAG, DT

, Seq = row_number() OVER (ORDER BY DT)

FROM ##TEST

WHERE Flag = 'S'

) , UnsubscribeDAtes AS

(

SELECT

ID, FLAG, DT

, Seq = row_number() OVER (ORDER BY DT)

FROM ##TEST

WHERE Flag = 'U'

)

SELECT S.ID, S.DT AS EffectiveDate

, U.DT AS TermDate

FROM SubscribeDates AS S

LEFT JOIN UnsubscribeDAtes AS U ON S.Seq = U.Seq

ORDER BY ID, EffectiveDate

;

which returns result:

ID EffectiveDate TermDate

----------- ------------- ----------

1 2013-01-01 2013-06-01

1 2014-07-01 2014-12-01

1 2016-01-01 NULL

2 2013-02-01 2013-08-01

2 2014-09-01 2014-12-01

2 2016-01-01 2016-01-29

(6 row(s) affected)

Returned rows are telling us:

ID = 1 subscribed on 2013-01-01 then terminated subscription on 2013-06-01.

ID = 1 subscribed on 2014-07-01 then terminated subscription on 2014-12-01.

ID = 1 subscribed on 2016-01-01 and the subscription has been active since.

ID = 2 subscribed on 2013-02-01, then terminated subscription on 2013-08-01.

and so on.

We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.

🙂

Coconut Ten Centuries Points: 1087 More actions · Answer 5

Coconut

Ten Centuries

Points: 1087

July 5, 2016 at 2:40 pm

#1888568

Thank you for your reply. It was very helpful.

Coconut Ten Centuries Points: 1087 More actions · Answer 6

Coconut

Ten Centuries

Points: 1087

July 5, 2016 at 2:40 pm

#1888569

Thank you.

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 7

If you have the ability to change your transactional code I would suggest this, only because subscriptions are always paired with cancellations.

create table ##TEST (ID int, SubscribeDate date, UnsubscribeDate date)

New subscriptions would be inserted, leaving UnsubscribeDate null.

When someone unsubscribes, this row is updated to populate the UnsubscribeDate.

Since subscriptions, unsubscriptions are paired, you would find this far more convenient for reporting purposes.

- Active subscriptions are those where UnsubscribeDate is null.

- Cancelled subscriptions have Unsubscribe date populated.

- Counts of active subscriptions over time become simpler.

- Calculating the length of subscriptions (and average length) become simpler

- Reporting PERFORMANCE will be better than any of the solutions using your current schema.

Just a thought.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Zidar SSChasing Mays Points: 651 More actions · Answer 8

Proposed flat design does not solve any of probable issues with existing design. I mentioned the assumptions:

We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.[\quote]

The same applies to flat design:

- subscription date must not be NULL

- unsubscribe date must be after subscription date, or NULL

Even if the designer is aware of these constraints and is capable of enforcing them, information gathered from flat design will be incomplete. Multiple terminations and re-subscriptions are possible with suggested design, but no history is kept. Existing design, separate rows for each transaction (subscribe, unsubscribe) can be improved, without touching what is already there, by adding few more elements to enforce given constraints. Joe Celko has written several good articles on the topic of state transitions, so has Alex Kuznetsov and Dwaine Champs, either on this site or Small Talk.

Forum posts are not convenient place to discuss serious design topics. If I find some time I might write an article on state transitions and their proper use. Maybe then flat designs would not look as appealing.

🙂

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 9

I don't have time to write a paper at the moment, so this will have to suffice. 🙂

I've read Mr. Celko's books, and I have seen some truly execrable flat designs that failed miserably at trying to track a complicated process on a single row. I wholeheartedly agree that they are unsuitable for tracking a process of multiple steps. But consider the characteristics of a subscription. They include:

An identifier (SubscriptionID)

Who is subscribing (SubscriberID)

What service(s) they are subscribing to (omitted in the example below)

When the subscription begins (SubscribedDate) and

When the subscription ends (UnsubscribedDate)

-- when a subscription ends might even be known in advance, as in a one-year magazine subscription

When someone resubscribes after a subscription has ended, it is in fact a new subscription. So in this case I do not consider the inclusion of both start and stop times on the same row as being a "flat" table.

Perhaps my first post was too simplistic. Using a schema dictated by the above characteristics.

declare @SubscriberID int, @UnsubDate date

Create table #Subscriptions (SubscriptionID int identity(1,1), SubscriberID int, SubscribedDate date, UnSubscribedDate date)

Insert into #Subscriptions(SubscriberID, SubscribedDate)

values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')

set @SubscriberID = 2

set @UnsubDate = '11/1/2015'

;with cte as (select top (1)-- with proper constraints, the top(1

SubscriptionID, UnSubscribedDate

from #Subscriptions

where SubscriberID = @SubscriberID

and SubscribedDate < = @UnsubDate

order by SubscriptionID desc)

update cte

set UnSubscribedDate = @UnsubDate

insert into #Subscriptions(SubscriberID, SubscribedDate)

values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')

set @SubscriberID = 1

set @UnsubDate = '3/1/2016'

;with cte as (select top (1) SubscriptionID, UnSubscribedDate

from #Subscriptions

where SubscriberID = @SubscriberID

and SubscribedDate < = @UnsubDate

order by SubscriptionID desc)

update cte

set UnSubscribedDate = @UnsubDate

select * from #Subscriptions

select * from #Subscriptions order by SubscriberID, SubscribedDate

select sum(1) as ActiveSubscriptions

from #Subscriptions

where UnSubscribedDate is null

select SubscriberID as ActiveSubscriber, SubscriptionID as ActiveSubscription, SubscribedDate

from #Subscriptions

where UnSubscribedDate is null

order by SubscribedDate

select SubscriberID as Resubscriber, SubscriptionID as LastestSubscription, SubscribedDate as DateOfLatestSubscription

from #Subscriptions s1

where UnSubscribedDate is null

and exists(select 1 from #Subscriptions s2 where s1.SubscriberID = s2.SubscriberID and s1.SubscriptionID > s2.SubscriptionID)

select *, datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate())) as SubscriptionLength

from #Subscriptions

select Avg(datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate()))) as AverageSubscriptionLength

from #Subscriptions

Given, proper constraints should ensure that a Subscriber doesn't wind up with multiple active subscriptions. Not only is history preserved, but look at how simple the queries against that history become. All the select statements together took less than five minutes to create. They will also run efficiently. This is valuable, because simple questions should not take complex queries to answer. We write once and read a hundred times.

Sorry for being long-winded here. I just hate to see doctrine applied blindly.

Best regards, all.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Zidar SSChasing Mays Points: 651 More actions · Answer 10

I agree with The Dixie Flatline. Flat design does make answering the question "What is the subscription status of any customer at any given time". Queries are definitely simpler than the ones provided for the original solution. Update when unsubscribing looks as bad as CTE queries for the original solution. In order to have easy query, we made updates more difficult.

The design is founded on assumption: "With proper constraints....". What would be those constraints? How can we implement them? In no particular order, I could think of:

1) There cannot be un-subscription without subscription (SubscribedDate IS NULL AND UnSubscribedDate IS NOT NULL)

2) UnSubscribedDate must be after SubscribedDate

3) At any given moment, each customer can have only one subscription (no overlap)

4) New subscription for a customer can be added only if the previous one has been canceled (this follows from 1)

Let's see what happens if we do not implement any constraints:

/*---

This is what we have if we run the script:

SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

-------------- ------------ -------------- ----------------

1 1 2015-02-23 2016-03-01

2 2 2015-02-26 2015-11-01

3 3 2015-03-03 NULL

4 4 2016-05-23 NULL

5 2 2016-05-26 NULL

6 5 2016-06-29 NULL

(6 row(s) affected)

---*/

After adding a few records I got this:

SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

-------------- ------------ -------------- ----------------

1 1 2015-02-23 2016-03-01

8 2 NULL 2015-10-24

2 2 2015-02-26 2015-11-01

7 2 2015-05-26 NULL

5 2 2016-05-26 NULL

3 3 2015-03-03 NULL

4 4 2016-05-23 NULL

6 5 2016-06-29 NULL

(8 row(s) affected)

We see 4 rows for customer ID = 2. The first one is OK - SubscriptionID = 2, subscribed on 2015-02-26, unsubscribed on 2015-11-01. Then we have one row where customer unsubscribed before any subscription at all, then one with overlapping subscription, and one subscription beginning before the previous finished. We don't want bad data, even if querying for reports is easy, wrong report is worse than no report at all. Therefore, we must implement given constraints.

How to implement given constraints?

1) There cannot be un-subscription without subscription:

CHECK ( SubscribedDate IS NOT NULL)

2) UnSubscribedDate must be after SubscribedDate :

CHECK ( SubscribedDate < UnSubscribedDate OR UnSubscribedDate IS NULL)

Constraints 2 and 3 are actually the same one. Good, one less to worry about. However, our life is not twice as easy because of that. How to enforce 2 or 3? That is where Joe's articles offer some hope. I'll add next post for that part.

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 11

Thank you for putting in the effort to create the constraints.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 12

How to enforce 2 or 3?

CREATE UNIQUE INDEX IX_#subscriptions ON #subscriptions (SubscriberID, UnsubScribedDate)

will work for our example where the definition of an active subscription is one without an UnsubscribeDate. The unique index prevents insertion of a second row with a NULL UnsubscribeDate for a given user.

With constraints in place, the update statement is simply

update #subscriptions

set UnSubscribedDate = @UnsubDate

where SubscriberID = @SubscriberID

and UnsubScribeDate is null

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Zidar SSChasing Mays Points: 651 More actions · Answer 13

Very nice touch with UNIQUE (SubscriberID, UnsubScribedDate). That does solve part of the problem, but not everything. To recap, this is what happened so far:

-- Fresh start, with all correct updates:

-- Table, with all constraints so far:

IF Object_ID('tempdb..#Subscriptions') IS NOT NULL DROP TABLE #Subscriptions

GO

Create table #Subscriptions

( SubscriptionID int identity(1,1) NOT NULL

, SubscriberID int NOT NULL

, SubscribedDate date NOT NULL

, UnSubscribedDate date

, CONSTRAINT [PK_#Subscriptions] PRIMARY KEY (SubscriberID,SubscribedDate)

, CONSTRAINT [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]

UNIQUE (SubscriberID,UnSubscribedDate)

, CONSTRAINT [CK #Subscriptions (SubscribedDate < UnSubscribedDate)]

CHECK (SubscribedDate < UnSubscribedDate)

)

;

GO

Insert into #Subscriptions(SubscriberID, SubscribedDate)

values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')

;

-- (3 row(s) affected)

;

UPDATE #Subscriptions

set UnSubscribedDate = '20151101'

WHERE SubscriptionID = 2

;

insert into #Subscriptions(SubscriberID, SubscribedDate)

values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')

;

UPDATE #Subscriptions

set UnSubscribedDate = '20151101'

WHERE SubscriberID = 1 AND UnSubscribedDate IS NULL

;

-- So far so good. Cool.

SELECT *

FROM #Subscriptions

ORDER BY SubscriberID, SubscribedDate

;

/*--

SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

-------------- ------------ -------------- ----------------

1 1 2015-02-23 2015-11-01

2 2 2015-02-26 2015-11-01

5 2 2016-05-26 NULL

3 3 2015-03-03 NULL

4 4 2016-05-23 NULL

6 5 2016-06-29 NULL

(6 row(s) affected)

--*/

Now let's try to break some rules - add subscription before previous one expired, for SubscriberID=2, SubscribedDate = '2016-05-26':

insert into #Subscriptions(SubscriberID, SubscribedDate)

values (2,'6/01/2016')

;

/*--

Msg 2627, Level 14, State 1, Line 1

Violation of UNIQUE KEY constraint

'UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)'.

Cannot insert duplicate key in object 'dbo.#Subscriptions'.

The duplicate key value is (2, <NULL>).

The statement has been terminated.

--*/

Cool, constraint [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]

prevented additional subscription before ending the previous one.

Let's close the subscription and try again.

-- This will work:

UPDATE #Subscriptions-- THsi should work because

set UnSubscribedDate = '20160610'-- UnSubscribedDate after '2016-05-26'

WHERE SubscriberID = 2 AND UnSubscribedDate IS NULL

;

/*--

SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

-------------- ------------ -------------- ----------------

1 1 2015-02-23 2015-11-01

2 2 2015-02-26 2015-11-01

5 2 2016-05-26 2016-06-10

3 3 2015-03-03 NULL

4 4 2016-05-23 NULL

6 5 2016-06-29 NULL

(6 row(s) affected)

--*/

Let's try to add a subscription for SubscriberID, starting on 2016-05-30 which is before end of previous subscription ( 2016-06-10). The following UPADTE will work, no existing constraints validated. However, it will create illegal entry - new subscription starting before UnSubscribedDate from the previous one.

insert into #Subscriptions(SubscriberID, SubscribedDate)

values (2,'2016-05-30')

;

/*--

SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

-------------- ------------ -------------- ----------------

1 1 2015-02-23 2015-11-01

2 2 2015-02-26 2015-11-01

5 2 2016-05-26 2016-06-10 = Jun 10, 2016

8 2 2016-05-30 NULL , started on May 30 2016

3 3 2015-03-03 NULL

4 4 2016-05-23 NULL

6 5 2016-06-29 NULL

(7 row(s) affected)

--*/

Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 14

Zidar (7/8/2016)
Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.

maybe someone will knock up a decent test harness ??

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day