June 30, 2016 at 11:13 pm
Hello,
I have a table that a record gets inserted into every time a member subscribes or unsubscribes a list.
Below is the sample data
create table ##TEST (
IDint,
FLAGchar(1),
DTdate)
insert into ##TEST select 1, 'S', '2013-01-01'-- SUBSCRIBED
insert into ##TEST select 1, 'U', '2013-06-01'-- UNSUBSCRIBED
insert into ##TEST select 1, 'S', '2014-07-01'-- SUBSCRIBED
insert into ##TEST select 1, 'U', '2014-12-01'-- UNSUBSCRIBED
insert into ##TEST select 1, 'S', '2016-01-01'-- SUBSCRIBED
It looks as below
ID FLAG DT
----------- ---- ----------
1 S 2013-01-01
1 U 2013-06-01
1 S 2014-07-01
1 U 2014-12-01
1 S 2016-01-01
I'd like to convert the data list into a range as below
IDEffectiveDateTermDate
-----------------------------------
12013-01-012013-06-01
12014-07-012014-12-01
12016-01-01null
Can someone please help me how to do this? I'd greatly appreciate your help.
June 30, 2016 at 11:20 pm
This is what I currently have but it will have a performance issue as the source table might have millions records.
select
a.ID,
a.EfftDT,
isnull(a.TermDT,'9999-01-01')
from (
select
a.ID,
a.DT as EfftDT,
b.DT as TermDT,
row_number() over (partition by a.ID, a.DT order by datediff(day,a.DT,b.DT)) as RNK
from ##TEST a with (nolock)
left join ##TEST b with (nolock) on a.ID = b.ID and b.FLAG = 'U' and a.DT < b.DT
where
a.FLAG = 'S') a
where
a.RNK = 1
June 30, 2016 at 11:42 pm
Another version. I think this will minimize the number of records returned from the join. I am wondering if there is a better way.
with CTE (ID, DT, RNK)
as (
select
a.ID,
a.DT,
row_number() over (partition by ID order by a.DT) as RNK
from ##TEST a
where
a.FLAG = 'U')
select
a.ID,
a.DT as CURR_DT,
isnull(b.DT,'1900-01-01') as PREV_DT
into ##TEMP
from CTE a
left join CTE b on a.ID = b.ID and a.RNK = b.RNK + 1
select
a.ID,
a.DT as EFFT_DT,
isnull(b.CURR_DT,'9999-01-01') as TERM_DT
from ##TEST a
left join ##TEMP b on a.ID = b.ID and a.DT between b.PREV_DT and b.CURR_DT
where
a.FLAG = 'S'
July 1, 2016 at 12:05 am
If you're using 2012 or later, you can use LAG.
SELECT ID
, FLAG
, LAG(DT, 1) OVER (PARTITION BY ID ORDER BY DT) AS EffectiveDate
, DT AS TermDate
FROM ##Test;
Note that I'm assuming (and that's a dangerous thing to do) that the Flag alternates between 'S' and 'U'. If that assumption can ever be false, then LAG won't work for you, and you'd probably have to use a CTE to get the latest previous record that was not equal to the current status/flag.
July 5, 2016 at 2:08 pm
@pietlinden: your query returns this:
ID EffectiveDate TermDate
----------- ------------- ----------
1 NULL 2013-01-01
1 2013-01-01 2013-06-01
1 2013-06-01 2014-07-01
1 2014-07-01 2014-12-01
1 2014-12-01 2016-01-01
(5 row(s) affected)
The first row, ( 1, NULL, 2013-01-01) it totally off. 2013-01-01 is the first S date, and it cannot be TerminationDate. Then second row says that ID=1 started subscription on 2013-06-01 as EffectiveDate, and ended it on 2013-06-01.
The third row is again incorrect - it seems that ID=1 subscribed on 2013-06-01, then unsubscribed on 2014-07-01. the opposite is true. In other words, every second row returned by your query is incorrect. To make it work, you should simply hide odd rows (1,3,5) and leave even rows (2,4,6). Still, subscription on 2016 will not be in the result set.
Instead of trying to fix the query, I suggest this solution. First, add some more rows, to be closer to reality:
insert into ##TEST select 2, 'S', '2013-02-01'-- SUBSCRIBED
insert into ##TEST select 2, 'U', '2013-08-01'-- UNSUBSCRIBED
insert into ##TEST select 2, 'S', '2014-09-01'-- SUBSCRIBED
insert into ##TEST select 2, 'U', '2014-12-01'-- UNSUBSCRIBED
insert into ##TEST select 2, 'S', '2016-01-01'-- SUBSCRIBED
insert into ##TEST select 2, 'U', '2016-01-29'-- SUBSCRIBED
ID = 2 has all subscriptions terminated, unlike ID 1 where the last subscription is still in effect.
Here is the query that does the work:
WITH SubscribeDates AS
(
SELECT
ID, FLAG, DT
, Seq = row_number() OVER (ORDER BY DT)
FROM ##TEST
WHERE Flag = 'S'
) , UnsubscribeDAtes AS
(
SELECT
ID, FLAG, DT
, Seq = row_number() OVER (ORDER BY DT)
FROM ##TEST
WHERE Flag = 'U'
)
SELECT S.ID, S.DT AS EffectiveDate
, U.DT AS TermDate
FROM SubscribeDates AS S
LEFT JOIN UnsubscribeDAtes AS U ON S.Seq = U.Seq
ORDER BY ID, EffectiveDate
;
which returns result:
ID EffectiveDate TermDate
----------- ------------- ----------
1 2013-01-01 2013-06-01
1 2014-07-01 2014-12-01
1 2016-01-01 NULL
2 2013-02-01 2013-08-01
2 2014-09-01 2014-12-01
2 2016-01-01 2016-01-29
(6 row(s) affected)
Returned rows are telling us:
ID = 1 subscribed on 2013-01-01 then terminated subscription on 2013-06-01.
ID = 1 subscribed on 2014-07-01 then terminated subscription on 2014-12-01.
ID = 1 subscribed on 2016-01-01 and the subscription has been active since.
ID = 2 subscribed on 2013-02-01, then terminated subscription on 2013-08-01.
and so on.
We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.
🙂
July 5, 2016 at 2:40 pm
Thank you for your reply. It was very helpful.
July 5, 2016 at 2:40 pm
Thank you.
July 7, 2016 at 2:43 pm
If you have the ability to change your transactional code I would suggest this, only because subscriptions are always paired with cancellations.
create table ##TEST (ID int, SubscribeDate date, UnsubscribeDate date)
New subscriptions would be inserted, leaving UnsubscribeDate null.
When someone unsubscribes, this row is updated to populate the UnsubscribeDate.
Since subscriptions, unsubscriptions are paired, you would find this far more convenient for reporting purposes.
- Active subscriptions are those where UnsubscribeDate is null.
- Cancelled subscriptions have Unsubscribe date populated.
- Counts of active subscriptions over time become simpler.
- Calculating the length of subscriptions (and average length) become simpler
- Reporting PERFORMANCE will be better than any of the solutions using your current schema.
Just a thought.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 7, 2016 at 3:01 pm
Proposed flat design does not solve any of probable issues with existing design. I mentioned the assumptions:
We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.[\quote]
The same applies to flat design:
- subscription date must not be NULL
- unsubscribe date must be after subscription date, or NULL
Even if the designer is aware of these constraints and is capable of enforcing them, information gathered from flat design will be incomplete. Multiple terminations and re-subscriptions are possible with suggested design, but no history is kept. Existing design, separate rows for each transaction (subscribe, unsubscribe) can be improved, without touching what is already there, by adding few more elements to enforce given constraints. Joe Celko has written several good articles on the topic of state transitions, so has Alex Kuznetsov and Dwaine Champs, either on this site or Small Talk.
Forum posts are not convenient place to discuss serious design topics. If I find some time I might write an article on state transitions and their proper use. Maybe then flat designs would not look as appealing.
🙂
July 7, 2016 at 5:01 pm
I don't have time to write a paper at the moment, so this will have to suffice. 🙂
I've read Mr. Celko's books, and I have seen some truly execrable flat designs that failed miserably at trying to track a complicated process on a single row. I wholeheartedly agree that they are unsuitable for tracking a process of multiple steps. But consider the characteristics of a subscription. They include:
An identifier (SubscriptionID)
Who is subscribing (SubscriberID)
What service(s) they are subscribing to (omitted in the example below)
When the subscription begins (SubscribedDate) and
When the subscription ends (UnsubscribedDate)
-- when a subscription ends might even be known in advance, as in a one-year magazine subscription
When someone resubscribes after a subscription has ended, it is in fact a new subscription. So in this case I do not consider the inclusion of both start and stop times on the same row as being a "flat" table.
Perhaps my first post was too simplistic. Using a schema dictated by the above characteristics.
declare @SubscriberID int, @UnsubDate date
Create table #Subscriptions (SubscriptionID int identity(1,1), SubscriberID int, SubscribedDate date, UnSubscribedDate date)
Insert into #Subscriptions(SubscriberID, SubscribedDate)
values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')
set @SubscriberID = 2
set @UnsubDate = '11/1/2015'
;with cte as (select top (1)-- with proper constraints, the top(1
SubscriptionID, UnSubscribedDate
from #Subscriptions
where SubscriberID = @SubscriberID
and SubscribedDate < = @UnsubDate
order by SubscriptionID desc)
update cte
set UnSubscribedDate = @UnsubDate
insert into #Subscriptions(SubscriberID, SubscribedDate)
values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')
set @SubscriberID = 1
set @UnsubDate = '3/1/2016'
;with cte as (select top (1) SubscriptionID, UnSubscribedDate
from #Subscriptions
where SubscriberID = @SubscriberID
and SubscribedDate < = @UnsubDate
order by SubscriptionID desc)
update cte
set UnSubscribedDate = @UnsubDate
select * from #Subscriptions
select * from #Subscriptions order by SubscriberID, SubscribedDate
select sum(1) as ActiveSubscriptions
from #Subscriptions
where UnSubscribedDate is null
select SubscriberID as ActiveSubscriber, SubscriptionID as ActiveSubscription, SubscribedDate
from #Subscriptions
where UnSubscribedDate is null
order by SubscribedDate
select SubscriberID as Resubscriber, SubscriptionID as LastestSubscription, SubscribedDate as DateOfLatestSubscription
from #Subscriptions s1
where UnSubscribedDate is null
and exists(select 1 from #Subscriptions s2 where s1.SubscriberID = s2.SubscriberID and s1.SubscriptionID > s2.SubscriptionID)
select *, datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate())) as SubscriptionLength
from #Subscriptions
select Avg(datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate()))) as AverageSubscriptionLength
from #Subscriptions
Given, proper constraints should ensure that a Subscriber doesn't wind up with multiple active subscriptions. Not only is history preserved, but look at how simple the queries against that history become. All the select statements together took less than five minutes to create. They will also run efficiently. This is valuable, because simple questions should not take complex queries to answer. We write once and read a hundred times.
Sorry for being long-winded here. I just hate to see doctrine applied blindly.
Best regards, all.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 8, 2016 at 9:30 am
I agree with The Dixie Flatline. Flat design does make answering the question "What is the subscription status of any customer at any given time". Queries are definitely simpler than the ones provided for the original solution. Update when unsubscribing looks as bad as CTE queries for the original solution. In order to have easy query, we made updates more difficult.
The design is founded on assumption: "With proper constraints....". What would be those constraints? How can we implement them? In no particular order, I could think of:
1) There cannot be un-subscription without subscription (SubscribedDate IS NULL AND UnSubscribedDate IS NOT NULL)
2) UnSubscribedDate must be after SubscribedDate
3) At any given moment, each customer can have only one subscription (no overlap)
4) New subscription for a customer can be added only if the previous one has been canceled (this follows from 1)
Let's see what happens if we do not implement any constraints:
/*---
This is what we have if we run the script:
SubscriptionID SubscriberID SubscribedDate UnSubscribedDate
-------------- ------------ -------------- ----------------
1 1 2015-02-23 2016-03-01
2 2 2015-02-26 2015-11-01
3 3 2015-03-03 NULL
4 4 2016-05-23 NULL
5 2 2016-05-26 NULL
6 5 2016-06-29 NULL
(6 row(s) affected)
---*/
After adding a few records I got this:
SubscriptionID SubscriberID SubscribedDate UnSubscribedDate
-------------- ------------ -------------- ----------------
1 1 2015-02-23 2016-03-01
8 2 NULL 2015-10-24
2 2 2015-02-26 2015-11-01
7 2 2015-05-26 NULL
5 2 2016-05-26 NULL
3 3 2015-03-03 NULL
4 4 2016-05-23 NULL
6 5 2016-06-29 NULL
(8 row(s) affected)
We see 4 rows for customer ID = 2. The first one is OK - SubscriptionID = 2, subscribed on 2015-02-26, unsubscribed on 2015-11-01. Then we have one row where customer unsubscribed before any subscription at all, then one with overlapping subscription, and one subscription beginning before the previous finished. We don't want bad data, even if querying for reports is easy, wrong report is worse than no report at all. Therefore, we must implement given constraints.
How to implement given constraints?
1) There cannot be un-subscription without subscription:
CHECK ( SubscribedDate IS NOT NULL)
2) UnSubscribedDate must be after SubscribedDate :
CHECK ( SubscribedDate < UnSubscribedDate OR UnSubscribedDate IS NULL)
Constraints 2 and 3 are actually the same one. Good, one less to worry about. However, our life is not twice as easy because of that. How to enforce 2 or 3? That is where Joe's articles offer some hope. I'll add next post for that part.
July 8, 2016 at 10:12 am
Thank you for putting in the effort to create the constraints.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 8, 2016 at 10:26 am
How to enforce 2 or 3?
CREATE UNIQUE INDEX IX_#subscriptions ON #subscriptions (SubscriberID, UnsubScribedDate)
will work for our example where the definition of an active subscription is one without an UnsubscribeDate. The unique index prevents insertion of a second row with a NULL UnsubscribeDate for a given user.
With constraints in place, the update statement is simply
update #subscriptions
set UnSubscribedDate = @UnsubDate
where SubscriberID = @SubscriberID
and UnsubScribeDate is null
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 8, 2016 at 12:42 pm
Very nice touch with UNIQUE (SubscriberID, UnsubScribedDate). That does solve part of the problem, but not everything. To recap, this is what happened so far:
-- Fresh start, with all correct updates:
-- Table, with all constraints so far:
IF Object_ID('tempdb..#Subscriptions') IS NOT NULL DROP TABLE #Subscriptions
GO
Create table #Subscriptions
( SubscriptionID int identity(1,1) NOT NULL
, SubscriberID int NOT NULL
, SubscribedDate date NOT NULL
, UnSubscribedDate date
, CONSTRAINT [PK_#Subscriptions] PRIMARY KEY (SubscriberID,SubscribedDate)
, CONSTRAINT [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]
UNIQUE (SubscriberID,UnSubscribedDate)
, CONSTRAINT [CK #Subscriptions (SubscribedDate < UnSubscribedDate)]
CHECK (SubscribedDate < UnSubscribedDate)
)
;
GO
Insert into #Subscriptions(SubscriberID, SubscribedDate)
values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')
;
-- (3 row(s) affected)
;
UPDATE #Subscriptions
set UnSubscribedDate = '20151101'
WHERE SubscriptionID = 2
;
insert into #Subscriptions(SubscriberID, SubscribedDate)
values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')
;
UPDATE #Subscriptions
set UnSubscribedDate = '20151101'
WHERE SubscriberID = 1 AND UnSubscribedDate IS NULL
;
-- So far so good. Cool.
SELECT *
FROM #Subscriptions
ORDER BY SubscriberID, SubscribedDate
;
/*--
SubscriptionID SubscriberID SubscribedDate UnSubscribedDate
-------------- ------------ -------------- ----------------
1 1 2015-02-23 2015-11-01
2 2 2015-02-26 2015-11-01
5 2 2016-05-26 NULL
3 3 2015-03-03 NULL
4 4 2016-05-23 NULL
6 5 2016-06-29 NULL
(6 row(s) affected)
--*/
Now let's try to break some rules - add subscription before previous one expired, for SubscriberID=2, SubscribedDate = '2016-05-26':
insert into #Subscriptions(SubscriberID, SubscribedDate)
values (2,'6/01/2016')
;
/*--
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint
'UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)'.
Cannot insert duplicate key in object 'dbo.#Subscriptions'.
The duplicate key value is (2, <NULL>).
The statement has been terminated.
--*/
Cool, constraint [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]
prevented additional subscription before ending the previous one.
Let's close the subscription and try again.
-- This will work:
UPDATE #Subscriptions-- THsi should work because
set UnSubscribedDate = '20160610'-- UnSubscribedDate after '2016-05-26'
WHERE SubscriberID = 2 AND UnSubscribedDate IS NULL
;
/*--
SubscriptionID SubscriberID SubscribedDate UnSubscribedDate
-------------- ------------ -------------- ----------------
1 1 2015-02-23 2015-11-01
2 2 2015-02-26 2015-11-01
5 2 2016-05-26 2016-06-10
3 3 2015-03-03 NULL
4 4 2016-05-23 NULL
6 5 2016-06-29 NULL
(6 row(s) affected)
--*/
Let's try to add a subscription for SubscriberID, starting on 2016-05-30 which is before end of previous subscription ( 2016-06-10). The following UPADTE will work, no existing constraints validated. However, it will create illegal entry - new subscription starting before UnSubscribedDate from the previous one.
insert into #Subscriptions(SubscriberID, SubscribedDate)
values (2,'2016-05-30')
;
/*--
SubscriptionID SubscriberID SubscribedDate UnSubscribedDate
-------------- ------------ -------------- ----------------
1 1 2015-02-23 2015-11-01
2 2 2015-02-26 2015-11-01
5 2 2016-05-26 2016-06-10 = Jun 10, 2016
8 2 2016-05-30 NULL , started on May 30 2016
3 3 2015-03-03 NULL
4 4 2016-05-23 NULL
6 5 2016-06-29 NULL
(7 row(s) affected)
--*/
Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.
July 8, 2016 at 12:50 pm
Zidar (7/8/2016)
Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.
maybe someone will knock up a decent test harness ??
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply