Trying to incorporate RANK or NTILE with LEAD function

Question

Post reply

Trying to incorporate RANK or NTILE with LEAD function

SoCal_DBD

Hall of Fame

Points: 3382
More actions
January 9, 2014 at 2:36 pm

#288113

My scenario is as follows: We have accounts that pay for a particular "premium" service. It's entirely possible an account paid for this service for three consecutive months in 2013, then stopped paying, then started paying again. Why I'm trying to establish is, for the FIRST period of time the accout paid for this service, for how many consecutive months did they pay? Here is my test data:
if object_id('tempdb..#SampleData') is not null
drop table #SampleData
go
if object_id('tempdb..#DateAnalysis') is not null
drop table #DateAnalysis
go
-- Create temp tables for sample data
create table #SampleData
(AccountID int,
RandomDate datetime)
create table #DateAnalysis
(RowID int identity(1,1),
AccountID int,
RandomDate datetime,
NextDate datetime,
LeadInMonths int)
-- Insert some sample data
insert into #SampleData values
(1, '1/1/13'), (1, '2/1/13'), (1, '3/1/13'), (1, '6/1/13'), (1, '10/1/13'), (1, '11/1/13'), (1, '12/1/13'),
(2, '1/1/13'), (2, '4/1/13'), (2, '5/1/13'), (2, '6/1/13'),
(3, '2/1/13'), (3, '3/1/13'), (3, '4/1/13'), (3, '9/1/13'), (3, '10/1/13'), (3, '11/1/13')
-- Use lead function to determine how many months are between
-- consecutive dates per account
; with DateInterval as
(select AccountID, RandomDate,
NextDate = lead (RandomDate, 1, NULL) over (partition by AccountID order by RandomDate)
from #SampleData)
insert into #DateAnalysis
select AccountID, RandomDate, NextDate,
datediff(mm, RandomDate, NextDate) as 'Lead'
from DateInterval
where NextDate is not null -- Last row will contain NULL for NextDate. Don't include these rows.
-- Show the results
select *,
'NTile' = NTILE(3) over (partition by AccountID order by RandomDate),
'RowNum' = row_number() over (partition by AccountID order by RandomDate)
from #DateAnalysis
Results (this is not getting me what I'm looking for):
RowIDAccountIDRandomDateNextDateLeadInMonthsNTileRowNum
11 2013-01-012013-02-01111
21 2013-02-012013-03-01112
31 2013-03-012013-06-01323
41 2013-06-012013-10-01424
51 2013-10-012013-11-01135
61 2013-11-012013-12-01136
72 2013-01-012013-04-01311
82 2013-04-012013-05-01122
92 2013-05-012013-06-01133
103 2013-02-012013-03-01111
113 2013-03-012013-04-01112
123 2013-04-012013-09-01523
133 2013-09-012013-10-01124
143 2013-10-012013-11-01135
This is what I'm trying to achieve:
RowIDAccountIDRandomDateNextDateLeadInMonthsRankForThisLeadInMonths
11 2013-01-012013-02-0111
21 2013-02-012013-03-0111
31 2013-03-012013-06-0132
41 2013-06-012013-10-0143
51 2013-10-012013-11-0114
61 2013-11-012013-12-0114
72 2013-01-012013-04-0131
82 2013-04-012013-05-0112
92 2013-05-012013-06-0112
103 2013-02-012013-03-0111
113 2013-03-012013-04-0111
123 2013-04-012013-09-0152
133 2013-09-012013-10-0113
143 2013-10-012013-11-0113
The problem comes with accounts like AccountID = 1. They paid consecutively to start, then skipped, then started paying consecutively again. When using window functions, I'm running into trouble attempting to partition by AccountID and LeadInMonths. It's putting all the LeadInMonths = 1 together and that will give me skewed results if I want to know the earliest and latest date within the FIRST consecutive range of dates where the account paid. I've tried NTILE but it expects an integer and there's no telling how many "tiles" would be in AccountID partition.
I've looked at the OVER clause and the new "ROWS BETWEEN" syntax and still cannot get the desired results. Perhaps I need an entirely different approach?
Thank you

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

jonharding2204 SSChasing Mays Points: 610 More actions · Answer 1

Hey there,

This solution gives you what you've described, but with different output columns:

--First I use LAG to find where a new payment period begins by checking the months between the current row and the previous row

--split by AccountID

WITH FlagPaymentPeriods AS

(

SELECT

AccountID

,RandomDate

,CASE WHEN LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate) IS NULL

OR DATEDIFF(month, LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate), RandomDate) > 1

THEN 1

ELSE 0

END AS NewPaymentPeriodFlag

FROM

SampleData

)

--Now I can effectively do a running total of the flags again split by AccountID

, PaymentPeriods AS

(

SELECT

AccountID

,RandomDate

,SUM(NewPaymentPeriodFlag) OVER (PARTITION BY AccountID ORDER BY RandomDate) AS PaymentPeriod

FROM

FlagPaymentPeriods

)

--Now we just find how many rows we have in account where the PaymentPeriod = 1 (the first payment period)

SELECT

AccountID

,COUNT(PaymentPeriod) AS MonthsPaid

FROM

PaymentPeriods

WHERE

PaymentPeriod = 1

GROUP BY

AccountID

,PaymentPeriod

*EDIT: Changed the aggregate in the last part of the query to use COUNT instead of SUM - makes more sense that way.

Results are:

AccountID MonthsPaid

----------- -----------

1 3

2 1

3 3

Hope that helps!

SoCal_DBD Hall of Fame Points: 3382 More actions · Answer 2

SoCal_DBD

Hall of Fame

Points: 3382

January 10, 2014 at 8:51 am

#1679214

Wow!!! That is fantastic! Thank you so much!

jonharding2204 SSChasing Mays Points: 610 More actions · Answer 3

No problem, happy to help 🙂

In case you wanted to extend this to see how many months people pay for in subsequent payment periods, you can change the last part of the query to this:

SELECT

AccountID

,PaymentPeriod

,COUNT(PaymentPeriod) AS MonthsPaid

FROM

PaymentPeriods

GROUP BY

AccountID

,PaymentPeriod

ORDER BY

AccountID

,PaymentPeriod

Dwain Camps SSC Guru Points: 86948 More actions · Answer 4

Are you looking for something like this?

SELECT RowID=ROW_NUMBER() OVER (ORDER BY AccountID, RandomDate)

,AccountID, RandomDate, NextDate, LeadInMonths

,RankForThisLeadInMonths=SUM(lm) OVER

(

PARTITION BY AccountID ORDER BY RandomDate

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

)

FROM

(

SELECT AccountID, RandomDate, NextDate, LeadInMonths

,lm=CASE WHEN LeadInMonths = LAG(LeadInMonths, 1, 0) OVER (PARTITION BY AccountID ORDER BY RandomDate)

THEN 0 ELSE 1 END

FROM

(

SELECT AccountID, RandomDate

,NextDate=LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate)

,LeadInMonths=DATEDIFF(month, RandomDate

,LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate))

FROM #SampleData

) a

WHERE NextDate IS NOT NULL

) a

ORDER BY AccountID, RandomDate;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

SoCal_DBD Hall of Fame Points: 3382 More actions · Answer 5

Yes dwain.c, that would work as well! Thanks for the input.