January 9, 2014 at 2:36 pm
My scenario is as follows: We have accounts that pay for a particular "premium" service. It's entirely possible an account paid for this service for three consecutive months in 2013, then stopped paying, then started paying again. Why I'm trying to establish is, for the FIRST period of time the accout paid for this service, for how many consecutive months did they pay? Here is my test data:
if object_id('tempdb..#SampleData') is not null
drop table #SampleData
go
if object_id('tempdb..#DateAnalysis') is not null
drop table #DateAnalysis
go
-- Create temp tables for sample data
create table #SampleData
(AccountID int,
RandomDate datetime)
create table #DateAnalysis
(RowID int identity(1,1),
AccountID int,
RandomDate datetime,
NextDate datetime,
LeadInMonths int)
-- Insert some sample data
insert into #SampleData values
(1, '1/1/13'), (1, '2/1/13'), (1, '3/1/13'), (1, '6/1/13'), (1, '10/1/13'), (1, '11/1/13'), (1, '12/1/13'),
(2, '1/1/13'), (2, '4/1/13'), (2, '5/1/13'), (2, '6/1/13'),
(3, '2/1/13'), (3, '3/1/13'), (3, '4/1/13'), (3, '9/1/13'), (3, '10/1/13'), (3, '11/1/13')
-- Use lead function to determine how many months are between
-- consecutive dates per account
; with DateInterval as
(select AccountID, RandomDate,
NextDate = lead (RandomDate, 1, NULL) over (partition by AccountID order by RandomDate)
from #SampleData)
insert into #DateAnalysis
select AccountID, RandomDate, NextDate,
datediff(mm, RandomDate, NextDate) as 'Lead'
from DateInterval
where NextDate is not null -- Last row will contain NULL for NextDate. Don't include these rows.
-- Show the results
select *,
'NTile' = NTILE(3) over (partition by AccountID order by RandomDate),
'RowNum' = row_number() over (partition by AccountID order by RandomDate)
from #DateAnalysis
Results (this is not getting me what I'm looking for):
RowIDAccountIDRandomDateNextDateLeadInMonthsNTileRowNum
11 2013-01-012013-02-01111
21 2013-02-012013-03-01112
31 2013-03-012013-06-01323
41 2013-06-012013-10-01424
51 2013-10-012013-11-01135
61 2013-11-012013-12-01136
72 2013-01-012013-04-01311
82 2013-04-012013-05-01122
92 2013-05-012013-06-01133
103 2013-02-012013-03-01111
113 2013-03-012013-04-01112
123 2013-04-012013-09-01523
133 2013-09-012013-10-01124
143 2013-10-012013-11-01135
This is what I'm trying to achieve:
RowIDAccountIDRandomDateNextDateLeadInMonthsRankForThisLeadInMonths
11 2013-01-012013-02-0111
21 2013-02-012013-03-0111
31 2013-03-012013-06-0132
41 2013-06-012013-10-0143
51 2013-10-012013-11-0114
61 2013-11-012013-12-0114
72 2013-01-012013-04-0131
82 2013-04-012013-05-0112
92 2013-05-012013-06-0112
103 2013-02-012013-03-0111
113 2013-03-012013-04-0111
123 2013-04-012013-09-0152
133 2013-09-012013-10-0113
143 2013-10-012013-11-0113
The problem comes with accounts like AccountID = 1. They paid consecutively to start, then skipped, then started paying consecutively again. When using window functions, I'm running into trouble attempting to partition by AccountID and LeadInMonths. It's putting all the LeadInMonths = 1 together and that will give me skewed results if I want to know the earliest and latest date within the FIRST consecutive range of dates where the account paid. I've tried NTILE but it expects an integer and there's no telling how many "tiles" would be in AccountID partition.
I've looked at the OVER clause and the new "ROWS BETWEEN" syntax and still cannot get the desired results. Perhaps I need an entirely different approach?
Thank you
January 10, 2014 at 4:18 am
Hey there,
This solution gives you what you've described, but with different output columns:
--First I use LAG to find where a new payment period begins by checking the months between the current row and the previous row
--split by AccountID
WITH FlagPaymentPeriods AS
(
SELECT
AccountID
,RandomDate
,CASE WHEN LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate) IS NULL
OR DATEDIFF(month, LAG(RandomDate) OVER (PARTITION BY AccountID ORDER BY RandomDate), RandomDate) > 1
THEN 1
ELSE 0
END AS NewPaymentPeriodFlag
FROM
SampleData
)
--Now I can effectively do a running total of the flags again split by AccountID
, PaymentPeriods AS
(
SELECT
AccountID
,RandomDate
,SUM(NewPaymentPeriodFlag) OVER (PARTITION BY AccountID ORDER BY RandomDate) AS PaymentPeriod
FROM
FlagPaymentPeriods
)
--Now we just find how many rows we have in account where the PaymentPeriod = 1 (the first payment period)
SELECT
AccountID
,COUNT(PaymentPeriod) AS MonthsPaid
FROM
PaymentPeriods
WHERE
PaymentPeriod = 1
GROUP BY
AccountID
,PaymentPeriod
*EDIT: Changed the aggregate in the last part of the query to use COUNT instead of SUM - makes more sense that way.
Results are:
AccountID MonthsPaid
----------- -----------
1 3
2 1
3 3
Hope that helps!
January 10, 2014 at 8:51 am
Wow!!! That is fantastic! Thank you so much!
January 10, 2014 at 9:00 am
No problem, happy to help 🙂
In case you wanted to extend this to see how many months people pay for in subsequent payment periods, you can change the last part of the query to this:
SELECT
AccountID
,PaymentPeriod
,COUNT(PaymentPeriod) AS MonthsPaid
FROM
PaymentPeriods
GROUP BY
AccountID
,PaymentPeriod
ORDER BY
AccountID
,PaymentPeriod
January 12, 2014 at 6:25 pm
Are you looking for something like this?
SELECT RowID=ROW_NUMBER() OVER (ORDER BY AccountID, RandomDate)
,AccountID, RandomDate, NextDate, LeadInMonths
,RankForThisLeadInMonths=SUM(lm) OVER
(
PARTITION BY AccountID ORDER BY RandomDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM
(
SELECT AccountID, RandomDate, NextDate, LeadInMonths
,lm=CASE WHEN LeadInMonths = LAG(LeadInMonths, 1, 0) OVER (PARTITION BY AccountID ORDER BY RandomDate)
THEN 0 ELSE 1 END
FROM
(
SELECT AccountID, RandomDate
,NextDate=LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate)
,LeadInMonths=DATEDIFF(month, RandomDate
,LEAD(RandomDate, 1) OVER (PARTITION BY AccountID ORDER BY RandomDate))
FROM #SampleData
) a
WHERE NextDate IS NOT NULL
) a
ORDER BY AccountID, RandomDate;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 13, 2014 at 10:33 am
Yes dwain.c, that would work as well! Thanks for the input.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply