The simplest of requests are often the most difficult to execute. For example, a finance team needs to know every time a customer did not invoice for 90 days in the past 2 years. The simplicity of the ask is deceiving. Tracking differences across multiple dimensions (customer and invoice date in this case) and accounting for NULL values in the changing dimension (aka when a customer did not invoice on a day) appears to be hopeless without the support of a CRM code change. But have no fear, complicated SQL is here!
Testing Scenario: the business would like you to create a customer attrition report. To do this, you need to find gaps in invoice dates per customer and determine when and how many customers go “inactive” and are “reactivated” in the past two years. A customer is deemed “inactive” whenever there are greater than 90 days since the last invoice. This can occur multiple times in one year, so a customer can be “reactivated” multiple times in one year.
Resources Needed:
- SQL Server Access to the needed data elements
- In this scenario, this consists of invoice date by customer. You can swap this out for any other date range or any other unique ID.
- Business logic
- In this scenario, activations in a year = anytime a customer has invoiced first the first time in a 90 day period. You can swap customer field for any dimension such as sales rep, carrier, business segment, etc. You can also swap out invoice date for any date field such as creation date, pickup date, paid date, delivery date, etc.
- Start and End dates
- Ability to use CTE’s/Temp Tables
- This really comes into play if you are trying to create a Direct Query based report in Power BI or using any other reporting tools that do not allow calling Temp Tables. If you hit this limitation, then you will need to leverage a database/code solution instead of the method below.
Notes:
- If your SQL server instance is after 2016, then you will not need to use the custom date temp table and can use IGNORE NULL within the MAX OVER statement (see alternative line in the final SQL code below).
- The process below lays out each portion of the final query, but feel free to skip ahead to the end for the final sql statement if you don’t need each section explained.
Process:
- Set up parameters
DECLARE @StartDate DATE = '2019-01-01'
DECLARE @EndDate DATE = GETDATE()
DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive"
.
- Create a date/calendar table. Check with your DBA’s first to make sure they haven’t already created something similar that you can use, all you need is a list of sequential calendar dates with no gaps.
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
|SELECT DATEADD(day,1,myDate) as myDate
FROM cte
WHERE DATEADD(day,1,myDate) <= @EndDate
)
SELECT myDate 'CalendarDate'
INTO #Calendar
FROM cte
OPTION (MAXRECURSION 0)
–this works around the usual 100 recursion row limit
- If you need to partition by a dimension other than date, such as customer in this scenario, you will need to create a table to grab that dimension’s values as well. After this, you’ll need to create a bridge table that will have a value for every date in your range and every customer (or other dimension) value as well.
- –Customer Table
SELECT DISTINCT
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName
,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
INTO #Companies
FROM DimAccount DA
JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
WHERE FSF.InvoiceDateKey IS NOT NULL
GROUP BY
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName - –Bridge Table that combines both Customer and Date values
SELECT DISTINCT
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
INTO #Bridge
FROM #Calendar C, #Companies Comp
GROUP BY
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,CONCAT(C.CalendarDate,Comp.CompanyID)
- –Customer Table
- Next, we need to create our unique ID’s that combine all the dimensions we are hoping to account for in our “IGNORE NULLS” scenario. In this test case, we need to create one ID that grabs the actual dates a customer invoiced on and another for all the dates in our range that a customer could have possibly invoiced on. Then, we join the two together to grab the last time a customer invoiced and get ignore those pesky NULL values. This is the section where having SQL Server 2016 and later will do you a lot of favors (see code below).
- –Actual Invoiced Dates by Customer
SELECT DISTINCT
FSF.InvoiceDateKey
,DA.CompanyName
,DA.CompanyID
,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
INTO #ShipmentData
FROM ShipmentFacts FSF
JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate - –Joining together and filling in the NULLS with the previous invoiced date by customer
SELECT DISTINCT
C.ID
,S.ID 'ShipData'
,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
--ALTERNATIVE FOR POST SQL Server 2012--
--,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'INTO #RunningDates
FROM #Bridge C
LEFT JOIN #ShipmentData S ON S.ID = C.ID
- –Actual Invoiced Dates by Customer
- The rest of the code is based on business logic, so please use at your discretion and edit for your own needs.
Full SQL Code:
DECLARE @StartDate DATE = '2019-01-01'
DECLARE @EndDate DATE = GETDATE()
DECLARE @ActivationRange INT = 90 --notates how many days can be between invoice dates before a customer is deemed "inactive"
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(day,1,myDate) as myDate
FROM cte
WHERE DATEADD(day,1,myDate) <= @EndDate
)
SELECT myDate 'CalendarDate'
INTO #Calendar
FROM cte
OPTION (MAXRECURSION 0)
SELECT DISTINCT
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName
,MIN(FSF.InvoiceDateKey) 'FirstInvoiceDate'
INTO #Companies
FROM DimAccount DA
JOIN ShipmentFacts FSF ON FSF.AccountKey = DA.AccountsKey
WHERE FSF.InvoiceDateKey >= '2000-01-01'
GROUP BY
DA.AccountsKey
,DA.CompanyID
,DA.CompanyName
SELECT DISTINCT
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,MIN(Comp.FirstInvoiceDate) 'FirstInvoiceDate'
,CONCAT(C.CalendarDate,Comp.CompanyID) 'ID'
INTO #Bridge
FROM #Calendar C, #Companies Comp
GROUP BY
C.CalendarDate
,Comp.CompanyID
,Comp.CompanyName
,CONCAT(C.CalendarDate,Comp.CompanyID)
SELECT DISTINCT
FSF.InvoiceDateKey
,DA.CompanyName
,DA.CompanyID
,CONCAT(FSF.InvoiceDateKey,DA.CompanyId) 'ID'
INTO #ShipmentData
FROM ShipmentFacts FSF
JOIN #Companies DA ON DA.AccountsKey = FSF.AccountKey
WHERE FSF.InvoiceDateKey BETWEEN @StartDate AND @EndDate
SELECT DISTINCT
C.ID
,S.ID 'ShipData'
,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
--ALTERNATIVE FOR POST SQL Server 2012--
--,CAST( SUBSTRING( MAX( CAST (C.ID AS BINARY(4)) + CAST(S.ID AS BINARY(20))) IGNORE NULLS OVER (PARTITION BY C.CompanyID ORDER BY C.ID ROWS UNBOUNDED PRECEDING),5,20) AS varchar) 'PreviousInvoiceDateKey'
INTO #RunningDates
FROM #Bridge C
LEFT JOIN #ShipmentData S ON S.ID = C.ID
SELECT DISTINCT
R.ID
,R.ShipData
,R.PreviousInvoiceDateKey
,LEFT(R.PreviousInvoiceDateKey,10) 'PreviousInvoiceDate'
,LEFT(R.ID,10) 'DateKey'
,RIGHT(R.ID,5) 'CompanyId'
,B.FirstInvoiceDate
INTO #ActivationData
FROM #RunningDates R
LEFT JOIN #Bridge B ON B.ID = R.ID
SELECT DISTINCT
A.ID
,A.DateKey
,A.CompanyId
,A.PreviousInvoiceDate
,YEAR(A.DateKey) 'Year'
,YEAR(A.FirstInvoiceDate) 'InitialActivationYear'
,CASE WHEN YEAR(A.DateKey) = YEAR(A.FirstInvoiceDate) THEN 1 ELSE 0 END 'IsActivationYear'
,DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) 'DaysSinceInvoice'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN 1 ELSE 0 END 'IsInactive'
,CASE WHEN DATEDIFF(Day,A.PreviousInvoiceDate,A.DateKey) = @ActivationRange THEN A.DateKey ELSE NULL END 'InactiveDate'
INTO #ActivationDetails
FROM #ActivationData ASELECT DISTINCT
D.Year
,D.CompanyId
,SUM(D.IsInactive) 'InactivatedPeriods'
,MAX(D.IsActivationYear) 'IsFirstActivationYear'
,MAX(D.DaysSinceInvoice) 'BiggestGapInInvoicing (Days)'
,MAX(D.InactiveDate) 'LastInactiveDate'
,MAX(D.PreviousInvoiceDate) 'LastInvoiceDate'
,CASE WHEN MAX(D.InactiveDate) > MAX(D.PreviousInvoiceDate) THEN -1 ELSE 0 END 'NotActiveAtEndOfYear'--to grab the activations per customer per year follow equation below
-- Activations = [InactivatedPeriods] + [NotActiveAtEndOfYear] + [IsFirstActivationYear] --this part will be done in Power BI
FROM #ActivationDetails D
GROUP BY
D.Year
,D.CompanyId
DROP TABLE #Calendar
DROP TABLE #Companies
DROP TABLE #Bridge
DROP TABLE #ShipmentData
DROP TABLE #RunningDates
DROP TABLE #ActivationData
DROP TABLE #ActivationDetails
Additional Resource: