Most of the scripts I’ve used to populate date dimension uses a cursor. Since data is loaded only once to a date dimension in the ETL life cycle, using a cursor isn’t a sin.
Still, when I was reviewing my own code the other day, I wanted to get rid of the cursor. Why not, Right?
Here is a script that uses CTE and Window Functions to populate the date dimension.
DECLARE @startdate DATE = '20000101' , @enddate DATE = '20301231' ; WITH c AS ( SELECTNum = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM sys.columns c CROSS JOIN sys.columns c1 ) , d AS ( SELECT[date] = DATEADD(day, Num, @startdate) , Num FROM c WHERE Num >= 0 AND Num <= DATEDIFF(day, @startdate, @enddate) ) SELECT datekey = CAST(CONVERT(VARCHAR(8), DATEADD(day, Num, @startdate), 112) AS INT) , [date] , [DayOfMonth] = DATEPART(day, [Date]) , [DayName] = DATENAME(weekday, [Date]) , [DayOfYear] = DATEPART(dayofyear, [Date]) , [WeekOfYear] = DATEPART(week, [Date]) , [MonthName] = DATENAME(month, [Date]) , [MonthNumber] = DATEPART(month, [Date]) , [QuarterNumber] = DATEPART(quarter, [Date]) , [Year] = YEAR([date]) , [FiscalYear] = CASE WHEN DATEPART(month, [Date]) < 7 THEN YEAR([date]) ELSE YEAR([date]) + 1 END --Add more columns as needed. FROM d
Let’s take a closer look:
- Variables StartDate and EndDate hold the date range for the result set.
- We define a CTE (called c) that uses Row_Number() function to generate numbers starting from zero for each row in sys.columns. sys.columns is cross joined to itself to generate more number of [input] rows.
- We define another CTE (called d) that uses numbers from c, filters on date range, and converts numbers to dates based on the date variables.
- Finally, the Select statement uses Num and Date column aliases created by the two CTEs and date functions to create additional fields for the date dimension.
You’ll notice that the script doesn’t have all columns a date dimension typically has. Don’t call me lazy for that – it provides the logic and skeleton to add more columns. Just add them as needed.