Introduction
Don't Repeat Yourself (DRY) is an important principle of professional programming practice (How's that for alliteration!) In SQL queries, though, it is common to see repeated expressions. This article explores how we can use the SQL APPLY operator to reduce repetition and make our queries DRYer and easier to maintain.
Example of Repeated Expressions
A good example of repetition can be shown in the AdventureWorksDW2008R2 database. Consider the view [dbo].[vTimeSeries]:
CREATE VIEW [dbo].[vTimeSeries] AS SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,CalendarYear ,[Month] ,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) as ReportingDate FROM [dbo].[vDMPrep] WHERE [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250', 'Road-650', 'Road-750', 'Touring-1000') GROUP BY CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left(Model,1) + Right(Model,3) END + ' ' + [Region] ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) ,CalendarYear ,[Month] ,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25); GO
There are three expressions in this query that are repeated. They are given the aliases [ModelRegion], [TimeIndex] and [ReportingDate], respectively. You can see these expressions first in the SELECT list and later on in the GROUP by. We're going to eliminate this repetition using the APPLY operator.
Using APPLY to Encapsulate Expressions
When you read the official documentation about the APPLY operator, it is not immediately obvious how it can help us reduce repetition. (See "Using Apply" at https://msdn.microsoft.com/en-us/library/ms177634.aspx)
Many examples show how to use the APPLY operator to invoke a table-valued function for each row in the rowset. However, since it is possible to use SELECT queries in the APPLY context, it is also possible to use an expression operating on columns in each row in the rowset. How does this help us? Here's a simple example:
SELECT dt, dt_year FROM (VALUES (GetDate())) cur(dt) CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year)
This simple query produces:
dt dt_year 2015-03-27 11:09:11.680 2015
Note that the APPLY operator only contains an expression that uses a column in the rowset and aliases the result. The alias is then used in the main query. As an aside, note the use of '_' as the alias for the APPLY rowset. This is handy when you don't care about the rowset alias as in this case. It is a paradigm used in many programming languages as well.
We can also use the results of one APPLY operation in a subsequent APPLY. Note that the operations proceed top-to-bottom, left-to-right. This means that the results of one APPLY are not available until the operator appears in sequence. Let's expand our toy example a little:
SELECT dt, dt_year, LeapYear FROM (VALUES (GetDate())) cur(dt) CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year) CROSS APPLY ( SELECT CASE WHEN dt_year % 4 = 0 AND dt_year % 400 <> 0 THEN 'Leap Year' ELSE 'Not Leap Year' END ) _1(LeapYear)
This query produces:
dt dt_year LeapYear 2015-03-27 11:19:15.533 2015 Not Leap Year
Note that, if the two APPLY clauses were reversed in sequence, SQL would rightly complain that it can't find "dt_year" (Try it!)
Making [dbo].[vTimeSeries] DRYer
Armed with these tools, how can we simplify the AdventureWorks view? Let's take the three repeated expressions and build an APPLY clause to contain them:
... CROSS APPLY ( SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate ) _ ...
Just like the toy example, above, the APPLY clause merely returns aliased expressions using columns in the rowset. How will this make the original query DRYer? We don't need to repeat these expressions!
SELECT [ModelRegion] ,[TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,[CalendarYear] ,[Month] ,[ReportingDate] FROM [dbo].[vDMPrep] -- APPLY clause ... GROUP BY [ModelRegion] ,[TimeIndex] ,[CalendarYear] ,[Month] ,[ReportingDate]
Putting it all together, the view definition becomes:
CREATE VIEW [dbo].[vTimeSeries] AS SELECT [ModelRegion] ,[TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,[CalendarYear] ,[Month] ,[ReportingDate] FROM [dbo].[vDMPrep] CROSS APPLY ( SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate ) _ WHERE [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250', 'Road-650', 'Road-750', 'Touring-1000') GROUP BY [ModelRegion] ,[TimeIndex] ,[CalendarYear] ,[Month] ,[ReportingDate] GO
Now, if the expressions need to change, we have only one place to change them. The query has become DRYer.
Summary
We can use the APPLY operator to encapsulate and alias expressions that are repeated in our queries. This allows us to apply the principle of "Don't Repeat Yourself" and make our queries easier to maintain.