Over the last few years I’ve learned quite a bit about different techniques in SQL Server. This particular one has been available since SQL 2005 and looks like it could be really handy.
Let’s say I want to calculate the total sales for by month and at the same the percentage of sales that month represents for the year.
Without the OVER clause I would have to do something with a CTE or a Subquery. For example:
WITH YearTotals AS ( SELECT YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS YearTotal FROM Sales.SalesOrderHeader GROUP BY YEAR(OrderDate) ) SELECT YEAR(OrderDate) AS OrderYear, MONTH(OrderDate)AS OrderDate, SUM(TotalDue) AS MonthTotal, YearTotals.YearTotal, SUM(TotalDue)/YearTotals.YearTotal*100 AS MonthlyPercentage FROM Sales.SalesOrderHeader JOIN YearTotals ON YearTotals.OrderYear = YEAR(SalesOrderHeader.OrderDate) GROUP BY YEAR(OrderDate), MONTH(OrderDate), YearTotals.YearTotal ORDER BY YEAR(OrderDate), MONTH(OrderDate)
Using the OVER clause I’m able to specify the criteria for each aggregate like so.
SELECT DISTINCT YEAR(OrderDate) AS OrderYear, MONTH(OrderDate)AS OrderDate, SUM(TotalDue) OVER (PARTITION BY YEAR(OrderDate), MONTH(OrderDate)) AS MonthTotal, SUM(TotalDue) OVER (PARTITION BY YEAR(OrderDate)) AS YearTotal, SUM(TotalDue) OVER (PARTITION BY YEAR(OrderDate), MONTH(OrderDate)) / SUM(TotalDue) OVER (PARTITION BY YEAR(OrderDate))*100 AS MonthlyPercentage FROM Sales.SalesOrderHeader ORDER BY YEAR(OrderDate), MONTH(OrderDate)
The OVER version looks quite a bit simpler doesn’t it? And I’ve always thought simpler usually meant faster and more efficient. Interestingly enough however, when I looked at performance the CTE version was significantly more efficient. The OVER version used almost 200 times as many reads and took almost 7 times as long to run. Now this was just one particular example, but still something to be watched. So, while I can think of several cases where using the OVER clause would have been a quick way to add a total to a query, I think I’m still going to have to carefully test its performance before putting it into production.
Note: My initial version of the OVER query used a GROUP BY to calculate the MonthTotal and then the OVER to calculate the YearTotal. Unfortunately I kept getting an error. I’m not entirely sure if I’m making a mistake or if OVER and GROUP BY are just incompatible. I’ll continue to research and if I find out one way or the other I’ll add it to the post.
Filed under: Microsoft SQL Server, SQLServerPedia Syndication, T-SQL Tagged: code language, language sql, microsoft sql server, sql statements, T-SQL