Introduction
I was reviewing some VB code the other day when I saw that the developer had submitted two queries.
SELECT CT.CustomerTypeDescription , Count(*) AS Customers FROM Customer AS C INNER JOIN CustomerType AS CT ON C.CustomerTypeId = CT.CustomerTypeId GROUP BY CT.CustomerTypeDescription
and
SELECT Count(*) AS Customers FROM Customer
Nothing radical there. If the customers table had been large then I would probably have told the developer to write
SELECT MAX(rowcnt) AS Customers FROM dbo.sysindexes WHERE id = Object_Id( 'dbo.Customers') AND indid IN (0,1)
This is just one of the many useful tips I have picked up from www.sqlservercentral.com. However, as I delved deeper into the code I came across more query pairs where the first query produced a subtotal aggregate and the 2nd produced a grand total.
Many of the queries required joins across four or more tables in order to produce their results and therefore the SysIndexes query cannot work. Generally these queries would not be an issue, however if the queries are being run against a database in a high traffic web site you want to save every bit of resources that you can.
A NORTHWIND example
Let us look at a similar situation within the NorthWind database. We want to get data on the number of products bought by country and product category
The base query might be as follows
SELECT Cust.country , Cat.CategoryName , Count(*) AS ProductsBought FROM dbo.Customers AS Cust INNER JOIN dbo.Orders AS Ord ON Cust.CustomerId = Ord.CustomerId INNER JOIN dbo.[Order Details] AS OrdDet ON Ord.OrderId = OrdDet.OrderId INNER JOIN dbo.Products AS Prod ON OrdDet.ProductId = Product.ProductId INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId GROUP BY Cust.Country , Cat.CategoryName
This has a subtree cost of 0.241. The 2nd query would be as follows
SELECT Cust.country , Count(*) AS ProductsBought FROM dbo.Customers AS Cust INNER JOIN dbo.Orders AS Ord ON Cust.CustomerId = Ord.CustomerId INNER JOIN dbo.[Order Details] AS OrdDet ON Ord.OrderId = OrdDet.OrderId INNER JOIN dbo.Products AS Prod ON OrdDet.ProductId = Product.ProductId GROUP BY Cust.Country
This has a subtree cost of 0.249. The developer calculated the grand total for themselves in code but they could just as easilty fired off a third query adding another subtree cost of 0.214. So we are looking at a total cost for these queries of 0.490 for the subtotals and 0.704 if we want the grand total as well.
WITH ROLLUP
Fortunately SQL has an option on the GROUP BY clause WITH ROLLUP. If we add this clause to our first Northwind example then the returned recordset shows a number of differences
SELECT Cust.country , Cat.CategoryName , Count(*) AS ProductsBought FROM dbo.Customers AS Cust INNER JOIN dbo.Orders AS Ord ON Cust.CustomerId = Ord.CustomerId INNER JOIN dbo.[Order Details] AS OrdDet ON Ord.OrderId = OrdDet.OrderId INNER JOIN dbo.Products AS Prod ON OrdDet.ProductId = Product.ProductId INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId GROUP BY Cust.Country , Cat.CategoryName WITH ROLLUP
- Our recordset is now sorted in order of the GROUP BY columns
- The last record for each country has a NULLcategory with the ProductsBought column holding the total products bought for the country.
- The final record has a NULL for both Country and CategoryName and the ProductsBought column contains the grand total for all records.
And perhaps most important of all the subtree cost for the query is only 0.253.
GROUPING
In our examples so far we have been fortunate in that the data that we are aggregating contains no NULL values. This has meant that all we need to do to get our subtotal values is check for NULL values in the resulting recordset.
Fortunately we have an aggregate function called GROUPING whose purpose is to return a value of one if the row is returned as a result of a ROLLUP or CUBE operation. In all other cases it will return zero. So we alter our example query to the query shown below:
SELECT Cust.country , Cat.CategoryName , Count(*) AS ProductsBought, GROUPING(Cust.country) AS SubtotalForCountries GROUPING(Cat.CategoryName) AS SubtotalForCategories FROMdbo.Customers AS Cust INNER JOIN dbo.Orders AS Ord ON Cust.CustomerId = Ord.CustomerId NNER JOIN dbo.[Order Details] AS OrdDet ON Ord.OrderId = OrdDet.OrderId INNER JOIN dbo.Products AS Prod ON OrdDet.ProductId = Product.ProductId INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId GROUP BY Cust.Country , Cat.CategoryName
Our result set will now look like the following.
WITH CUBE
Changing the WITH ROLLUP to WITH CUBE produces a result set as follows
As you can see in addition to our subtotals for each country we have also gained subtotals for each category. Again the execution plan for this reveals a subtree cost of 0.288. This is dramatically lower than having to provide the same information with individual queries.
The downsides
There are only two downsides that I have come across.
- You cannot use COUNT(DISTINCT columnname) with these GROUP BY options
- Programmers may grumble because they have to write code to deal with the totals being included in the result set
With the increasing take up of .NET, however, the programmer has plenty of tools to be able to deal with the 2nd point.
COMPUTE and COMPUTE BY
So far we have seen that we can dramatically cut down the cost of producing multiple aggregates for a GROUP BY query, but what if we want aggregates for a straight SELECT query?
Again SQL contains two options for use with the ORDER BYclause:
- COMPUTE
- COMPUTE BY
Using the NorthWind database let us look at a query to provide us with the cost of goods in stock.
SELECT Prod.ProductId , Prod.ProductName , Cat.CategoryName , Prod.UnitsInStock * Prod.UnitPrice AS StockValue FROM dbo.Products AS Prod INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId ORDER BY Cat.CategoryName COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice)
This query will actually produce two recordsets. One for our sorted results and a 2nd recordset containing the grand total stock value.
If we wanted to produce a subtotal for every category then we alter our query as follows:
SELECT Prod.ProductId , Prod.ProductName , Cat.CategoryName , Prod.UnitsInStock * Prod.UnitPrice AS StockValue FROMdbo.Products AS Prod INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId ORDER BY Cat.CategoryName COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice) BY Cat.CategoryName
If you run this query you will find that you will get a pair of result sets for every category, one containing our ordered records and the other containing the subtotal for that category.
If we wanted a final grand total as well then we can add a 2nd COMPUTE to our query
SELECT Prod.ProductId , Prod.ProductName , Cat.CategoryName , Prod.UnitsInStock * Prod.UnitPrice AS StockValue FROM dbo.Products AS Prod INNER JOIN dbo.Categories AS Cat ON Product.CategoryId = Cat.CategoryId ORDER BY Cat.CategoryName COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice) BY Cat.CategoryName COMPUTE SUM(Prod.UnitsInStock * Prod.UnitPrice)
It doesn't matter what order we put our COMPUTE statements, the grand total will always appear as the last recordset.
In this simple example the inclussion of the additional COMPUTE statements had no affect on the execution plan.
Conclusion
The use of WITH ROLLUP, WITH CUBE and GROUPING should not pose too much of a challenge for developers. The use of COMPUTE and COMPUTE BY will need to be handled with care. There is a danger that the system will produce multiple large results sets. In addition your developers must be familiar with accessing multiple results sets. Dealing with multiple recordsets isn't really complicated it is just a slight increase in complexity from the developer's perspective. See http://msdn.microsoft.com for examples.