Common Table Expression Misunderstandings I frequently use Common Table Expressions (CTEs) when I demo windowing functions and get questions about them. Windowing functions are limited to the SELECT and OVER clause which means that you can’t filter or group on them directly. Using a CTE is one way to separate the logic of the query into multiple steps so that you filter or group after the windowing function executes. Another reason might be nesting aggregate functions, such as needing the average of the sums. Again, you can’t do that directly and using a CTE is one way to solve it. I was once given a query containing nested subqueries (aka derived tables) a dozen layers deep. The team wanted me to verify that the query did what it was supposed to do, which was very difficult. I ended up translating the query into CTEs where one CTE queried the previous CTE instead of nesting. I also added comments at each level to explain what was going on. The team was thrilled that they could now make sense of the query. Are CTEs my ultimate T-SQL tool? Not at all. I do love how CTEs are organized and will often start with them. However, if there is a performance issue, I’ll try something else. I’ve run into a few people who would never use a CTE because they think they always cause performance issues. I’ve also talked to some others who believe they have some functionality that doesn’t exist. Surprisingly, these two are related. One big difference between a derived table and a CTE is that a derived table cannot be used multiple times in the query while a CTE can. You could define a CTE that joins to itself in the outer query, for example. A derived table must be defined for each time it’s used. Here’s where the misunderstandings begin. When a CTE is used more than once in a query, SQL Server executes that CTE the same number of times and possibly with different indexes and operators depending on how it's used each time. The results of the CTE are not cached in memory for reuse. Here’s a simple non-CTE example from AdventureWorks: SET STATISTICS IO ON;
GO
SELECT CustomerID
FROM Sales.Customer
UNION ALL
SELECT CustomerID
FROM Sales.Customer; It’s obvious that this will touch the Sales.Customer table twice. The query uses 74 logical reads and scans the table twice. Here’s another example where a CTE is used. Sales.Customer is in the query just once, and UNION ALL is used in the outer query. WITH Customers AS (
SELECT CustomerID
FROM Sales.Customer)
SELECT *
FROM Customers
UNION ALL
SELECT * FROM Customers; Some people think that the second query will only touch the Sales.Customer table once and cache the results, but it’s actually identical to the previous query. It also uses 74 logical reads and scans the table twice, and the execution plans are identical. There’s no difference in the two examples. SQL Server doesn’t save the results of the CTE for multiple uses in the outer query. I’m hopeful that they do work like this in some future version of SQL Server. When the CTE is used many times in the outer query, performance might deteriorate when you don’t expect it. This doesn’t mean that you should never use a CTE; it means that you need to understand how they work. Another misconception I’ve run into lately involves the scope of the CTE. It lives only within one statement. You can’t define a CTE and then use it in multiple queries like a temp table. I hope this has cleared up some of the misconceptions about CTEs and that they remain a useful tool in your T-SQL toolbox. Kathi Kellenberger Join the debate, and respond to the editorial on the forums |