There’s power in naming things. Supposedly some types of magic are even based on knowing the correct names for things. The name for the T-SQL clause Common Table Expression (CTE) is actually pretty accurate. It’s an expression that looks like a table and can be used in common across the entire query (at least I think that’s what the common part refers to). But note, I didn’t say it was a table. It’s not. It’s an expression. If you look at the T-SQL definition at the link, it refers to a “temporary” result set. Now, to a lot of people, that means table. But it isn’t. Let’s look at this in more detail.
Here’s a query that defines a simple CTE and then uses it to query the date in the next T-SQL statement:
WITH x AS (SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883 ) SELECT x.OrderDate, x.LineTotal, p.Name FROM x JOIN Production.Product AS p ON p.ProductID = x.ProductID;
If we run this, we get the following execution plan:
Let’s not worry about the plan for the moment. Instead, I want to look at a couple of more queries:
--Derived Table SELECT x.OrderDate, x.LineTotal, p.Name FROM (SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883 ) AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID; --Just a JOIN SELECT soh.OrderDate, sod.LineTotal, p.Name FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID JOIN Production.Product AS p ON p.ProductID = sod.ProductID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883;
Three, nominally, different queries. Yet, if you run these queries, all of them return exactly the same data and all of them have exactly the same execution plan. That’s right, the CTE didn’t change the way that SQL Server derived these queries in any way. Nor would it. That’s because, the CTE is absolutely not a table. It’s an expression.
That name appears to hold magic. We see the word table and we think “Tables, I know all about them” and we proceed to start treating our CTEs as if they were tables, but they’re not. And Microsoft’s definition is completely accurate, if maybe just a touch misleading. These are not tables we’re dealing with when we work with Common Table Expressions.
Let’s look at the difference:
--Table Valued Parameter DECLARE @TVP AS TABLE (OrderDate DATETIME, SalesOrderID INT, LineTotal NUMERIC(38,6), ProductID INT, INDEX tt_ci CLUSTERED (SalesOrderID), INDEX tt_i NONCLUSTERED(ProductID)); INSERT @TVP (OrderDate, SalesOrderID, LineTotal, ProductID ) SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883; SELECT x.OrderDate, x.LineTotal, p.Name FROM @TVP AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID; --Temporary Table CREATE TABLE #TT (OrderDate DATETIME, SalesOrderID INT, LineTotal NUMERIC(38,6), ProductID INT, INDEX tt_ci CLUSTERED (SalesOrderID), INDEX tt_i NONCLUSTERED (ProductID)); INSERT #TT (OrderDate, SalesOrderID, LineTotal, ProductID ) SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883; SELECT x.OrderDate, x.LineTotal, p.Name FROM #TT AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID;
These two queries are using table objects, a table valued parameter and a temporary table. The execution plans are decidedly different from the one above and each other:
Interesting enough it chose to do a Merge instead of the Nested Loops and ordered the table scan to accomplish it. Yet, that query runs faster and has less reads than all the others… As long as you don’t count the time to load the temporary table. Then it just doesn’t work as well as the others, at all. Speaking of execution times, the first three queries all had exactly 27 reads and, within some variations, ran in about 30ms on my system. Anyway, query tuning is not the point of the discussion.
You can see the differences in the execution plans that you get between an expression, a CTE, and actual temporary storage, either table variables or temporary tables. Yet, you can also see that the CTE, an expression, just an expression, was resolved in the exact same way as the derived table and the simple JOIN. In short, a common table expression is not a table. Absolutely not. It’s an expression.
On a side-note, if you’re thinking of attending SQL Rally Nordic 2015 in Copenhagen and you want to spend some time talking about query tuning, I’m doing a full day pre-conference seminar. Click here now to register.
The post Common Table Expressions Are Not Tables appeared first on Home Of The Scary DBA.