CTE vs Derived Table

Question

Post reply

CTE vs Derived Table

Goldie Lesser

SSCertifiable

Points: 6155
More actions
November 9, 2008 at 4:48 pm

#383338

Are there any benefits in using a CTE over a derived table? Or vice versa?

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 1

yep... paste the following into the "help" url box...

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/d4c26311-d9f4-432c-953a-1c1042b33c97.htm

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 2

Let's say we have a case where the same functionality can be accomplished with either a CTE or a derived table.

Are there any performance benefits of the CTE?

Also, if you "Reference the resulting table multiple times in the same statement" are the results cached?

Or does the query get rebuilt on every call?

Mauve SSChampion Points: 11316 More actions · Answer 3

It depends upon what you are doing. Performance can vary and in certain cases be worse. See the following postings by Tony Rogerson related to performance:

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-1-cte-self-join-cte-sub-query-inline-expansion.aspx

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-2-row-number-is-executed-number-of-cte-references-x-number-of-rows-from-the-anchor.aspx

Edit: Spelling correction.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 4

Thanks to both of you for your replies. The articles were very helpful.

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 5

Thanks for the feedback... you sure you're all set?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 6

Jeff Moden (11/11/2008)
Thanks for the feedback... you sure you're all set?

Not really, Jeff. Thanks for asking. 🙂

What I'm really looking for is performance comparison between CTE and Derived Tables.

I haven't been able to find any online.

The articles John provided are comparing CTE to Temp Tables which is not a valid comparison.

Mauve SSChampion Points: 11316 More actions · Answer 7

ggraber (11/12/2008)
The articles John provided are comparing CTE to Temp Tables which is not a valid comparison.

I believe that the 1st article by Tony showed that the result set of the CTE is not internally persisted (as a temporary result set) for later usage by other parts of the query. In other words, the syntax of the CTE query is "plugged into" the main query as a derived query.

So it doesn't matter. CTE or derived -- they are both the same. Personally, I am using CTEs a lot more as they improve readability and are a bit easier to work with. Oracle introduced CTEs quite a while back so I grew to like them.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Mauve SSChampion Points: 11316 More actions · Answer 8

Note: SQL Server 2005 SP2.

It just so happens that I'm coding a query for an application that was a prime candidate for a CTE as it had a complicated sub-query (derived table). Needless to say that in this case, Tony Rogerson is correct -- analysis of the query plan shows that it doesn't cache the results of the CTE but re-executes the CTE query as many times as it is referenced.

Now the query has a UNION ALL in it so one could argue that it actually has 2 "queries" (SELECTS) in it. But I'd say a query is a query regardless of how many SELECT verbs are in the query.

Therefore, in this case I'm better off populating a temp table (once) and then referencing it within the main query.

Psuedo-code of my query (the "winks" are closing parens):

WITH cte AS

(

SELECT (query from hell)

)

SELECT

TableA.stuff

FROM

TableA

WHERE NOT EXISTS

(

SELECT NULL

FROM

TableC

INNER JOIN cte

ON ...

WHERE ...

)

UNION ALL

SELECT

TableB.stuff

FROM

TableB

WHERE NOT EXISTS

(

SELECT NULL

FROM

TableC

INNER JOIN cte

ON ...

WHERE ...

)

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 9

JohnG (11/12/2008)
Note: SQL Server 2005 SP2.
It just so happens that I'm coding a query for an application that was a prime candidate for a CTE as it had a complicated sub-query (derived table). Needless to say that in this case, Tony Rogerson is correct -- analysis of the query plan shows that it doesn't cache the results of the CTE but re-executes the CTE query as many times as it is referenced.

Now, THAT's what I'm looking for!

Thanks a lot for the info John! 🙂

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 10

ggraber (11/12/2008)
Jeff Moden (11/11/2008)
Thanks for the feedback... you sure you're all set?
Not really, Jeff. Thanks for asking. 🙂
What I'm really looking for is performance comparison between CTE and Derived Tables.
I haven't been able to find any online.
The articles John provided are comparing CTE to Temp Tables which is not a valid comparison.

Heh... you just didn't find what you're looking for... as with everything else, the answer is, "It Depends".

Here's one application of CTE's that prove slightly better performance... see the performance chart at the end of the article...

http://www.sqlservercentral.com/articles/T-SQL/63681/

With that in mind, here's another test that anyone can easily run...

DECLARE @TestRowCount INT

SET @TestRowCount = 1000000

PRINT '========== CTE Test =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

;WITH

cteTally AS

(

SELECT TOP (@TestRowCount)

ROW_NUMBER() OVER (ORDER BY sc1.ID) AS N

FROM Master.dbo.SysObjects sc1

CROSS JOIN Master.dbo.SysObjects sc2

)

SELECT t1.N, t2.N

FROM cteTally t1

INNER JOIN cteTally t2

ON t1.N = t2.N

WHERE t1.N % 10 = 0

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT REPLICATE('-',80) ---------------------------------------------

PRINT '========== Derived Table Test =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT t1.N, t2.N

FROM

(SELECT TOP (@TestRowCount)

ROW_NUMBER() OVER (ORDER BY sc1.ID) AS N

FROM Master.dbo.SysObjects sc1

CROSS JOIN Master.dbo.SysObjects sc2) t1

INNER JOIN

(SELECT TOP (@TestRowCount)

ROW_NUMBER() OVER (ORDER BY sc1.ID) AS N

FROM Master.dbo.SysObjects sc1

CROSS JOIN Master.dbo.SysObjects sc2) t2

ON t1.N = t2.N

WHERE t1.N % 10 = 0

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT REPLICATE('-',80) ---------------------------------------------

PRINT '========== Temp Table Test =========='

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT TOP (@TestRowCount)

ISNULL(ROW_NUMBER() OVER (ORDER BY sc1.ID),0) AS N

INTO #MyHead

FROM Master.dbo.SysObjects sc1

CROSS JOIN Master.dbo.SysObjects sc2

ALTER TABLE #MyHead

ADD PRIMARY KEY CLUSTERED (N)

SELECT t1.N, t2.N

FROM #MyHead t1

INNER JOIN #MyHead t2

ON t1.N = t2.N

WHERE t1.N % 10 = 0

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

DROP TABLE #MyHead

To summarize, on my box, the self-joined CTE and the double derived tables are virtually identical in just about every way except the CTE version has less actual code to maintain, doesn't require dual updates for changes, and is a wee bit more readable.

The surprise for most is that the TempTable version (if you add up all the times... adding the clustered PK creates to time lines but only the second one is valid so don't add the first one), is just about as fast even though I took the time to add a clustered key. BUT, if you need the same data a 3rd or more time, or you need the same data further on down in another snippet of code in the same proc, I think you'll find the temp table has certain advantages especially if it's indexed.

Here's the results of the code above on my machine... don't add the red stuff because that's where the clustered index adds a spurious time line...

========== CTE Test ==========

(100000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysobjrdb'. Scan count 4, logical reads 52, physical reads 0, read-ahead reads 63, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysschobjs'. Scan count 4, logical reads 14, physical reads 0, read-ahead reads 23, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 3796 ms, elapsed time = 7055 ms.

--------------------------------------------------------------------------------

========== Derived Table Test ==========

(100000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysobjrdb'. Scan count 4, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysschobjs'. Scan count 4, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 3750 ms, elapsed time = 6990 ms.

--------------------------------------------------------------------------------

========== Temp Table Test ==========

Table 'sysobjrdb'. Scan count 2, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysschobjs'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1375 ms, elapsed time = 1422 ms.

(1000000 row(s) affected)

SQL Server parse and compile time:

CPU time = 11 ms, elapsed time = 11 ms.

Table '#MyHead_____________________________________________________________________________________________________________00000000003F'. Scan count 1, logical reads 2102, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 2641 ms, elapsed time = 3764 ms. --Don't add this line!!!!

SQL Server Execution Times:

CPU time = 2657 ms, elapsed time = 3777 ms.

(100000 row(s) affected)

Table '#MyHead_____________________________________________________________________________________________________________00000000003F'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1032 ms, elapsed time = 2791 ms.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Goldie Lesser SSCertifiable Points: 6155 More actions · Answer 11

Jeff Moden (11/12/2008)
Heh... you just didn't find what you're looking for... as with everything else, the answer is, "It Depends".

Obviously every case is going to be different. I was really looking for some basic guidelines.

To summarize, on my box, the self-joined CTE and the double derived tables are virtually identical in just about every way except the CTE version has less actual code to maintain, doesn't require dual updates for changes, and is a wee bit more readable.

I really have come to love the readability of CTEs. I just wanted to make sure I wasn't compromising performance.

The surprise for most is that the TempTable version (if you add up all the times... adding the clustered PK creates to time lines but only the second one is valid so don't add the first one), is just about as fast even though I took the time to add a clustered key. BUT, if you need the same data a 3rd or more time, or you need the same data further on down in another snippet of code in the same proc, I think you'll find the temp table has certain advantages especially if it's indexed.

This is really interesting. We were having a very long and heated discussion about temp tables vs CTEs at work.

We were looking to come up with some "rules of thumb" about when to use each.

I think I'll pass along your test cases and conclusions. (In your name, of course.)

BTW Jeff, I've been meaning to ask you this:

How on earth do you have time to give such thorough answers to so many people on the forums???

I'm really amazed!!

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 12

ggraber (11/12/2008)
BTW Jeff, I've been meaning to ask you this:
How on earth do you have time to give such thorough answers to so many people on the forums???
I'm really amazed!!

Heh... I just like doing this stuff... maybe a little too much.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 13

ggraber (11/12/2008)
I was really looking for some basic guidelines.
I really have come to love the readability of CTEs.
I just wanted to make sure I wasn't compromising performance.

I guess if I had to write some basic guidelines, I'd say that CTE's generally have the same performance as Derived Tables and are sometimes faster. Because of that, their extreme readability, and their logical top down logical flow, I can't see using anything else in 2k5. I imagine there are exceptions, but I just haven't found one yet. Of course, I try to avoid recursive CTE's as much as I do any form of RBAR.

So, I'd say, rest easy... you're not compromising performance by using CTE's over derived tables. However, if you need the same result set from a CTE in more than one Select in the same proc, a nicely indexed Temp table can pretty much blow away CTE performance over the long haul.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004355 More actions · Answer 14

Well I'll be damned... didn't even notice until just now... 12,000+ posts... Steve owes me a shirt or two! 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)