March 18, 2015 at 1:30 pm
So, I've been reading the discussion on the latest Stairway article about CTEs (http://www.sqlservercentral.com/Forums/Topic1660966-34-1.aspx), and a bunch of people are talking about how CTEs are performance hogs. And now I'm really curious (and I don't have a dataset large enough to test this, and I don't actually have anything I need at the moment)...
Let's say that you have a SQL Server back-end and a Microsoft Access front-end (like most of my setups). In Microsoft Access, there are a lot of situations where stored procedures simply will not work for the task at hand - most notably, there's no way to use stored procedure results as a (direct) data source for a report. For that, well, there are a few different options, but definitely the most time-effective in terms of billable programming hours and highest data security and lowest network bottleneck, is to use table-valued-functions and pass-through queries.
So, AFAIK, there are two choices in SQL Server, in that case: inline table-valued function, or multi-statement table-valued function.
So here's my question (finally 😛 ... sorry about that): if CTEs are performance hogs relative to temp tables, and multi-statement table-valued functions are performance hogs relative to inline functions (as I've heard in one of my other threads, here: http://www.sqlservercentral.com/Forums/Topic1664381-1292-1.aspx), and, AFAIK, inline functions can't use temp tables, then which is better: a multi-statement table-valued function that uses temp tables, or an inline function that uses CTE's?
I'm guessing it depends on the frequency of run, right? Like if the potential CTE in question has GROUP BYs and aggregate functions and is being performed against a big table, it's probably better to use a multi-statement table value function and temp tables? But... does it really make a difference? Unless the CTE is also a correlated subquery, wouldn't it just run once, regardless of whether it's writing to a temp table or being run within the query itself?
Just curious. Thanks! 🙂
March 18, 2015 at 1:46 pm
Update: looks like I posted too soon. The consensus in that thread now seems to be that CTEs, in and of themselves (as long as they're not recursive) are no worse than regular subqueries (as long as they're not correlated subqueries)... that they are, in fact, the same. Correct?
Sorry about posting these general questions without waiting for the question to answer itself. :ermm:
March 18, 2015 at 2:26 pm
Katerine459 (3/18/2015)
So, I've been reading the discussion on the latest Stairway article about CTEs (http://www.sqlservercentral.com/Forums/Topic1660966-34-1.aspx), and a bunch of people are talking about how CTEs are performance hogs. And now I'm really curious (and I don't have a dataset large enough to test this, and I don't actually have anything I need at the moment)...
Here's how you can build one real, real fast using CTEs. Thus proving that CTEs, when used correctly can be used to create very fast queries.
IF OBJECT_ID('tempdb..#lds') IS NOT NULL DROP TABLE #lds;
CREATE TABLE #lds (lds_id int not null, lds_txt char(36) not null);
GO
-- here's a great example of where CTEs kick butt
-- this creates a million rows of data in ~1sec
WITH
L1 AS (SELECT n = 1 UNION ALL SELECT 1),--2 rows
L2 AS (SELECT n = 1 FROM L1 a CROSS JOIN L1 b),--4 rows
L3 AS (SELECT n = 1 FROM L2 a CROSS JOIN L2 b),--16 rows
L4 AS (SELECT n = 1 FROM L3 a CROSS JOIN L3 b),-- 256 rows
L5 AS (SELECT n = 1 FROM L4 a CROSS JOIN L4 b),-- 65K rows
ITally AS
(SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM L5 a CROSS JOIN L5 b)-- 4billion+ rows
INSERT #lds
SELECT TOP (1000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1)), newid()
FROM ITally ;
Let's say that you have a SQL Server back-end and a Microsoft Access front-end (like most of my setups). In Microsoft Access, there are a lot of situations where stored procedures simply will not work for the task at hand - most notably, there's no way to use stored procedure results as a (direct) data source for a report. For that, well, there are a few different options, but definitely the most time-effective in terms of billable programming hours and highest data security and lowest network bottleneck, is to use table-valued-functions and pass-through queries.
Inline only. multiline TVFs are always bad for many, many reasons. Never use them. That's my advice.
So here's my question (finally 😛 ... sorry about that): if CTEs are performance hogs relative to temp tables
False, absolutely false. 100% false. Unless you can provide an example, I will aver that this is false.
, and multi-statement table-valued functions are performance hogs relative to inline functions
True, absolutely and completely true. iTVFs can be considered parameterized views. If you understand how views work, one that can accept parameters is pretty special. multi-statement ITVs are something totally different and inferior. Create and iTVF and an mTVF that does the same thing then highlight both queries in SSMS, right-click and select "display estimated execution plan" to understand whey mTVS are so terrible.
which is better: a multi-statement table-valued function that uses temp tables, or an inline function that uses CTE's?
An inline function that uses CTEs. Period. (unless your iTVF is using a recursive CTE for counting then they are both bad and now we're debating which of these two terrible choices is worse).
I'm guessing it depends on the frequency of run, right?
Wrong. Though you can write bad iTVFs you can never write a good mTVFs because they are always bad.
Like if the potential CTE in question has GROUP BYs and aggregate functions and is being performed against a big table, it's probably better to use a multi-statement table value function and temp tables?
CTEs don't have any negative impact on big tables. That's folklore. Again, mTVFs are always bad. Though I have never seen a good article about this topic - read both splitter articles in my signature area. Note that they don't use mTVFs. Read any Itzek Ben Gan book - note that, for TVFs, he always uses inline table value functions. This is not a coincidence.
-- Itzik Ben-Gan 2001
March 18, 2015 at 9:54 pm
Had a second to kill. I put together a test to demonstrate why multi statement TVFs (mTVF) are terrible
Using TempDB (a DB I know you have) this code will create a 1,000,000 row table with sample data. It will also create an inline table values function (iTVF) and a mTVF that does the same thing. They both take a literal string as a parameter (@pattern) and tell us if that pattern exists in the lds_txt column.
USE tempdb
GO
IF OBJECT_ID('tempdb..lds') IS NOT NULL DROP TABLE lds;
CREATE TABLE lds (lds_id int not null, lds_txt char(36) not null);
GO
-- (1) Create the sample data
WITH
L1 AS (SELECT n = 1 UNION ALL SELECT 1),--2 rows
L2 AS (SELECT n = 1 FROM L1 a CROSS JOIN L1 b),--4 rows
L3 AS (SELECT n = 1 FROM L2 a CROSS JOIN L2 b),--16 rows
L4 AS (SELECT n = 1 FROM L3 a CROSS JOIN L3 b),-- 256 rows
L5 AS (SELECT n = 1 FROM L4 a CROSS JOIN L4 b),-- 65K rows
ITally AS
(SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM L5 a CROSS JOIN L5 b)-- 4billion+ rows
INSERT dbo.lds
SELECT TOP (1000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1)), newid()
FROM ITally ;
ALTER TABLE lds
ADD CONSTRAINT pk_lds_x PRIMARY KEY(lds_id);
GO
-- (2) Create and iTVF
IF OBJECT_ID('tempdb.dbo.mTVF_CheckString') IS NOT NULL DROP FUNCTION dbo.mTVF_CheckString;
GO
CREATE FUNCTION dbo.mTVF_CheckString(@pattern varchar(5))
RETURNS @x TABLE
(
lds_id int not null,
lds_txtvarchar(36) not null,
pattern_txt varchar(5) null,-- user can pass the function a null value
has_pattern bit
)
AS
BEGIN
INSERT @x
SELECT
lds_id,
lds_txt,
@pattern,
CASE WHEN charindex(@pattern, lds_txt) = 0 THEN 0 ELSE 1 END
FROM dbo.lds;
RETURN;
END
GO
IF OBJECT_ID('tempdb.dbo.iTVF_CheckString') IS NOT NULL DROP FUNCTION dbo.iTVF_CheckString;
GO
-- (3) Create and iTVF
CREATE FUNCTION dbo.iTVF_CheckString(@pattern varchar(5))
RETURNS TABLE
AS
RETURN
SELECT
lds_id,
lds_txt,
@pattern AS pattern_txt,
CASE WHEN charindex(@pattern, lds_txt) = 0 THEN 0 ELSE 1 END AS has_pattern
FROM dbo.lds;
GO
No some code to test each function.
-- (4) Let's test these guys...
SET NOCOUNT ON;-- cleaner output
-- let's clean up the query cache
DBCC FREEPROCCACHE with no_infomsgs;
DBCC DROPCLEANBUFFERS with no_infomsgs;
DECLARE @pattern varchar(5) = '123-a';
SET STATISTICS TIME ON-- let's time these queries
PRINT char(10)+'Inline TVF:';
SELECT *
FROM dbo.iTVF_CheckString(@pattern)
WHERE has_pattern = 1;
PRINT char(10)+'Multi-line TVF:' +char(32);
SELECT *
FROM dbo.mTVF_CheckString(@pattern)
WHERE has_pattern = 1;
SET STATISTICS TIME OFF
GO
and the results:
SQL Server Execution Times:
CPU time = 516 ms, elapsed time = 521 ms.
Multi-line TVF:
SQL Server Execution Times:
CPU time = 2506 ms, elapsed time = 2537 ms.
Both functions return the same result set but the iTVF does so 5 times faster than the mTVF.
If you examine the code for each you will notice that the iTVF is simple returning a select statement. The mTVF, on the other hand, is:
(1) Creating a temp table named @x
(2) Inserting the contents of my query into that table
(3) Returning the result set.
To better understand how evil mTVFs are let's look at the query plan. If you run the test above in SSMS with "Include Execution Plan" on you will see this:
The iTVF utilizes the Clustered index, performs a simple Computer Scalar operation and returns the result set. Now look at the mTVF...
Yuck! the mTVF does a TABLE SCAN, and makes 2 calls to the mTVF.
Normally I don't use the option to "Display Estimated Execution plan" option but it's worth doing here because it reveals even more ugliness about mTVF's.
The mTVF actually has a "sub excecution plan"... You can see an index scan, then a table insert in the "sub plan", then, in the main plan, there's that table scan then all this data is tied together using the sequence operator and finally returned to you.
This should help explain why the iTVF is 5 times faster than the mTVF.:-P
Edit: 2nd image was not showing up. Fixed.
-- Itzik Ben-Gan 2001
March 19, 2015 at 6:28 am
Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run once
And who said that a correlated subquery doesn't run just once?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 19, 2015 at 10:21 am
GilaMonster (3/19/2015)
Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run onceAnd who said that a correlated subquery doesn't run just once?
I read that here: http://www.sqlservercentral.com/articles/Stairway+Series/105972/
March 19, 2015 at 10:34 am
Katerine459 (3/19/2015)
GilaMonster (3/19/2015)
Katerine459 (3/18/2015)
Unless the CTE is also a correlated subquery, wouldn't it just run onceAnd who said that a correlated subquery doesn't run just once?
I read that here: http://www.sqlservercentral.com/articles/Stairway+Series/105972/
This statement is wrong (and trivially provable to be wrong)
It will be run once for each candidate row selected in the outer query.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply