March 28, 2011 at 9:06 am
Hi all,
I've come across a (at least for me) strange behavior of SQL Server. If I run a query using two temp tables, I get ~900'000 reads. If I run the same query with two table variables, I get only 700 reads.
Here is the Query:
select internbelnum = b.internbelnum,
zeilennbr = ve_tbreferenz.bemerknbr,
anzeigenr = ve_entitylink.anzeigenr,
textbaustein = ve_tbstamm.dokutext,
zeilentyp = 40,
gruppierung = case when ve_entitylink.enttyp = 1060 then 2 else 1 end, -- text blocks with enttyp 1060 belong to the AZ lines
refnum = ve_tbreferenz.refnum
from @belege b, ve_entitylink, ve_tbreferenz, ve_textbausteine, ve_tbstamm, sprachauswahl
where ve_entitylink.oberentid = b.internbelnum
and ve_entitylink.oberenttyp = 20
and ve_tbreferenz.bemerknbr = ve_entitylink.entid
and ve_tbreferenz.enttyp = ve_entitylink.enttyp
and isnull(ve_tbreferenz.intbemnum, 0) > 0
and ve_tbreferenz.bemerknbr not in (select z.zeilennbr from @zeilen z where z.zeilentyp = 40)
and isnull(ve_tbreferenz.bemerkung, '') like ''
and ve_textbausteine.textbausteinnum = ve_tbreferenz.intbemnum
and ve_textbausteine.mandid = @mandidin
and sprachauswahl.base = 1
and ve_tbstamm.textbausteinnum = ve_textbausteine.textbausteinnum
and ve_tbstamm.mandid = @mandidin
and ve_tbstamm.lcid = sprachauswahl.fallback
Some of the tables have many rows (ve_entitylink, ve_tbstamm).
If I substitute temp tables for @belege and @zeilen here, the performance is far worse.
(belege has 4 rows and zeilen 20 rows)
Why is that?
How could I improve the version with temp tables?
Thanks for any hints (I'm sure there is an explanation for this behavior).
Reto Eggenberger
March 28, 2011 at 9:17 am
It looks like your query has hash marks where it should have periods and parentheses. Either that, or you're using some version of SQL other than Microsoft's T-SQL.
When you mention the quantity of reads, do you mean on your I/O statistics, or something in an execution plan, or something else?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 28, 2011 at 12:01 pm
You're right. Somehow the periods and parentheses got mangled into hash marks during copy-paste; the query should read with dots and parentheses as above.
I use SQL 2008, and the reads come from Profiler.
The execution plans for the queries are also different. The query with table variables uses index seeks. The other has lots of table scans.
Maybe I could create indexes on the temp tables (something like the sketch below), but I think with so few rows this shouldn't make such a difference.
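Just a sketch of what I mean - the column types are guesses based on how the query uses them, not the real schema:
create table #belege (internbelnum int not null primary key clustered);
create table #zeilen (zeilennbr int not null, zeilentyp int not null);
create index ix_zeilen_typ on #zeilen (zeilentyp, zeilennbr); -- supports the NOT IN subquery on zeilentyp = 40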
greetings
Reto E.
March 28, 2011 at 1:58 pm
May I ask why you prefer temporary tables over table variables?
Almost all the books I have read about SQL Server state that it's better to use table variables than temporary tables.
http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
March 28, 2011 at 2:28 pm
jeedee (3/28/2011)
Almost all the books I have read about SQL Server state that it's better to use table variables than temporary tables.
http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
Oddly enough, if what you say is true, then I strongly disagree with all of those books. Even Wayne's fine article (which you provided a link to) draws no conclusion as to which is better other than "It Depends".
--Jeff Moden
Change is inevitable... Change for the better is not.
March 28, 2011 at 2:34 pm
In fact, I'll state that I normally avoid Table Variables because they don't persist during troubleshooting and they don't act like Temp Tables would. Here's an example I recently documented for myself from a post here on SSC where just changing from a Table Variable to a Temp Table fixes the problem. And, to be sure, Table Variables are NOT a "memory only" structure any more than Temp Tables are a "disk only" structure... they both start out in memory and spill to disk if they don't fit in memory.
Run the following more than once and see what I mean.
/**********************************************************************************************************************
The goal is to delete 10 randomly selected rows from a table. It doesn't work with a Table Variable alone.
This is one of those places where SQL Server sometimes loses its mind, and it's known as "Halloweening". It's when SQL
Server basically decides to make an "internal Cross-Join" to solve a problem. For every row in Table "A", it makes a
full scan of Table "B". If you don't think so, take a look at the table scan symbol in the lower right corner of the
Actual Execution Plan and look at the arrow coming out of it. It has 256 actual rows coming out of it, which is the
number of rows in the table times itself (16 x 16 = 256 in this case).
In this case, there's a bit of obfuscation of the problem caused by the TOP 10... if you read all 16 rows from the table
and take the TOP 10, you should get what you want EXCEPT that it's doing that 16 times. If you take the Top 10 of the
same set of 16 rows of data in random order, you end up with more or less than 10 unique ID's being chosen because of
the random sort. Even though the Inner Join that follows that will "uniquify" the ID's there are still more than 10
which is why you end up deleting more or less than 10.
There are three fixes for this. One is to trick SQL Server into doing the right thing by using an INNER MERGE JOIN.
Another thing you can do is use a Temp Table so SQL Server can make a better "guess" at the Execution Plan where it will
throw in an extra table spool to sort the mess out. The reason it'll make a better guess with the Temp Table is because
it can actually estimate the rows instead of SQL Server estimating that there's just one row in the table (which won't
need the table spool) because of the fact that Table Variables are estimated as having just one row. You can prove
that by using a third possible fix... force a recompile for the DELETE by using OPTION(RECOMPILE) and it'll work
correctly every time.
As a side bar, this (being evaluated as a single row no matter how many rows it has) is why I don't use Table Variables
in ANY T-SQL except in functions and special "rollback proof" code (very rarely needed) even if they are occasionally
faster. I just don't trust the damned things. They're also a PITA to troubleshoot because they don't persist data in
SSMS.
The explanation for a QOD should be that it'll delete more or less than 10 rows in an unpredictable fashion because the
Halloweening will cause the SELECT TOP 10 to run and return 10 random ID's once for each row in the table and there may
be more or less than 10 unique SomeValues returned by those (in this case) 16 executions.
Run this a dozen times and see what you get.
**********************************************************************************************************************/
--===== Create a table variable to store the example data in.
DECLARE @t TABLE (RowNum INT IDENTITY(1,1), SomeValue VARCHAR(9))
;
--===== Populate the table with example data
INSERT @t
(SomeValue)
SELECT 'a' UNION ALL SELECT 'b' UNION ALL SELECT 'c' UNION ALL SELECT 'd' UNION ALL
SELECT 'e' UNION ALL SELECT 'f' UNION ALL SELECT 'g' UNION ALL SELECT 'h' UNION ALL
SELECT 'i' UNION ALL SELECT 'j' UNION ALL SELECT 'k' UNION ALL SELECT 'l' UNION ALL
SELECT 'm' UNION ALL SELECT 'n' UNION ALL SELECT 'o' UNION ALL SELECT 'p'
;
--===== Try to do the random delete of ten rows
DELETE t
FROM @t AS t
INNER JOIN (
SELECT TOP 10 RowNum FROM @t ORDER BY NEWID()
) AS b
ON b.RowNum = t.RowNum
;
--===== Show that it didn't work
SELECT @@ROWCOUNT AS RowsDeleted
;
SELECT COUNT(*) AS RowsKept FROM @t
;
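For reference, the third fix mentioned in the comment block above (forcing a recompile so the optimizer sees the real row count) would look something like this - just a sketch, meant to replace the DELETE above rather than run after it:
--===== The same delete with a statement-level recompile so the table variable's
--      actual row count is used instead of the 1-row guess
DELETE t
FROM @t AS t
INNER JOIN (
SELECT TOP 10 RowNum FROM @t ORDER BY NEWID()
) AS b
ON b.RowNum = t.RowNum
OPTION (RECOMPILE)
;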
--Jeff Moden
Change is inevitable... Change for the better is not.
March 28, 2011 at 2:37 pm
jeedee (3/28/2011)
Almost all the books I have read about SQL Server state that it's better to use table variables than temporary tables.
http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
I strongly disagree. Could you provide a reference from said books? As Jeff stated, not even the article from Wayne recommends one above the other. Each case depends on many variables as to whether one will work better than the other - and both (temp table and table variable) really should be tested.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 28, 2011 at 2:54 pm
For example, in Microsoft SQL Server 2008 R2 Unleashed, under the "General T-SQL Performance Recommendations" section, on page 1655, I read the following:
You should use table variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables are memory resident and do not incur the I/O overhead
and system table and I/O contention that can occur in tempdb with normal temporary
tables. However, remember that table variables exist only for the duration of the SQL
batch or stored procedure in which they are defined.
Of course, even according to the above quote, whether table variables have advantages over temporary tables depends on the case (just like Jeff Moden said).
However, since these guidelines appear in a "general performance recommendations" section, I would assume table variables are preferred in most cases.
Furthermore, when I delve into this matter on the Internet, most people seem to be in favor of one particular option.
But perhaps I should start making use of "it depends!" more often from now on. 😉
March 28, 2011 at 3:07 pm
jeedee (3/28/2011)
For example, in Microsoft SQL Server 2008 R2 Unleashed, under the "General T-SQL Performance Recommendations" section, on page 1655, I read the following: You should use table variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables are memory resident and do not incur the I/O overhead
and system table and I/O contention that can occur in tempdb with normal temporary
tables. However, remember that table variables exist only for the duration of the SQL
batch or stored procedure in which they are defined.
Of course, even according to the above quote, whether table variables have advantages over temporary tables depends on the case (just like Jeff Moden said).
However, since these guidelines appear in a "general performance recommendations" section, I would assume table variables are preferred in most cases.
Furthermore, when I delve into this matter on the Internet, most people seem to be in favor of one particular option.
But perhaps I should start making use of "it depends!" more often from now on. 😉
It looks like somebody needs an errata submission for this book. Table variables are not entirely memory resident. They do incur IO cost and can be quite the bottleneck in your queries if working with large data sets. Wayne and Gail Shaw both did a great job on proving that one with their research on the matter.
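If you want to see that for yourself, here's a rough sketch (not their exact method) that loads a table variable and then checks the session's tempdb page allocations - a nonzero count shows the rows were written to tempdb just as they would be for a temp table:
-- Sketch: table variables allocate user-object pages in tempdb
DECLARE @big TABLE (n INT, filler CHAR(1000));

INSERT @big (n, filler)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

SELECT user_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;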
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 28, 2011 at 4:53 pm
jeedee (3/28/2011)
For example, in Microsoft SQL Server 2008 R2 Unleashed, under the "General T-SQL Performance Recommendations" section, on page 1655, I read the following: You should use table variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables are memory resident and do not incur the I/O overhead
and system table and I/O contention that can occur in tempdb with normal temporary
tables. However, remember that table variables exist only for the duration of the SQL
batch or stored procedure in which they are defined.
Of course, even according to the above quote, whether table variables have advantages over temporary tables depends on the case (just like Jeff Moden said).
However, since these guidelines appear in a "general performance recommendations" section, I would assume table variables are preferred in most cases.
Furthermore, when I delve into this matter on the Internet, most people seem to be in favor of one particular option.
But perhaps I should start making use of "it depends!" more often from now on. 😉
This is why I don't recommend any books to anyone. The quote you dug out is proof enough that even well known and generally trusted authors make what are, IMHO, some pretty glaring mistakes. Combine that with the glaring error caused by table variables in the code I posted a couple of posts above, and it brings two things to mind. One is, most definitely, "It Depends". The other is, "One test is worth a thousand expert opinions." 😉
Thanks for the feedback jeedee.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 28, 2011 at 4:58 pm
Sergiy,
If you're still out there somewhere, thanks again for the wonderful pearl you gave me years ago... it's saved my hiney more than once...
"A Developer must not guess... a Developer must KNOW!" :-)
--Jeff Moden
Change is inevitable... Change for the better is not.
March 29, 2011 at 1:45 am
Hi all
I did some further investigation.
When I delete the "zeilentyp = 40" part (see --<<---) of the select statement, I get only about 1'100 reads instead of 110'000 reads!
But this is only a constant and shouldn't impact performance like that, right?!?
What might be the reason, that the query optimizer generates a completely different execution plan for those two nearly identical queries?
select internbelnum = b.internbelnum,
zeilennbr = ve_tbreferenz.bemerknbr,
anzeigenr = ve_entitylink.anzeigenr,
textbaustein = ve_tbstamm.dokutext,
--zeilentyp = 40, --<<---
gruppierung = case when ve_entitylink.enttyp = 1060 then 2 else 1 end, -- text blocks with enttyp 1060 belong to the AZ lines
refnum = ve_tbreferenz.refnum
from #belege b
inner join ve_entitylink
on ve_entitylink.oberentid = b.internbelnum
and ve_entitylink.oberenttyp = 20
and ve_entitylink.mandid = @mandidin
and ve_entitylink.enttyp in (60,1060)
inner join ve_tbreferenz
on ve_tbreferenz.bemerknbr = ve_entitylink.entid
and ve_tbreferenz.enttyp = ve_entitylink.enttyp
and isnull(ve_tbreferenz.intbemnum, 0) > 0
and ve_tbreferenz.bemerkung is null
inner join ve_textbausteine
on ve_textbausteine.textbausteinnum = ve_tbreferenz.intbemnum
and ve_textbausteine.mandid = @mandidin
inner join sprachauswahl
on sprachauswahl.base = 1
inner join ve_tbstamm
on ve_tbstamm.textbausteinnum = ve_textbausteine.textbausteinnum
and ve_tbstamm.mandid = @mandidin
and ve_tbstamm.lcid = sprachauswahl.fallback
where ve_tbreferenz.bemerknbr not in (select z.zeilennbr from #zeilen z where z.zeilentyp = 40)
March 29, 2011 at 3:28 am
Hi all
I was able to dramatically improve the procedure, but I had to rewrite the whole section. I didn't want to do this (since it's not an application we wrote), but the vendor wasn't able to come up with a solution ...
The original procedure had the above select statement three times (once for the invoice language, once for the fallback language, once for the base language).
Now I combined those three selects into a single one, using ISNULL() to fall back to the higher language level (roughly like the sketch below).
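The pattern was roughly like this - a simplified sketch with made-up variable names (@invoicelcid and so on), not the actual production code, and with the mandid filters left out:
-- one pass instead of three: join ve_tbstamm once per language level,
-- ISNULL() picks the first text that exists (invoice language, then fallback, then base)
select textbaustein = isnull(inv.dokutext, isnull(fb.dokutext, bas.dokutext))
from ve_tbreferenz r
left join ve_tbstamm inv on inv.textbausteinnum = r.intbemnum and inv.lcid = @invoicelcid
left join ve_tbstamm fb on fb.textbausteinnum = r.intbemnum and fb.lcid = @fallbacklcid
left join ve_tbstamm bas on bas.textbausteinnum = r.intbemnum and bas.lcid = @baselcid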
Problem solved.
by the way:
If I used a large amount of data with the above-mentioned query, there was no difference between the temp table and the table variable! The table variable version was only faster with a small set of data.
I guess that's because the table variable spilled to disk with the large data sets...
Greetings
Reto E.
March 29, 2011 at 7:40 am
jeedee (3/28/2011)
Almost all the books I have read about SQL Server state that it's better to use table variables than temporary tables.
http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
Well, since you referenced my article, I'd like to point out that it does not say that it's better to use one over the other. (It does say that MS recommends table variables.) I do state that you need to evaluate the differences between the two, and to do testing.
There are two reasons why I rarely use table variables:
1. Lack of statistics, etc. makes SQL Server create an execution plan as if the table variable had just 1 row. If you happen to be on SQL 2008 SP2 and use the OPTION (RECOMPILE) query hint, then it can see just how many rows are in the table (see the sketch at the end of this post)... but if you put a where clause on it, it then guesses a row quantity. That guess appears to be about 35% of the number of rows in the table variable - and can again cause an inappropriate execution plan to be generated.
2. Debugging - it's easier to fill a temp table once, and then just run selects off of it in SSMS. For table variables, you have to declare & populate it first every time you want to run a query against it.
In short, MS says that it's better to use table variables over temp tables. Most people don't agree with this blanket statement. There are benefits to table variables, but there are also plenty of valid reasons to avoid them.
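A quick sketch of point 1, if you want to see it for yourself - compare the estimated row counts on the table scan in the two actual execution plans:
-- the table variable is estimated as 1 row unless the statement is recompiled after it is populated
DECLARE @tv TABLE (id INT PRIMARY KEY);

INSERT @tv (id)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

SELECT COUNT(*) FROM @tv;                     -- plan compiled with a 1-row estimate
SELECT COUNT(*) FROM @tv OPTION (RECOMPILE);  -- recompile sees the 10,000 actual rows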
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 29, 2011 at 7:48 am
CirquedeSQLeil (3/28/2011)
It looks like somebody needs an errata submission for this book. Table variables are not entirely memory resident. They do incur IO cost and can be quite the bottleneck in your queries if working with large data sets. Wayne and Gail Shaw both did a great job on proving that one with their research on the matter.
Actually, Gail did an EXCELLENT job of proving this with her research - and she has graciously allowed me to use that in my presentations. Please visit her blog at http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/ for proof of the data in table variables being written to the data file.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes