February 4, 2010 at 5:06 pm
The issue I’m seeing is that when I join a table variable to a persistent table, the query plan is scanning my persistent table instead of using the non-clustered index. I discovered that if I run the exact same code, except using a temp table instead of a table variable, the query plan uses the non-clustered index on my persistent table instead of scanning the table. I’ve also noticed that a DISTINCT TOP X in the SELECT statement seems to be contributing to the issue.
I’ve provided the T-SQL code necessary to reproduce the issue. If anyone has any input, I’d be delighted to hear an explanation.
Thanks,
Adam
--Create our test table
CREATE TABLE [dbo].[myTest](
[rowid] [int] IDENTITY(1,1) NOT NULL,
[mydata] [varchar](100) NULL,
CONSTRAINT [PK_myTest] PRIMARY KEY CLUSTERED
([rowid] ASC
))
--Create our nonclustered index
CREATE NONCLUSTERED INDEX [IX_mydata] ON [dbo].[myTest]
([mydata] ASC)
--Populate our table with some data (this should take about a minute)
declare @total int, @current int
set @current = 1
set @total = 100000
while @total >= @current
begin
insert into myTest (mydata) values (cast(@current as varchar(10)) + ' row data')
set @current = @current + 1
end
--Join our test table to a temp table. This uses the nonclustered index.
create table #temp (mydata varchar(100) NULL)
insert into #temp (mydata)values ('54 row%')
insert into #temp (mydata)values ('9510%')
select distinct top 100 m.rowid, m.mydata--This will do an index seek using my nonclustered index
--select m.rowid, m.mydata--This will do an index seek using my nonclustered index
from myTest m
inner join #temp i on m.mydata like i.mydata
drop table #temp
--Now join our test table to a table variable. The SELECT DISTINCT TOP 100 does not use the index. Instead, it will scan the table.
declare @temp table(mydata varchar(100) NULL)
insert into @temp (mydata)values ('54 row%')
insert into @temp (mydata)values ('9510%')
select distinct top 100 m.rowid, m.mydata--**This will table scan**
--select m.rowid, m.mydata--This will do an index seek using my nonclustered index
from myTest m
inner join @temp i on m.mydata like i.mydata
February 4, 2010 at 5:19 pm
One of the differences between table variables and temp tables is that no statistics are kept on table variables, while statistics are kept on temp tables. In the absence of such statistics, the optimizer may e choosing a more conservative (slower) execution plan.
You could always try giving it a hint. Or alternatively, you might try making the column of your temp table a primary key.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 4, 2010 at 7:09 pm
The Dixie Flatline (2/4/2010)
One of the differences between table variables and temp tables is that no statistics are kept on table variables, while statistics are kept on temp tables. In the absence of such statistics, the optimizer may e choosing a more conservative (slower) execution plan.You could always try giving it a hint. Or alternatively, you might try making the column of your temp table a primary key.
I agree with Bob.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 5, 2010 at 7:15 am
On our actual code (not my test code) we chose to use a hint to force the query to use the index and that works. But I just don't understand what's going on.
I get that SQL keeps statistics on temp tables, but no on table variables. But I'm not clear how statistics on my temp table (or lack thereof on my table variable) is going to help SQL decide whether or not it should scan my persistent table or use the NC index.
Also, I'm failing to understand why the DISTINCT TOP X makes a difference when I'm using a table variable.
Thanks for the quick replies, I appreciate the feedback.
Thanks,
Adam
February 5, 2010 at 7:31 am
Oh yeah, and I forgot to mention that if you change the LIKE to "=" then SQL will use the index. So if you change this statement that does a table scan:
select distinct top 100 m.rowid, m.mydata
from myTest m
inner join @temp i on m.mydata like i.mydata
To this statement, it will use the nonclustered index:
select distinct top 100 m.rowid, m.mydata
from myTest m
inner join @temp i on m.mydata = i.mydata
February 5, 2010 at 9:58 am
I get that SQL keeps statistics on temp tables, but no on table variables. But I'm not clear how statistics on my temp table (or lack thereof on my table variable) is going to help SQL decide whether or not it should scan my persistent table or use the NC index.
Well, since SQL doesn't keep statistics on table variables, the optimizer assumes that the table variable only has one row, and makes a plan accordingly. Which is to just scan the table for the one row, instead of going to the index. Then when it hits 100/1000 rows, the plan is no longer the correct plan.
(I've seen temp tables outperform table variables with as little as 36 rows.)
You might be interested in this article: Comparing Table Variables to Temporary Tables[/url]
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
February 5, 2010 at 10:57 am
But the issue isn't in scanning the table variable, or the temp table. The issue is that SQL is scanning my static table that has 100,000 rows (or in the actual PROD environment, it has over 5 million). When we use a table variable, SQL does a table scan on my static/physical table.
My table variable only has 2 rows. Scanning it is fine. Scanning my physical table with 5 million rows is the problem.
Thanks,
Adam
February 5, 2010 at 12:10 pm
Post the execution plans?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 5, 2010 at 12:17 pm
I've attached the SQL plans, named accordingly. Please let me know if there's anything else I can post that would be helpful.
Thanks,
Adam
February 5, 2010 at 12:35 pm
The estimated row count on your actual physical table is wrong. SQL thinks that the physical table contain only 1000 rows and hence that a scan of the physical table isn't so bad. The row estimate is wrong in both, so it's not just due to the table variable.
Try a stats update or index rebuild. See if it fixes that inaccuracy.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 5, 2010 at 1:20 pm
I rebuilt the clustered index, and attached the new plans. I also attached a plan for the table variable after removing the "DISTINCT TOP 100" from the SELECT statement. This is interesting to me. Notice the plan with the DISTINCT in there; it's estimating 1 row for our table variable, but the actual number of rows is 199,999. My static table has 100,000 rows, and my table variable has 2 rows. So it almost seems like the plan is doing some sort of cartesian join??
Notice the plan for the table variable where I've removed the DISTINCT TOP 100. In this plan, the actual number of rows returned from the table variable is 2. And, the plan is doing a nonclustered index seek for my static table.
-Adam
February 5, 2010 at 3:03 pm
None of the three execution plans attached to your last post use the temp table. All three use the table variable. The first and third are identical queries, yet show different execution plans. What changed between those two? Was the table variable not indexed until the last query?
--
-------------------------------------------------------------------------------------------------------------
-- temp_table2 execution plan
-------------------------------------------------------------------------------------------------------------
select m.rowid, m.mydata--This will do an index seek using my nonclustered index
from myTest m
inner join @temp i on m.mydata LIKE i.mydata
-------------------------------------------------------------------------------------------------------------
-- table_variable2 execution plan
-------------------------------------------------------------------------------------------------------------
select distinct top 100 m.rowid, m.mydata--**This will table scan**
--select m.rowid, m.mydata--This will do an index seek using my nonclustered index
from myTest m
inner join @temp i on m.mydata LIKE i.mydata
-------------------------------------------------------------------------------------------------------------
-- table_variable2_NoDISTINCT execution plan
-------------------------------------------------------------------------------------------------------------
select m.rowid, m.mydata--This will do an index seek using my nonclustered index
from myTest m
inner join @temp i on m.mydata LIKE i.mydata
--
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 5, 2010 at 7:50 pm
amoericke (2/5/2010)
On our actual code (not my test code) we chose to use a hint to force the query to use the index and that works. But I just don't understand what's going on.I get that SQL keeps statistics on temp tables, but no on table variables. But I'm not clear how statistics on my temp table (or lack thereof on my table variable) is going to help SQL decide whether or not it should scan my persistent table or use the NC index.
Also, I'm failing to understand why the DISTINCT TOP X makes a difference when I'm using a table variable.
Thanks for the quick replies, I appreciate the feedback.
Thanks,
Adam
My question would be... if the Temp Table does the trick so much better, why are you good folks hell bent on using Table Variables? Unless you need the ability to do a rollback without rolling back a temporary structure, there's not much of a need for table variables in this situation.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2010 at 8:02 pm
You got me, Jeff. Me and table variables were just a summer thing, y'know. That said, any time someone has the ah-hah! moment that execution plans are somehow important, I think they should be encouraged. Speaking of which...
Amoericke: Grant Frichey has written an excellent book on execution plans. You should order it if you really want to get into this. Doing your homework will pay you great dividends in years to come, because we aren't going to discuss all the variations online. We just don't have the time.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 5, 2010 at 9:35 pm
If you're still curious, I uploaded the correct plan for the temp table (sorry for getting the wrong plan uploaded earlier).
I'm not hell bent on using a table variable, I was just posting a question to see if anyone could explain why my static table would be scanned when joined to a table variable vs. using an index when joined to a temp table.
I appreciate all the feedback.
-Adam
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply