February 5, 2010 at 11:18 pm
Adam, I am looking at the execution plan you just sent that uses a temp table. I'm comparing it to the version you sent at 1:20pm which uses a table variable. BOTH use an index seek on the same index. If not for the TOP and DISTINCT in the latest plan, they would be IDENTICAL. The query plans will not show us the part of your code that builds the table variable, so try this experiment.
I know you've been through this once, but the sequence is critical. Be sure to follow the sequence exactly and make notes at each step before proceeding to the next.
1. Create a table variable without a primary key, no TOP and no DISTINCT. Save the query plan.
2. Change the code to use a temporary table with no primary key. Run it, save the query plan, and compare it to plan 1.
3. Change the table variable to have a primary key. Run the same code again. Save the query plan.
4. Change the code to have a primary key on the temp table. Run it, savcompe the query plan and compare it to plan 3.
Next compare plan 1 to plan 3.
Finally, compare plan 2 to plan 4.
What differences existed between 1 and 2? Between 3 and 4?
What difference did the index make between plan 1 and plan 3? Between 2 and 4?
The index seek only happens when the optimizer can determine a minimum and a maximum value that could possibly satisfy the LIKE condition. (The execution plan starts with the minimum value and scans the index until it reaches the maximum value.) I believe you will find that the presence of an index is what gives that information to the optimizer.
This does not retreat from anything you've been told about table variables not having statistics like temp tables. The optimizer needs those statistics to choose intelligently even though it my sometimes get it right where the plan is trivial.
Let us know what you find out.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 6, 2010 at 7:00 am
Maybe I did something wrong, but all 4 scenarios used the index. Without the DISTINCT TOP X in there, the table variable (and the temp table) has always used the index. The DISTINCT TOP X using the table variable is the only scenario that does the table scan on my static table:
declare @temp table(mydata varchar(100) NULL)
insert into @temp (mydata)values ('54 row%')
insert into @temp (mydata)values ('9510%')
select distinct top 20 m.rowid, m.mydata
from myTest m
inner join @temp i on m.mydata LIKE i.mydata
I've tried that same code with a primary key on the table variable, but it still does a scan on my static table (myTest). The change in the plan is that is does a clustered index scan on the table variable instead of just a table scan (on the heap).
Did you run through your sequence and see something different? My orignal post has all the code you'd need to test this and see for yourself. My code just creates a table called myTest, populates it with 100,000 rows and then joins to it using a temp table and a table variable.
Thanks for you input.
February 6, 2010 at 8:52 am
The Dixie Flatline (2/5/2010)
You got me, Jeff. Me and table variables were just a summer thing, y'know. That said, any time someone has the ah-hah! moment that execution plans are somehow important, I think they should be encouraged.
My apologies... and I think I get it, now. I wanted to know why they (OP) were trying to force something to work that likely can't be made to work especially when there was a viable alternative that works very well. Sometimes I'm a wee bit slow on the uptake and I think I just figured it out thanks to you... it's no longer just a production issue... it's a learning issue now and I absolutely agree with the deep dive on the execution plan. Thanks Bob.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2010 at 9:23 am
I haven't run it myself, but I your last post clarifies things a bit. Here's what I believe to be happening.
1. Without an index (primary key) over the data, the optimizer is not aware of the minimum and maximum values that could satisfy the LIKE condition. Possibly it doesn't even know how many rows are in that table.
2. In the absence of an index, the optimizer does not develop the minimum and maximum values and has no predicates upon which to base an index seek. It chooses the safe route of just scanning the entire primary table and, for each row, scans the table variable to see if it will pass the LIKE test against any row in the table variable. That's right: It scans the non-indexed table variable thousands of times.
You may be asking: Couldn't they at least check the number of rows in the table variable, and if small, do a summary query to get the min/max values? I think it should, because the plan DOES know the number of rows in the primary table and the nested loop means that it will have to scan the table variable thousands of times. One more wouldn't hurt, and it would set up the predicates required for the index seek. (You might even do that yourself, using a CTE or subquery, but it's easier just to change the table variable to a temp table.)
Adam, I encourage you to get some of the books and keep studying on execution plans. It will pay dividends in how your queries perform. But if SQL were a vehicle, the optimizer is an automatic transmission. Don't try to wrestle with it, beyond putting it in low when you are towing heavy loads. Cooperate with the optimizer by giving it good structures. Constraints and indexes give the optimizer the statistics it needs to choose wisely when developing execution plans. For right now, the answer remains that table variables don't have the statistics to support more intelligent decisions by the optimizer.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 6, 2010 at 10:10 am
No apologies necessary, Jeff. No offense taken. Adam just struck me as that rare man who wants to learn to fish.
By the way, that's a compliment, Adam. We get really tired of trying to help people who don't want to learn and understand, they just want someone to feed them an answer. Most of the volunteers want to teach, not to do someone else's work for them. You are asking "why?" as well as "how?" and that's a good thing. But you have some thick books to read.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 8, 2010 at 11:56 am
I bounced this around with a coworker and with the input he provided, the input you all provided, and some additional testing, I think I've finally wrapped my mind around this.
To simplify my coworker's input; the TOP X is in part causing the scan. SQL is assuming that it can scan the table to find the TOP X values faster than it could seek the index.
I disabled 'auto create statistics' on TempDB on my test server, and reran my query that uses the temp table. I verified that with 'auto create statistics' off, SQL does not create any stats on my temp table and the plan scans the table (same as when we use the table variable). With 'auto create statistics' turned back on, SQL creates stats on my temp table and the plan uses the index again.
To summarize what was observed with the table variable, the only time the query would NOT use the index is when we had DISTINCT TOP X. Furthermore, X had to be a relatively large value, and DISTINCT was irrelevant in whether or not SQL would scan or use the index. If we provide SQL a small value, the theory is that SQL guesses (since it doesn't have any stats on the table variable) that it can scan the table to return the TOP X values faster than it could if it used the index. Once we make X a very large value, SQL guesses that it's more efficient to use the index than to scan the table.
I had also noted that if we did the join using "=" instead of LIKE, SQL will use the index instead of scanning. My theory is that SQL is guessing that with the LIKE operator, there will be many possible matches and so with TOP X being relatively small, it won't take long to scan the table and find the first X matches. Once we change it from LIKE to "=", that will greatly reduce the number of possible matches, so SQL guesses that scanning to find the TOP X rows is probably not going to be the most efficient method (there's now a better chance we may have to scan the entire table to find the TOP X exact matches), and so it decides to use the index.
I believe the root of this is due to the table variable not having stats. I think the oddities of when SQL would choose to scan vs. use the index were due to SQL guessing which operation would be faster based on how many results it needed to return (TOP X) and how selective the predicate was (LIKE vs. "=").
Thanks to all who contributed to this forum.
-Adam
February 8, 2010 at 12:19 pm
One comment: You are finding the Top (x) Distinct values. SQL has to develop that DISTINCT set before it can apply the TOP operator. The query plans will bear that out.
Yes, the optimizer is "guessing" at the number of rows that will be considered in the distinct list based on the like or equality. Remember that it DOES have stats on the primary table, and knows whether or not the column to be matched is unique. But you've got it. Congratulations!
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply