March 11, 2013 at 4:23 pm
Hello all,
I could use some help trying to better understand how stats are used in this situation.
I have a query that runs relatively poor (14 seconds); however the execution plan is actually quite good. It uses an index and performs a seek, no problems here. The overall plan is decent with some areas of improvement. Anyway, a developer of mine added an index and reduced the run time quite significantly (dropped to 3 seconds). Yet, after reviewing the index utilization, I found that the new index was not being touched (both DMV's and execution plan tell the same story) and instead it was still using the original index. So I then created a statistic which matched his index and I was able to achieve the same result in performance gains and yet my execution plan is 100% the same.
With the above said, I'm curious to understand a couple things.
1) How if at all, can you view which stats are touched during query execution?
2) Why would the optimizer not use the statistic already on the index it used within the execution plan? (yes, I did update stats)
3) What really determines how "auto create stats" takes effect?
Thanks
March 11, 2013 at 8:34 pm
1) How if at all, can you view which stats are touched during query execution?
Statistics are not used during execution. They are used in plan generation phase which happens just before the first execution (and can happen later also, after some events and tresholds). Optimizer takes various permutations, probably using almost all available statistics of objects involved to come-up fast with execution plan that is good enough (not necessary the best). Once generated, plan is normally stored in procedure cache and may be reused in subsequent executions. I'm not aware of any method to determine exact statistics optimizer touched during evaluation of different execution plans. In the final execution plan you can see "estimated" number of rows, cost, etc that is direct product of statistics. If estimated number of rows is very different than actual number of rows, statistics are usually not good enough. You should be aware what exactly statistics store (number of rows, selectivity of first 1, 2, .., n columns of statistics, histogram only on the first column of statistics).
2) Why would the optimizer not use the statistic already on the index it used within the execution plan? (yes, I did update stats)
Optimizer uses almost all available statistics. During index creation, statistics are also created on indexed columns - always, regardless of "auto create stats" parameter. Columns not covered with index might not have statistics, and columns covered with index but not as leading index column might not have histogram (histogram is useful only for highly skewed data).
3) What really determines how "auto create stats" takes effect?
If "Auto create statistics" is on, statistics will be created on columns not covered with any index, for which optimizer thinks is beneficial (fired on "missing column statistics" event). Auto created stats are always single-column statistics, so you sometimes might want to manually create multi-column statistics for correlated columns. Max 200 steps of histogram is sometimes not granular enough for really large tables so one might solve that with multiple filtered statistics to cover various ranges, each with 200 steps to get more granular total histogram.
Bottom-line is: statistics on unindexed columns may help optimizer to choose a better plan, even if indexes are the same! There are three types of statistics: created during index creation on indexed columns (you can't disable that), statistics created manually with "CREATE STATISTICS" command, statistics created automatically on "missing column statistics" event if "auto created statistics" is on.
Your execution plans are not 100% same. They might look the same from the outside, but details hidden in xml plan are surely different. You can verify that saving those plans as xml (right click, save as xml) and then compare those xmls with some text comparison tool.
You can use SSMS or dbcc show_statistics to see the stats, but much nicer and detailed view into statistics you will get with my add-in for SQL Management Studio: SQL XDetails. Links are in my signature.
March 12, 2013 at 10:32 am
Hello Vedran,
First off, thank you for the detailed response.
In regards to your response to #2. The execution plan is properly using an index; however my assumption is that it's not using the statistic created for that index and this is where I'm at a loss. The index required for this query touches 8 columns; however If I create a statistic on two of the eight columns in question (which are used for a JOIN), my query performance goes from 14 to 3 seconds. This tells me that the statistic created for that index in question, is not efficient?
You were correct. I saved my execution plans as XML and compared, they are wildly different to the point I'm not even sure where to focus in at this time.
Unfortunately time is against me at this current stage and I think at this point I'm simply trying to understand if behavior like this is expected/normal. In by that, having to create statistics where statistics are already available; however they're simply more narrow in scope based on the index.
March 12, 2013 at 10:42 am
If non-clustered index key columns are a, b, c, and table has clustered index key "id", one multi-column statistics will be created with the same name as index.
It will create histogram on column a, and describe density of this column combinations:
a
a, b
a, b, c
a, b, c, id
If you manually created multi-column statistics on (a, b), that will create histogram on column a, and density information on this column combinations:
a
a, b
If you manually created single-column statistics on a, and single-column statistics on b, that will create histogram on column a, histogram on column b, and density information on this column combinations:
a
b
You probably have different statistics created manually than the ones created by index.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply