I’ve been looking into the behaviour of statistics in SQL Server 2008 – so I’d like to present an overview from a DBA perspective.
What are statistics?
Statistics are a database scoped object that are used by the query optimiser in creating query plans. They describe the distribution of values in a column or columns and help the optimiser choose indexes to use. The optimiser uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator.
Statistics are built by sampling the data values, the ratio of rows sampled to total rows is the sample rate.
A statistic object is automatically created for every index on a table, and if the auto_create_statistics database option is set then the optimiser will create a statistic object for every column in a query predicate. A statistics object can also be created with the CREATE STATISTICS ddl.
Statistics can be examined at a database level by querying the system view sys.stats. This query will list all stats on user tables in the current database, and uses the stats_date() function to determine when the stat was last updated.
select o.type_desc ,o.name as 'Object' ,s.name as 'Stat' ,stats_date(s.object_id,stats_id) as LastUpdated from sys.stats s inner join sys.objects o on s.object_id = o.object_id where o.type_desc <> 'SYSTEM_TABLE' --exclude internal tables
And an individual statistics object, the so-called stat blob, can be examined using DBCC SHOW_STATISTICS(). Have a look at the links at the end of this post if you want to know more detail about the data surfaced by DBCC SHOW_STATISTICS().
Automatic statistic maintenance:
If you look at a query plan and find large discrepancies between estimated and actual row counts – there is a good chance your statistics are out of date.
There are three database scoped options that directly impact statistic maintenance: auto_create, auto_update and auto_update_async.
Auto create dictates whether the query optimiser will create statistics on columns used in predicates during optimisation if required.
Auto update dictates whether the optimiser will automatically update an out of date or stale statistic if it is detected at optimisation time. (An update statistics event will also trigger a recompilation of any cached plans that might use the statistic). SQL server keeps track of modifications at a column level using the undocumented internal table sys.sysrscols and this is used to determine whether the statistic is out of date.
Auto update async means that the statistic updates occur asynchronously on a background thread. This helps to mitigate an issue I’ll discuss shortly.
Possible issues with automatic maintenance:
Auto create and auto update statistics are the SQL Server defaults and should be left enabled unless there is a compelling reason to disable them.
There are a few reasons that a DBA might choose to manually manage the maintenance of statistics:
1. There is a regular maintenance window during which statistics are updated.
2. On large tables – auto update events can take too long and cause performance issues.
3. Again, for large tables, the automatically calculated sample rate may be too small for useful statistics. Manually updating statistics allows the DBA to control sample rates. Additionally, an index rebuild operation will also rebuild the associated statistic with a 100% sample rate – likely to be a far better than an automatic update for large tables.
4. As mentioned, a statistics update is initiated by the optimiser during the query optimisation phase. This means the update is part of the original transaction. If the update takes too long and pushes the transaction past a query timeout threshold – the transaction, including the statistics update will be cancelled and rolled back. If the transaction is attempted again – the statistics are still stale, and an update is initiated…
Using the auto update async option should ameliorate this issue by spawning the update onto a background thread while the original query proceeds with the stale stats.
5. The threshold at which a statistic is declared stale for tables greater than 500 rows is 500 + 20% of total rows. For large tables this can be too infrequent. Highly skewed updates can cause statistics to become misleading long before this threshold is reached.
Overriding automatic behaviour:
Statistics are always automatically created for indexes. It is possible to suppress the creation of the additional column statistics that the optimiser creates on predicate fields at the database scope.
ALTER DATABASE <database name> SET AUTO_CREATE_STATISTICS OFF
And to override the automatic updating behaviour at a database scope.
ALTER DATABASE <database name> SET AUTO_UPDATE_STATISTICS OFF
The automatic updating behaviour can be overridden at a statistic level by using the NORECOMPUTE option with the CREATE or UPDATE STATISTICS statements or by using sp_autostats.
This will show the auto update status of all indexes on a given table or indexed view:
exec sp_autostats 'table or indexed view name'
This will disable automatic updating for an individual statistic:
exec sp_autostats 'table or indexed view name' ,'off' ,'statistic name'
NOTE: You can disable automatic updating for individual statistics if this is enabled at the database level. You can NOT enable automatic updating for individual statistics if this is disabled at the database level.
Manual statistic maintenance:
If you look at a query plan and find large discrepancies between estimated and actual row counts – there is a good chance your statistics are out of date.
Statistics can be manually updated with the sproc sp_updatestats or with the UPDATE STATISTICS statement.
Running sp_updatestats will update all stale statistics in the current database.
Using the resample parameter:
EXEC sp_updatestats 'resample'
will cause the updates to use the last sample rate for each statistic. If resample is not specified the default sample rate calculations are used.
NOTE: only statistics considered stale will be updated by the sproc. Consider the output when run on my test machine:
The UPDATE STATISTICS statement can be executed at the table or indexed view or individual statistic level. This allows granular control over the sample rate and the statistic will be updated whether SQL considers it to be stale or not.
Index maintenance and statistics:
The distribution of data is not altered when an index is rebuilt or reorganised so there is no requirement for statistics to be updated as a result of these operations. However statistics are updated/rebuilt with a 100% sample rate when an index is rebuilt but are not updated not when an index is reorganised. It is, therefore, counter-productive to update stats after a rebuild.
However, index maintenance is likely to be occurring in a maintenance window, so it does make sense to maintain statistics where necessary during this window.
Some trace flags:
2371 This trace flag was added in SQL Server 2008 R2 SP1 and lowers the threshold at which statistics are considered stale. This is recommend for SAP systems.
205 & 8721 These trace flags will log auto update activity to the SQL Server logs to help troubleshoot issues with statistics.
Conclusion:
Statistics are a critical component of SQL Server’s query optimisation system. By default the maintenance of statistics should be left to SQL Server. However, there is the potential to closely manage select statistics as part of regular maintenance to improve performance.
A full treatment of SQL Server statistics is well beyond the scope of this overview. The interested reader is directed to the links below, alongside the various links in the text.
References:
http://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx
http://msdn.microsoft.com/en-us/library/ms190397(v=sql.100).aspx
http://technet.microsoft.com/en-US/sqlserver/gg508890.aspx