October 8, 2008 at 5:12 pm
Rather than having a script update all statistics in a database, I only want to update statistics that are out of date. I assume with 2005 I still run DBCC SHOW_STATISTICS to get index statistics, but how do I determine whether the statistics on an index should be updated?
Is there a DMV I should be using instead of SHOW_STATISTICS?
Thanks, Dave
October 8, 2008 at 6:28 pm
BOL says, use
STATS_DATE ( table_id , index_id )
... which tells you when the statistics were last updated.
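For example, something like this lists the last statistics update date for every index in the current database (a minimal sketch; filter or sort it however suits you):

-- Last statistics update date per index (2005 syntax)
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
FROM sys.indexes AS i
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY stats_last_updated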
October 8, 2008 at 9:46 pm
I stated my question incorrectly. The date shows when a statistic was last updated, but an old date doesn't necessarily mean the statistics are inaccurate. How do I determine when statistics should be updated?
Thanks, Dave
October 9, 2008 at 2:05 am
If you run sp_updatestats, SQL tells you which statistics were updated. The algorithm SQL uses is documented on the internet; maybe somebody can provide the link?
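For example (sp_updatestats prints a message per table/statistic saying whether it was updated or skipped; the database name below is just a placeholder):

USE YourDatabase
GO
EXEC sp_updatestats
GO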
Wilfred
The best things in life are the simple things
October 9, 2008 at 3:43 am
There's no easy way in 2005. The counters that keep track of changes are hidden.
SQL will automatically update stats on a column as soon as the number of changes to it hits 20% of the rows in the table + 500 rows (for example, a 100,000-row table needs roughly 20,500 changes before auto-update fires). The only times stats really need to be updated manually are when that 20% threshold is too high, so the stats go stale long before auto-update kicks in, or when the default sampled scan isn't enough. Those cases often have to be identified manually.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 9, 2008 at 7:40 am
We have auto_update_statistics and auto_create_statistics turned off as a vendor requirement, so we have to update them manually. For this reason I need to determine when a statistic should be updated. I could update all statistics and not worry about whether or not they need it, but this system will grow very large and running a FULLSCAN will be time consuming. After one year the databases will be 500GB and will grow by about 250GB per year. The more precise I can be about updating only statistics that need to be updated and reindexing only databases with a certain level of fragmentation, the better off we will be. Defrag is the easy part, but updating statistics based upon conditional logic has proved to be a challenge.
Thanks
October 9, 2008 at 8:03 am
What? Not enabling auto_create_stats is asking for a performance showstopper (I don't think I have to tell you why).
Weigh the overhead and time you'll have to spend on determining which statistics need to be updated against the auto_stats updates executed by SQL.
It's true that SQL can decide to update the statistics whenever the threshold is reached, but compare this minor overhead to the performance gain of accurate statistics. My recommendation is to enable auto create/update statistics.
If you're really concerned about auto statistics you can:
- Turn on auto_update_statistics_asynchronously. This reduces the overhead for users (a quick syntax sketch follows below).
- Create a ddl trigger to determine how often auto_stats fires.
Both options are well documented on this website
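For the first option, the syntax is simply the following (database name is illustrative, and it only has an effect while auto update statistics itself is on):

ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON
GO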
Wilfred
The best things in life are the simple things
October 9, 2008 at 8:36 am
I prefer to keep it turned on and have done so with all systems I support, but when a large vendor tells us to turn it off and we don't, there goes vendor support. For every performance problem we encounter, the vendor will point to us not turning off auto update statistics. I don't want to go down that path. I am contacting them today for clarification, because the Microsoft links they reference to support their argument no longer exist, including KB 1291 (at least I cannot find it).
Here is their quote:
To provide the up-to-date statistics the query optimizer needs to make smart query optimization decisions, you may want to leave the “Auto Update Statistics” database option on. This helps to ensure that the optimizer statistics are valid, helping to ensure that queries are properly optimized when they are run.
But this option is not a panacea (see KB article 1291). When a SQL Server database is under very heavy load, sometimes the auto update statistics feature can update the statistics at inappropriate times, such as the busiest time of the day.
October 9, 2008 at 8:45 am
Yeah, I know that discussion.
Make sure you know the benefits of asynchronous statistics updates before calling them.
Good luck
Wilfred
The best things in life are the simple things
October 9, 2008 at 10:45 am
DBADave (10/9/2008)
We have auto_update_statistics and auto_create_statistics turned off as a vendor requirement. We have to update them manually.
I love vendors like that. They make my job (performance tuning) so easy. </sarcasm>
What you can do is look at the index usage DMV (sys.dm_db_index_usage_stats). There's a column there (user_updates) that indicates how many times the index has been updated, however it only counts changes since the last restart of SQL. If you monitor that you should be able to calculate approximately how many changes have been made to a table. Compare that with the number of rows in the table and decide on a threshold for doing a stats update.
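A rough sketch of that idea (the user_updates counter is cumulative since the last restart, and the 10% threshold here is just an illustrative starting point, not a rule):

-- Indexes where updates since the last restart exceed ~10% of the table's rows
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name AS index_name,
       us.user_updates,
       SUM(p.rows) AS table_rows,
       STATS_DATE(us.object_id, us.index_id) AS stats_last_updated
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
     ON i.object_id = us.object_id AND i.index_id = us.index_id
JOIN sys.partitions AS p
     ON p.object_id = us.object_id AND p.index_id = us.index_id
WHERE us.database_id = DB_ID()
GROUP BY us.object_id, us.index_id, i.name, us.user_updates
HAVING us.user_updates > SUM(p.rows) * 0.10
ORDER BY us.user_updates DESC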
Updating all stats with the fullscan option is usually overkill. I would suggest you do sampled updates by default and switch to fullscan for any tables that you know need more sampling. Signs of that would be queries whose exec plans show a large discrepancy between estimated and actual rows, where the stats have been updated recently and an update with fullscan fixes the problem.
Bear in mind that if you rebuild an index, that will update the stats on that index with fullscan. Defrag/reorg will not.
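For reference, the syntax for the options mentioned (table and index names are placeholders):

UPDATE STATISTICS dbo.YourTable                           -- default sampled update
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN             -- full scan, for tables where sampling isn't enough
ALTER INDEX IX_YourTable_Col1 ON dbo.YourTable REBUILD    -- rebuild also refreshes that index's stats with a full scan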
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 9, 2008 at 11:00 am
sys.dm_db_index_usage_stats. I knew there had to be a DMV out there somewhere that contained index information. I can use that along with the following rules referenced by you and Microsoft.
SQL Server 2005 determines whether to update statistics based on changes to column modification counters (colmodctrs).
A statistics object is considered out of date in the following cases:
• If the statistics object is defined on a regular table, it is out of date if:
1. The table size has gone from 0 to >0 rows.
2. The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
3. The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.
• If the statistics object is defined on a temporary table, it is out of date as discussed above, except that there is an additional threshold for recomputation at 6 rows, with a test otherwise identical to test 2 in the previous list.
Table variables do not have statistics at all.
I also contacted the vendor to inform them their technical documentation may be outdated. Their links to Microsoft articles no longer work, and the KB article they reference can't be found when searching the knowledge base. They sent me a newer document, but that one still contains some of the broken links along with a working link to a 2000 MS document. 2005 made significant changes to how statistics are maintained, so I'm wondering whether their engineers ever tested AUTO_UPDATE_STATISTICS in 2005. I'm guessing the answer is NO.
Thanks, Dave
October 9, 2008 at 11:13 am
I'd suggest a sliding percentage based on the table size. From experience, 20% is too high a threshold on larger (1,000,000+ row) tables.
Just bear in mind that not all columns have indexes, so you won't be able to calculate the threshold on a per-column basis. Since you're doing it manually, check the number of modifications to the clustered index (i.e. the entire table) and base the update on that. It's not as smart as 2005's algorithm, but it's probably the best you can do under these circumstances.
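A sketch of what a sliding threshold could look like (the breakpoints and percentages here are made-up starting points, not a documented formula):

DECLARE @rows bigint
SET @rows = 5000000   -- illustrative row count from the clustered index

SELECT CASE
         WHEN @rows <= 25000   THEN @rows * 0.20   -- small tables: ~20%
         WHEN @rows <= 1000000 THEN @rows * 0.10   -- medium tables: ~10%
         ELSE @rows * 0.05                         -- large tables: ~5%
       END AS approx_changes_before_manual_update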
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 9, 2008 at 4:10 pm
There is a column 'rowmodctr' in sysindexes which tracks the number of updates since the last statistics update. Pretty sure this is what SQL uses for auto_update.
You could compare this value against the number of rows in the table to decide whether stats need updating.
Basic query to list the most-modified indexes:
select rowmodctr, name, id, indid from sysindexes where rowmodctr > 0
order by rowmodctr desc
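To put that against row counts, something along these lines (any threshold you apply to pct_modified is your call):

select object_name(id) as table_name,
       name, rowcnt, rowmodctr,
       cast(rowmodctr * 100.0 / nullif(rowcnt, 0) as decimal(10,2)) as pct_modified
from sysindexes
where indid in (0, 1)   -- heap or clustered index row = the table itself
  and rowmodctr > 0
order by pct_modified desc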
HTH
george
---------------------------------------------------------------------
October 9, 2008 at 8:39 pm
Thanks George. I'll give it a try.
October 10, 2008 at 2:11 am
george sibbald (10/9/2008)
there is a column 'rowmodctr' in sysindexes which tracks the no of updates since the last update stats. Pretty sure this is what SQL uses for auto_update.
It's what SQL 2000 and earlier used. SQL 2005 and higher use per-column counters, which are hidden. The rowmodctr column (and in fact the entire sysindexes view) is only included for backward compatibility, and rowmodctr is not guaranteed to be accurate.
From BoL:
In earlier versions of SQL Server, the Database Engine maintained row-level modification counters. Such counters are now maintained at the column level. Therefore, the rowmodctr column is calculated and produces results that are similar to the results in earlier versions, but are not exact.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability