April 8, 2013 at 1:59 pm
So, I'm using my google foo to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.
Jared
CE - Microsoft
April 8, 2013 at 2:08 pm
SQLKnowItAll (4/8/2013)
So, I'm using my google foo to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.
As a rule of thumbs, a 20% change of data volume or data distribution on a particular table would be a good threshold for updating performance statistics. All execution plans referencing such a table should be invalidated automatically (at least Oracle does it this way) at the time fresh statistics are gathered therefore a new execution would trigger a compilation.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.April 8, 2013 at 2:22 pm
for me, it depends on the size of the table;
for huge tables, it takes a lot less than 20% of the rows to make the statistics stale enough that they need to be refreshed manually;
`
I have a variation of this running thoughout the day, and don't seem to have stat problems anymore on our busier tables.
I put this together with some arbitrary limits, based on tye size of tables , so tables with 10^8 rows have different rules than 10^3 rows
i'm open for suggestions and improvements, of course!
--The Analysis: my crappy assumptions:
--UPDATE STATISTICS dbo.GMPVDET
--tables under 1000 rows, I'll leave then at 20%
--tables with more rows than that, I'll use an arbitrary sliding scale formula.
--formula to be modified based on analysis
SELECT X.*,
ISNULL(CASE
WHEN X.[Total Rows]<=1000
THEN
CASE
WHEN [Percent Modified] >=20.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --20% Small Table Rule'
END
WHEN [Percent Modified] = 100.00
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --100% No real Stats Rule'
--WHEN X.[Rows Modified] > 1000
--THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --1000 Rows Modified Rule'
ELSE
CASE
WHEN X.[Total Rows] > 1000000000 --billion rows
THEN CASE
WHEN [Percent Modified] > 0.1
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1B Big Table Rule'
END
WHEN X.[Total Rows] > 100000000 --hundred million rows
THEN CASE
WHEN [Percent Modified] > 1.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100M Big Table Rule'
END
WHEN X.[Total Rows] > 10000000 --ten million rows
THEN CASE
WHEN [Percent Modified] > 2.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10M Big Table Rule'
END
WHEN X.[Total Rows] > 1000000 --million rows
THEN CASE
WHEN [Percent Modified] > 5.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1M Big Table Rule'
END
WHEN X.[Total Rows] > 100000 --hundred thousand rows
THEN CASE
WHEN [Percent Modified] > 10.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100K Big Table Rule'
END
WHEN X.[Total Rows] > 10000 --ten thousand rows
THEN CASE
WHEN [Percent Modified] > 20.0
THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10K Big Table Rule'
END
END
END,'') AS [Statistics SQL]
FROM (
SELECT DISTINCT
DB_NAME() AS [Database],
S.name AS [Schema Name],
T.name AS [Table Name],
I.rowmodctr AS [Rows Modified],
P.rows AS [Total Rows],
CASE
WHEN I.rowmodctr > P.rows
THEN 100
ELSE CONVERT(decimal(8,2),((I.rowmodctr * 1.0) / P.rows * 1.) * 100.0)
END AS [Percent Modified]
FROM
sys.partitions P
INNER JOIN sys.tables T ON P.object_Id = T.object_id
INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
INNER JOIN sysindexes I ON P.object_id = I.id
WHERE P.index_id in (0,1)
AND I.rowmodctr > 0
) X
WHERE [Rows Modified] > 1000
ORDER BY [Rows Modified] DESC
Lowell
April 8, 2013 at 4:14 pm
I have found that you need to examine the 'noise' in each table, and update accordingly. Now, you have to go to 'huge' before it matters, but as little as 0.5% in a couple of fact tables I've dealt with have been enough to throw off the entire system.
Here's what I've found was the easiest (conceptually). Do your standard checking for volume differences. I'm more used to older systems but the stuff above looks tasty. Then I used to keep an override table. If the table wasn't in the overrides, we used to (on weekends) restat anything at +10%. I had one table that got restatted at 0.000001 or something equally rediculous, and that was nightly.
It all depends on your system and what you need, that's why it's so hard to get a # to nail down.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 9, 2013 at 1:39 am
Fantastic script above! Thanks. Also, if you're using SQL 2008 R2 SP1 upwards, check out trace flag 2371 to see if its of any use.
Cheers
April 9, 2013 at 2:03 am
SQLKnowItAll (4/8/2013)
It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.
There's never any need to recompile after a stats update no matter what the auto update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway
However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.
The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:
- Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.
- Scheduled stats updates of all tables on a regular basis.
The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 9, 2013 at 6:20 am
GilaMonster (4/9/2013)
SQLKnowItAll (4/8/2013)
It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.There's never any need to recompile after a stats update no matter what the auto update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway
However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.
The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:
- Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.
- Scheduled stats updates of all tables on a regular basis.
The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.
Thanks for all of the replies! I think doing it regularly works for us. We currently do it daily, but I am trying to streamline it. The stored proc created years ago does the update on statistics and then runs sp_recompile, which I think is unnecessary and since it takes a schema lock has caused deadlocks on some other processes. I am tasked with analyzing old stuff and updating it. Sample of original below run in debug mode to show the dynamic script actually executed:
USE [DBA]
DECLARE @mytable_id INT
DECLARE @mytable VARCHAR(100)
DECLARE @owner VARCHAR(128)
DECLARE @sql VARCHAR(256)
SELECT @mytable_id = MIN(object_id)
FROM sys.tables WITH(NOLOCK)
WHERE is_ms_shipped = 0
WHILE @mytable_id IS NOT NULL
BEGIN
SELECT @owner = SCHEMA_NAME(schema_id), @mytable = name
FROM sys.tables
WHERE object_id = @mytable_id
SELECT @sql = 'UPDATE STATISTICS '+ QUOTENAME(@owner) +'.' + QUOTENAME(@mytable) +''
EXEC (@SQL)
SELECT @sql = 'EXEC sp_recompile '''+ QUOTENAME(@owner) +'.' + QUOTENAME(@mytable)+''''
EXEC (@SQL)
SELECT @mytable_id = MIN(object_id)
FROM sys.tables WITH(NOLOCK)
WHERE object_id > @mytable_id
AND is_ms_shipped = 0
END
My thought was to remove the whole sp_recompile on this bad boy as I believe it is not needed since the update statistics will invalidate most of the plans anyway and force a recompile. Thoughts? (Also, we I want to stop running this on Monday when indexes are rebuilt on Sunday and little to 0 data has changed).
Jared
CE - Microsoft
April 9, 2013 at 7:26 am
i use a combination of the date last updated and the amount of data modified
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply