November 9, 2015 at 8:07 am
Okay, now that I've click-baited the MVPs, knowing full well the answer is "It Depends", I would like to try and get a feel for what everyone uses as a sample size. 😀
I'm taking a look at this post and it got me thinking about flipping percentages on a dynamic UPDATE STATISTICS query I'm building that joins against sys.sysindexes to get the row count. The goal is to build a query like:
UPDATE STATISTICS [schema].[object] [statsname] WITH <sample size>
I also added a time component to the script (eventually this will become a proc), so any stats whose last statistics update occurred more than X days ago get updated.
Is this time component mixed with auto sample size a good way to approach this? sp_updatestats doesn't work for this; I've confirmed the stats date isn't updated after it runs.
Thoughts on this approach, as well as what sampling rules you tend to follow?
-- Snapshot the current stats, last update date and table row counts
-- (row counts pulled from sys.sysindexes; indid <= 1 = heap or clustered index)
SELECT  QUOTENAME(SCHEMA_NAME(T.schema_id)) + '.' + QUOTENAME(OBJECT_NAME(S.object_id)) AS [TableName],
        QUOTENAME(S.name) AS [StatsName],
        STATS_DATE(S.object_id, S.stats_id) AS [LastStatsUpdate],
        I.rowcnt AS [NumRows] -- , S.*
INTO    #statsdump
FROM    sys.stats AS S
        INNER JOIN sys.tables AS T ON T.object_id = S.object_id
        INNER JOIN sys.sysindexes AS I ON I.id = S.object_id
WHERE   I.indid <= 1;

DECLARE @ageindays int;
SET @ageindays = 3; -- update anything older than N days

CREATE TABLE #work_to_do
(
    rownum int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    updatecommand varchar(max)
);

-- Build one UPDATE STATISTICS command per stale statistic; the sample size
-- steps down as the table grows
INSERT INTO #work_to_do (updatecommand)
SELECT 'UPDATE STATISTICS ' + TableName + ' ' + StatsName + ' WITH SAMPLE ' +
       CASE
           WHEN NumRows < 500000   THEN '100 PERCENT'
           WHEN NumRows < 1000000  THEN '50 PERCENT'
           WHEN NumRows < 5000000  THEN '25 PERCENT'
           WHEN NumRows < 10000000 THEN '10 PERCENT'
           WHEN NumRows < 50000000 THEN '2 PERCENT'
           ELSE '3000000 ROWS'
       END +
       ' -- Table Rows: ' + CAST(NumRows AS varchar(max)) AS [UpdateCommand]
FROM #statsdump
WHERE laststatsupdate < GETDATE() - @ageindays -- note: never-updated stats have a NULL date and are skipped here
ORDER BY numrows DESC;

-- Walk the work list; PRINT for now, swap in EXEC(@CMD) to actually run it
DECLARE @CMD varchar(max);
DECLARE UpdateStatsCursor CURSOR FOR
    SELECT UpdateCommand FROM #work_to_do ORDER BY rownum ASC;

OPEN UpdateStatsCursor;
FETCH NEXT FROM UpdateStatsCursor INTO @CMD;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT (@CMD);
    FETCH NEXT FROM UpdateStatsCursor INTO @CMD;
END;
CLOSE UpdateStatsCursor;
DEALLOCATE UpdateStatsCursor;
GO

DROP TABLE #statsdump;
DROP TABLE #work_to_do;
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
November 9, 2015 at 11:13 am
GabyYYZ (11/9/2015)
Okay, now that I've click-baited the MVPs, knowing full well the answer is "It Depends"
Yup, it does. Hence you'll get n different opinions in x posts (where n >= x).
I would like to try and get a feel for what everyone uses as a sample size. 😀
If I'm bothering to update stats manually, I'm going to use FULLSCAN unless I can't due to operational constraints. If I can't, I'll use the highest % I can get away with.
it got me thinking about flipping percentages on a dynamic UPDATE STATISTICS query I'm building that joins against sys.sysindexes to get the row count.
Don't use sys.sysindexes. It's deprecated, included only for backward compatibility with SQL 2000, and will be removed soon (I hope very soon). Also, stats updates have not been based on the rowmodcnt since SQL 2000.
There's a DMV which returns the column modification count (what stats updates are based on now). I can't remember the name right now, but a browse through BoL will find it. Use that.
If you want the row count, sys.dm_db_partition_stats or sys.partitions both have it.
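For example, something along these lines (just a quick sketch) pulls the per-table row counts from sys.dm_db_partition_stats:
SELECT QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) AS TableName,
       SUM(ps.row_count) AS NumRows
FROM sys.dm_db_partition_stats AS ps
     INNER JOIN sys.tables AS t ON t.object_id = ps.object_id
WHERE ps.index_id <= 1 -- heap (0) or clustered index (1) only, so rows aren't double-counted
GROUP BY t.schema_id, t.name;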
Is this time component mixed with Auto sample size a good way to approach this?
No. It gives no consideration to the data changes in the table. A table that's static can have year-old stats and be fine; I know of systems where some tables needed stats updates every half an hour, otherwise query performance tanked. Age is NOT a good reflection of how stale stats are.
The code I've snipped basically repeats what SQL's auto sampling does: the larger the table, the smaller the %. If you're not going to use FULLSCAN, just use auto and let SQL pick the sampling size it wants. No need for you to recode the sampling selection yourself.
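In other words, either of these (table and stats names here are just placeholders):
-- Let SQL Server pick the sample size (same sampling the auto-update uses)
UPDATE STATISTICS dbo.SomeTable SomeStats;
-- Or scan everything for the most accurate histogram
UPDATE STATISTICS dbo.SomeTable SomeStats WITH FULLSCAN;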
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 9, 2015 at 11:18 am
If I have the time, I prefer to do a full scan. Sampling a small percentage, unless you know your data has a standard distribution, is going to lead to inaccurate statistics and bad plans.
I don't think I'd do time-based updates of stats either, because if the base data hasn't changed there isn't any reason to update the statistics.
For SQL Server 2008 R2 SP2 and SQL Server 2012 SP1 and later I'd use a query something like this:
WITH StatisticsInfo
AS (
SELECT
QUOTENAME(SCHEMA_NAME(T.schema_id)) + '.' + QUOTENAME(OBJECT_NAME(S.object_id)) [TableName],
QUOTENAME(S.name) [StatsName],
STATS_DATE(S.object_id, S.stats_id) AS [LastStatsUpdate],
DDSP.rows,
DDSP.rows_sampled / CONVERT(DECIMAL(38, 4), DDSP.rows) AS SamplingRate,
DDSP.modification_counter,
DDSP.modification_counter / CONVERT(DECIMAL(38, 4), DDSP.rows) AS ModificationRate
FROM
sys.stats AS S
INNER JOIN sys.tables AS T
ON T.object_id = S.object_id
CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP
WHERE
DDSP.modification_counter > 0 AND
DDSP.rows > 1000
)
SELECT
StatisticsInfo.TableName,
StatisticsInfo.StatsName,
StatisticsInfo.LastStatsUpdate,
CASE WHEN StatisticsInfo.rows < 50000 AND
StatisticsInfo.ModificationRate > .19 THEN 1
WHEN StatisticsInfo.rows < 100000 AND
StatisticsInfo.ModificationRate > .14 THEN 1
WHEN StatisticsInfo.rows < 500000 AND
StatisticsInfo.ModificationRate > .09 THEN 1
WHEN StatisticsInfo.rows < 1000000 AND
StatisticsInfo.ModificationRate > .04 THEN 1
WHEN StatisticsInfo.rows < 5000000 AND
StatisticsInfo.ModificationRate > .01 THEN 1
ELSE 0
END AS UpdateStats,
StatisticsInfo.SamplingRate,
StatisticsInfo.ModificationRate
FROM
StatisticsInfo;
Then I'd only update the stats that meet my criteria (UpdateStats = 1), and like Gail said, I'd use FULLSCAN or AUTO. This query also gives you the last sampling rate, and you could use that if you aren't having issues with bad plans and you really want to control the sampling rate.
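If it helps, a sketch of that last step (assuming you dump the query's output into a #StatsToUpdate temp table first; the table name is hypothetical):
-- Build one command per flagged statistic; swap FULLSCAN for a sample clause if you need to
SELECT 'UPDATE STATISTICS ' + TableName + ' ' + StatsName + ' WITH FULLSCAN;' AS UpdateCommand
FROM #StatsToUpdate
WHERE UpdateStats = 1;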
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
November 9, 2015 at 11:27 am
Thanks Gail and Jack for your input. As I'm finally getting exposure to a live 2014 instance, I need to unlearn some things from 2008 (non-R2). I would use Ola's scripts, but there seems to be some administrative overhead with them: on a reindex + stats job run of 5.5 hours, 3 of those hours seem to be dedicated just to updating statistics, but when I look at the run times in CommandLog, they don't total anywhere near 3 hours. Thoughts on why this may be occurring?
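For reference, this is roughly how I'm totalling the run times (assuming the standard StartTime, EndTime and CommandType columns in Ola's CommandLog table):
SELECT CommandType,
       COUNT(*) AS Commands,
       SUM(DATEDIFF(SECOND, StartTime, EndTime)) / 60.0 AS TotalMinutes
FROM dbo.CommandLog
WHERE StartTime >= DATEADD(DAY, -1, GETDATE()) -- most recent run only
GROUP BY CommandType
ORDER BY TotalMinutes DESC;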
This was the main reason I was interested in re-inventing the wheel and coming up with my own script.
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein