March 18, 2011 at 1:47 pm
We've been updating statistics on the weekend using FULLSCAN and a stored procedure that determines which ones need it most. Since we're at 35 million records and may double by next year, the weekend job is taking many hours. I need to understand how to use sampled updates without causing poor execution plans, in hopes of reducing the runtime of this maintenance.
I have just added weeknight SQL Agent jobs that do one hour of FULLSCAN updates using the same algorithm (below) as the weekend job.
This is the main query in the stored procedure that determines which stats need updating.
-- Build the queue of indexes whose statistics need updating.
-- Note: sysindexes/sysobjects/sysusers are SQL 2000-era compatibility
-- views; they still work on 2005 but are deprecated.
IF OBJECT_ID('tempdb..##updateStatsQueue') IS NOT NULL
    DROP TABLE ##updateStatsQueue;

SELECT  schemas.name AS table_schema,
        tbls.name AS table_name,
        i.name AS index_name,
        i.id AS table_id,
        i.indid AS index_id,
        i.groupid AS groupid,
        i.rowmodctr AS modifiedRows,   -- rows modified since the last stats update
        ( SELECT MAX(rowcnt)           -- table row count from the heap/clustered index
          FROM sysindexes i2
          WHERE i.id = i2.id
            AND i2.indid < 2
        ) AS rowcnt,
        STATS_DATE(i.id, i.indid) AS lastStatsUpdate,
        'False' AS Processed
INTO    ##updateStatsQueue
FROM    sysindexes i
        INNER JOIN sysobjects AS tbls ON i.id = tbls.id
        INNER JOIN sysusers AS schemas ON tbls.uid = schemas.uid
        INNER JOIN INFORMATION_SCHEMA.TABLES AS tl
            ON tbls.name = tl.table_name
           AND schemas.name = tl.table_schema
           AND tl.table_type = 'BASE TABLE'
WHERE   i.indid > 0                    -- skip heaps
        AND schemas.name <> 'sys'
        AND i.rowmodctr > 0            -- only stats with pending modifications
        AND ( SELECT MAX(rowcnt)
              FROM sysindexes i2
              WHERE i.id = i2.id
                AND i2.indid < 2
            ) > 0;                     -- skip empty tables
March 18, 2011 at 2:40 pm
Here is your answer: http://msdn.microsoft.com/en-us/library/ms187348%28v=sql.90%29.aspx
UPDATE STATISTICS
... will, by default, use a sample size that SQL Server computes from the table size (it's not a fixed percentage). You can override it, for example:
UPDATE STATISTICS Production.Product(Products)
WITH SAMPLE 50 PERCENT;
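For reference, the sampling clause can also be expressed in rows, forced to FULLSCAN, or told to reuse the previous rate. A quick sketch (dbo.BigTable and IX_BigTable_Date are hypothetical names, not from this thread):

```sql
-- Fullscan: read every row (most accurate, slowest)
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;

-- Percentage sample on a single index's statistics
UPDATE STATISTICS dbo.BigTable (IX_BigTable_Date) WITH SAMPLE 20 PERCENT;

-- Row-count sample
UPDATE STATISTICS dbo.BigTable WITH SAMPLE 1000000 ROWS;

-- Reuse whatever sample rate was used for the previous update
UPDATE STATISTICS dbo.BigTable WITH RESAMPLE;
```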
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 18, 2011 at 2:55 pm
Sorry, I wasn't clear enough. I have a general understanding of sampling rates. The question is how to choose a sample percentage that will "get the job done" short of just using FULLSCAN all the time.
Yes, I can test on a DEV box, but I'm looking for queries, etc., that help determine the sample percentage that hits the sweet spot between wasted runtime and a ton of performance problems on Monday morning.
For large tables (millions of records), a low or default sample rate has a good chance of being ineffective and causing poor execution plans with, in our case, high CPU.
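One way to hunt for that sweet spot (a sketch only; the table and index names are hypothetical) is to update a representative statistic at each candidate rate and compare the histogram against a FULLSCAN baseline using DBCC SHOW_STATISTICS:

```sql
-- Baseline: fullscan
UPDATE STATISTICS dbo.BigTable (IX_BigTable_Date) WITH FULLSCAN;
DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_Date') WITH HISTOGRAM;

-- Candidate: 20 percent sample
UPDATE STATISTICS dbo.BigTable (IX_BigTable_Date) WITH SAMPLE 20 PERCENT;
DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_Date') WITH HISTOGRAM;

-- If the step boundaries and the EQ_ROWS / RANGE_ROWS estimates stay close
-- to the fullscan baseline, the sampled rate is probably good enough
-- for this table.
```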
March 18, 2011 at 3:12 pm
Oh, pardon me. Going from rebuilding all stats, like it or not, to rebuilding only the stats where 'stuff has changed too much' will decrease your maintenance time, possibly by a lot. This is assuming a FULLSCAN in each case, so you may not need to sample at all.
Call me paranoid, but I strongly prefer a FULLSCAN, since that is the only way I have found to repair corrupt stats. Of course, I think I have some childhood baggage from the SQL 6.x and 2000 days that I need to see a therapist about; I have not seen or heard of 2005 stats getting corrupted nearly as often.
Sometimes I'd feel comfortable with a 20% sample, knowing that the higher the sample, the more accurate the stats and the better the query optimizer can decide how to access the data. In fact, I have not [yet] run into many problems with a 20% sample, so long as I also do a FULLSCAN once a month. I have never run only a sample (20% or otherwise) without a reasonably regular FULLSCAN, because I don't like crisis calls. I still bolt upright screaming in bed a few nights a year, and my wife has to rock me back to sleep. Yeah, I need to do a DBCC SHRINKJIM(Head);
Anyway, on my VLDBs I choose to break the stats work up, doing a FULLSCAN on certain tables each night in the time allotted, so 1/7th of the work gets done each night. However, if you don't have the luxury of an evening maintenance window, then you'll have to do something different.
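One simple way to sketch that seven-way nightly split (an assumption on my part, not the poster's actual script) is to hash each table name into one of seven buckets and process only tonight's bucket:

```sql
-- Process only the tables whose name hashes into tonight's bucket (0-6).
DECLARE @tonight int;
SET @tonight = (DATEPART(weekday, GETDATE()) - 1) % 7;

SELECT  s.name AS table_schema, t.name AS table_name
FROM    sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE   ABS(CHECKSUM(t.name)) % 7 = @tonight
ORDER BY s.name, t.name;

-- Feed this list to a cursor that runs
--   UPDATE STATISTICS <schema>.<table> WITH FULLSCAN;
-- for each row.
```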
What about your reindexing? When are you doing that and are you doing a rebuild or a reorg?
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 18, 2011 at 3:17 pm
Honestly, on most systems, on most databases, I just use the defaults. It's pretty rare that a marginal change in sampling rates will make that big a difference to your stats. As a matter of fact, I only use FULLSCAN on very targeted, usually problematic, tables or indexes.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 18, 2011 at 3:22 pm
Good to know. Perhaps I'll sleep better tonight. Thanks.
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 18, 2011 at 3:46 pm
I just started doing an hour per weeknight, and after some fiddling around I was able to confirm that something like 100 stats are getting done each night. I had to dig a bit to make sure it wasn't one of the "automatic stats" processes doing it; eventually one of our tools revealed the exact SQL, and during my job's runtime they were all FULLSCAN. We're basically using the technique described in the link below, except we always do FULLSCAN. Sounds like, when I have time, I need to experiment on a dev box with various sample rates.
Indexes are done Saturday mornings using another proc that determines fragmentation, then either rebuilds or reorganizes depending on the fragmentation level. Since the stats routine looks for stats in need of work, I'm confident the stats updated by index rebuilds are not later "redone" by the stats job.
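A minimal sketch of that kind of fragmentation-driven decision (the 5%/30% thresholds below are the common rule of thumb, not necessarily this poster's exact values):

```sql
-- List indexes with a suggested action based on fragmentation:
-- reorganize between 5% and 30%, rebuild above 30%.
SELECT  OBJECT_NAME(ps.object_id) AS table_name,
        i.name AS index_name,
        ps.avg_fragmentation_in_percent,
        CASE
            WHEN ps.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
            WHEN ps.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
            ELSE 'none'
        END AS action
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
        INNER JOIN sys.indexes i
            ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE   ps.index_id > 0;   -- skip heaps

-- Note: a REBUILD refreshes the index's statistics with the equivalent of
-- a fullscan; a REORGANIZE does not touch statistics at all.
```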
March 18, 2011 at 4:00 pm
Right, exactly. The reindex will handle index stats, but not column stats, so it sounds like you have a good strategy. I just wanted to bring up the reindex topic here too, since it does [some] stats updating.
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph