May 22, 2011 at 7:41 am
It was decided a few years back that we must update statistics with fullscan (100% sampling). I need to revisit this and find an analysis approach that determines whether that's really necessary.
With large clients due to convert to SQL Server (from FoxPro) this year, the reindex/update-stats window following weekend conversions is much too long.
On a normal (no conversion) weekend, reindexing takes about 4 hours. Update stats runs in two blocks: six hours are allowed for the column-based "_WA_SYS" stats, plus five hours for the normal index-related stats. All of these processes use commonly available stored procedures that work on the indexes needing attention most, first.
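For reference, here is a rough way to see how the statistics split between those two blocks. This is only an illustrative query to run in the client database; nothing in it is specific to our schema:

-- List index-based statistics vs. auto-created column statistics (the "_WA_SYS" ones)
-- so the two maintenance blocks can be sized separately. Illustrative only.
SELECT  OBJECT_NAME(s.object_id)            AS table_name,
        s.name                              AS stats_name,
        s.auto_created,                     -- 1 = auto-created column stats (_WA_SYS*)
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
JOIN    sys.objects AS o ON o.object_id = s.object_id
WHERE   o.is_ms_shipped = 0
ORDER BY s.auto_created, table_name, stats_name;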
I just spoke with some senior DBAs at the company that bought us recently. They have much larger databases and update stats with sampling rates of about 20%. I believe they mentioned a billion rows -- ours is 37 million and growing.
So I need to revisit why we were getting poor execution plans when not using fullscan. I have SQL Sentry's Plan Explorer, and it does a better job of revealing estimated rows than Management Studio. Do I then run the actual query to see what the row count "really" is?
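One way I could compare estimates against reality is to rerun the statement with SET STATISTICS PROFILE on, which returns both actual and estimated rows per operator. Just a sketch; the SELECT below is a placeholder for whichever query I'm investigating:

-- SET STATISTICS PROFILE returns a per-operator result set with both
-- Rows (actual) and EstimateRows columns, so the two can be compared directly.
SET STATISTICS PROFILE ON;

-- Placeholder: substitute the statement whose plan is being investigated.
SELECT TOP (100) *
FROM sys.objects;

SET STATISTICS PROFILE OFF;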
May 22, 2011 at 10:00 am
The cookie-cutter answer I can give you is that you must update stats so that you get correct plans. But you obviously know that already. In practice, that means you can probably get away with less than a 100% sample and with updating less than 100% of the stats weekly.
#1 The server waits for roughly 20% of the table's rows (plus 500) to change before auto_update kicks in. 20% of 37 M is a freaking lot of changes.
#2 For ledger or history tables where a getdate()/today default feeds the leading column of an index, the stats won't know that you have data for today, nor how much, so the plan can quickly flip from a seek to a scan.
#3 I personally do a full scan here because it only takes 30 minutes a day to run. However, I've been wanting to find a way to update only what needs updating, based on the number of rows changed. I haven't found a way yet, but I'll be tagging along for the ride here (one possible starting point is sketched below).
#4 Don't update stats on objects you just did a full reindex on. An index rebuild always updates that index's statistics with the equivalent of a fullscan.
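As a starting point for the "only update what changed" idea, something like the query below might work. It's only a sketch: it leans on rowmodctr from the old sys.sysindexes compatibility view (still populated on 2005/2008, but deprecated), and the 10% threshold is an arbitrary number to tune.

-- Sketch: surface tables/indexes whose modification counter suggests enough churn
-- since the last stats update to be worth touching. The 10% cutoff is arbitrary.
SELECT  OBJECT_NAME(si.id)  AS table_name,
        si.name             AS index_or_stats_name,
        si.rowcnt           AS row_count,
        si.rowmodctr        AS rows_modified
FROM    sys.sysindexes AS si
WHERE   si.rowcnt > 0
  AND   OBJECTPROPERTY(si.id, 'IsUserTable') = 1
  AND   si.rowmodctr > 0.10 * si.rowcnt      -- arbitrary 10% churn threshold
ORDER BY si.rowmodctr DESC;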
Do you have any window to do update stats daily to lighten the load on the big window?
May 22, 2011 at 12:22 pm
One of the things you should look at is using sp_updatestats instead. This procedure will identify those statistics that need to be updated and only run against those.
With that said, be aware that by default this procedure will perform the update with the default sampling rate. That default sampling rate cannot be changed - it is calculated.
You can force the sampling rate to be used with the RESAMPLE option. All this does is tell the procedure to use the last sampling rate that was defined for that statistic. So, if the last sampling rate was FULL, it will use that rate and perform a full scan.
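For example (the table and statistic names below are just placeholders):

-- sp_updatestats with RESAMPLE: each statistic that needs updating is refreshed
-- using whatever sampling rate was used for it last (FULLSCAN stays FULLSCAN).
EXEC sys.sp_updatestats @resample = 'resample';

-- To pin a particular statistic to full scan first, so RESAMPLE keeps it there later:
UPDATE STATISTICS dbo.SomeBigTable IX_SomeBigTable_Key WITH FULLSCAN;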
I have 2 processes I run on a weekly basis for my 1.2TB database. The first process loops through all statistics and identifies any statistics that were auto updated (which also uses the default sampling rate). If found, it updates those statistics with a full scan. The next process then uses sp_updatestats to update all statistics that need to be updated.
The above processes take about 8 hours for this database - sometimes up to about 10 hours.
The next step, which I have not completed yet, is to identify statistics that need to be updated more often. Once I have those identified, I will create a process that runs at least daily - and possibly more often - to update those statistics.
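As a rough proxy for "which statistics were last updated with a partial sample," something along these lines could work, although sys.dm_db_stats_properties only exists on newer builds (2008 R2 SP2 / 2012 SP1 and later); on older builds, DBCC SHOW_STATISTICS ... WITH STAT_HEADER shows the same Rows vs. Rows Sampled numbers one statistic at a time:

-- Sketch: list statistics whose last update sampled fewer rows than the table holds,
-- i.e. candidates for a follow-up FULLSCAN pass.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        s.name                   AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   sp.rows_sampled < sp.rows            -- last update was a partial sample
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.rows DESC;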
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 22, 2011 at 1:45 pm
We use commonly available stored procedures to do reindexing and update stats. The stats approach is here: http://www.sqlstatistics.com/page/2/
Both work on the indexes that need attention most, first. So, since reindexing runs first, any index completely rebuilt should not be touched by the following stats process. We did recently get approval for a one-hour weeknight run of update stats. That's been running for several weeks but has no impact on the hard-coded 11-hour runtime on the weekends.
My current proposal is to take incremental steps until we see a performance decrease on Monday morning. For example, change update stats to a 90% sample rate and reduce the time allowed to 10 hours instead of 11, then monitor and proceed with more of the same. We have dev environments, but none really set up to mimic a production load.
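At the statement level, that first step is just a sampled update instead of a fullscan. The table and index names below are placeholders, since the maintenance procedures we use take the sampling rate as a parameter:

-- Stepping the sampling rate down from FULLSCAN to 90 percent (placeholder names).
UPDATE STATISTICS dbo.SomeBigTable WITH SAMPLE 90 PERCENT;

-- Or for a single statistic on that table:
UPDATE STATISTICS dbo.SomeBigTable IX_SomeBigTable_Key WITH SAMPLE 90 PERCENT;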
Management keeps saying "how do huge companies with VLDB handle this?"
Well, all I know is that the DBAs I referred to above manage it with a fairly low sampling rate on a database much larger than ours. I have to conclude that it's either something specific to our database schema/indexes, the fact that our database does both reporting and OLTP (so indexes are a mixed bag), or that we aren't understanding the execution plans and really don't need fullscan.
May 23, 2011 at 4:21 am
It's fairly rare to need to do a FULL SCAN on all your stats. It's probably only a few tables that really need that. Unfortunately, those are usually the larger and/or more frequently accessed tables. You can eliminate most tables, but the savings will be marginal at best. Instead, I'd look to break the process up over a longer period of time: run some on Friday night, some on Saturday and some on Sunday. That way the pain is spread around a bit more.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 23, 2011 at 6:55 am
Thanks Grant. Time to read your book on execution plans. It's sounding like no magic bullet here.
May 23, 2011 at 7:04 am
Indianrock (5/23/2011)
Thanks Grant. Time to read your book on execution plans. It's sounding like no magic bullet here.
there's no magic bullet for any global tuning process... always takes a little time and effort.. 🙂
May 23, 2011 at 7:19 am
Ninja's_RGR'us (5/23/2011)
Indianrock (5/23/2011)
Thanks Grant. Time to read your book on execution plans. It's sounding like no magic bullet here.
there's no magic bullet for any global tuning process... always takes a little time and effort.. 🙂
+1
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 23, 2011 at 10:11 am
The latest suggestion I've received is that the bulk of our problem is related to the configuration of our SAN (NetApp). Currently, the average read wait per IO for our production client database is 153 milliseconds, with an average write wait of 21.2 ms. For tempdb it's 20.9 ms on reads and 1785 ms on writes. I believe the high tempdb write wait is primarily due to maintenance and certain nightly "batch"-type processes (skewed much higher than typical weekday numbers).
Tempdb consists of eight 10GB data files and one log file, all "on their own LUNs." However, this notion of 'separate LUNs' is unclear when it comes to something like a NetApp. One description of how such a disk device handles IO requests is "dipping your hand in a bathtub of water," i.e. where the water/data comes from is unknown.
May 23, 2011 at 12:49 pm
To validate your IO - you should be looking at the Avg Disk Reads/sec and Avg Disk Writes/sec counters. Open perfmon on the server and start capturing those numbers.
For reads, a good value is an average less than 20 - ideally it should be around 10. Writes should average around 6 but would be okay at around 10 or so.
With a NetApp SAN, the questions you should be asking your SAN guys are not what disks are presented, but what aggregates are being used, what else is using those aggregates, and which head-ends (filers) your system is routed through.
If there are too many competing systems accessing the aggregates, that could cause SAN contention. If too many systems are directed through a single filer, that could also cause contention.
One last item to ask would be how many drives are being used in those aggregates. I believe the maximum is still 64, but that may have changed by now.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 23, 2011 at 2:26 pm
I'm not following as far as average disk reads/writes per second. If we mean our system is busy and doing 200 reads per second, then a less busy system at 100 reads per second would be better. If we mean it is capable of handling so many reads per second, then a higher number would be better.
What I showed was from the performance dashboard Historical IO report which shows, among other things, the number of IOs performed for each database since last restart ( April 10th in our case ). From that I posted the average read and average write wait in milliseconds.
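For what it's worth, the same since-restart averages can be pulled straight from sys.dm_io_virtual_file_stats, which I believe is what that dashboard report aggregates. This is just a sketch of that calculation:

-- Average read/write stall per IO, per database file, cumulative since SQL Server
-- last started (mirrors the "wait per IO" figures quoted above).
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.type_desc             AS file_type,
        vfs.num_of_reads,
        1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
        vfs.num_of_writes,
        1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_read_stall_ms DESC;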
May 23, 2011 at 3:37 pm
You can read this blog: http://blogs.technet.com/b/vipulshah/archive/2006/11/30/understanding-perfmon-counters-while-troubleshooting-sql-server-performance-issues.aspx
The actual counters are: Avg Disk sec/Read and Avg Disk sec/Write (sorry - mixed them up, which I always do unless I actually open perfmon).
What these indicate is how long each read/write operation is taking. A good IO subsystem will respond in less than 10 ms per read (on average). It doesn't matter how many read or write operations there are; what matters is how fast each one is.
The actual number of IOPS that can be performed is dependent upon the type of SAN, the fabric, the HBA configuration, the number of disks backing the LUN, the cache on the SAN, etc...
So, again - open perfmon on the server, select PhysicalDisk and monitor the Avg Disk sec/Read and Avg Disk sec/Write. If those numbers are consistently above 20ms then you have a problem. If you have a problem, then you can start digging into the other counters to see where the bottleneck actually is.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs