September 17, 2005 at 10:07 pm
A person I work with runs an index analysis during a period of heavy load that consists of updates and queries against a set of tables.
He claims analyzing the tables and indexes will speed up the *current* jobs. I can't think of a valid reason why this would be so, since the statements are in fact run a number of times within the same job.
Let's take an example to amplify what I'm saying:
A job is running which does a series of selects like this:
Start job:
  The job loops based on selection criteria to come up with a result set, and for each row in the result set:
    select rows from tableA
    select matching rows from tableB, tableC, and tableD
End job.
Now a bunch of these jobs are running concurrently, all submitted to kick off at the same time, say 4pm.
At 5:30pm tables A, B, C, and D and their corresponding indexes are analyzed with a 13% sampling rate.
The jobs finish about 9pm.
There's also some updating of the tables going on at the same time, which includes a nice dose of new rows, but far under the 20% change I've read about that triggers an automatic update of statistics.
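(For what it's worth, here's roughly how I check how close the tables are to that threshold. It's just a sketch assuming SQL Server 2000-style sysindexes, where rowmodctr tracks modifications since the last statistics update on each index; tableA is from the example above.)

-- Rough check of modifications accumulated since the last stats update
SELECT name, rowcnt, rowmodctr
FROM sysindexes
WHERE id = OBJECT_ID('tableA')
  AND indid > 0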
Frankly, I'm more worried that he's teaching this approach to newbies in the shop, and I believe there's a lot wrong with doing it this way: locking, dirty reads, and inconsistent data among them.
I also cannot see how he can claim the performance improves, since the SQL statements are cached while the jobs are running, which to me indicates that the access paths chosen at the beginning of the job will still be used.
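(As an aside, here is roughly how I'd confirm that assumption. It's only a sketch against SQL Server 2000's syscacheobjects view; a usecounts value that keeps climbing on the same entry suggests the jobs are reusing the plan rather than recompiling.)

-- Check whether the jobs' statements are still reusing cached plans
SELECT cacheobjtype, objtype, usecounts, sql
FROM master.dbo.syscacheobjects
WHERE sql LIKE '%tableA%'
ORDER BY usecounts DESC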
So basically I don't see how analyzing the tables and indexes in the middle of the jobs will help. Tables A, B, C, and D are rather large, 100-2000k rows each, so they're not buffered or in the cache as far as I can see.
If someone could shed some light on this claim and either debunk it or explain how it may be helpful to do this in the middle of jobs, I'd love to know how it works. I can understand how this would help if you did it before the jobs were run, but I don't see how it will affect them while they're running.
September 19, 2005 at 2:40 pm
By "indexes are analyzed with a 13% sampling rate" I assume you mean updating statistics. The only thing you can say for sure is that doing this during a heavy load will slow down your server. Locking would be a big concern. The update may have some benefit, but I would say the burden of proof is on the person doing this to show some effect. It depends on how many rows are being added or updated, and how the distribution of the new data compares to the old distribution. There is also a qestion about whether the update has any effect if all the jobs' execution plans have already been compiled. I have seen cases where bad counts in sysindexes (fixed with DBCC UPDATEUSAGE) affected execution plans, but I never tried fixing stats in the middle of a running job.
Try running it with and without the statistics update on alternate days, then compare run times after a week or two (depending on day-to-day variation in the workload). With a five-hour job I would expect the time difference to be very noticeable if it is doing any good.
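If you can wrap the job's batch, a crude timing harness is enough for that comparison (otherwise, if these are Agent jobs, msdb..sysjobhistory already records the run times):

-- Crude before/after timing around the job's workload
DECLARE @start datetime
SET @start = GETDATE()

-- ... the job's selects would go here ...

SELECT DATEDIFF(minute, @start, GETDATE()) AS run_minutes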
To prove definitively that the statistics update has an effect, I would put copies of the job in two Query Analyzer windows and get the execution plan before and after the update. If the execution plan has not changed, and it doesn't run any faster, it's a waste of time. You could also use DBCC SHOW_STATISTICS before and after the update to look for significant differences. Remember that the jobs may still be stuck with their 4pm execution plans even if a new one would be generated after the stats update.
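Something like this in each window, before and after the stats update (substitute one of the job's real queries; IX_tableA is just a placeholder for a real index or statistics name on tableA):

-- Return the estimated plan instead of executing the query
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM tableA   -- stand-in for one of the job's actual selects
GO
SET SHOWPLAN_TEXT OFF
GO

-- Compare the statistics themselves before and after the update
DBCC SHOW_STATISTICS ('tableA', 'IX_tableA')
GO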
It would also be a good idea to use SQL Profiler to capture a trace of database activity from 4-9pm, then use Query Analyzer's Index Tuning Wizard (during off-peak hours) to tune your database.
September 19, 2005 at 4:29 pm
Kevin,
I would highly recommend not running update statistics during heavy load periods. The reason I say this is that we have a daily process which updates one of our reporting databases. The process consists of insert..selects with many joins. We have found that updating statistics while this is going on adversely affected the time to completion for this update; we're talking hours. Quite honestly, it doesn't make sense to optimize/update statistics while such things are going on. Optimizing and analyzing beforehand may be a good idea, but not during.
September 19, 2005 at 9:42 pm
I agree with both of you. I don't see how this changes the execution plan at all. There's a lookahead mechanism, but I'll be damned if I can see how any of this does it any good.
I don't care much at this point since we're talking about test boxes. What I'm more concerned about is someone doing this on live production boxes as a result of this claim.
I suppose I'll eventually have to put my foot down on this but I just wanted to know if anyone else was crazy enough to try this and whether I was missing some secret or undocumented feature of the database.
You are both quite right, and you've made much the same observations as I have. I omitted much of that in my original post since I didn't want to shade the opinions by poisoning the well beforehand.
I also agree that the burden of proof is on the person making the claim and that there needs to be a baseline for the measurement.
I cannot believe the execution plan would change either.
The other suggestion, analyzing the tables before the runs, would work nicely, but at this point nobody is willing to communicate just when the jobs are run, let alone which tables will be affected and to what degree new records will impact the existing statistics.
At this point I'm just going to leave it at trying to get certain tables updated nightly without anyone getting their feathers ruffled. After that I'm going to get that stupid idea of ever doing it in production out of their heads.
I'll certainly welcome anyone explaining how it could benefit performance but I'll remain skeptical until I see this run in a sandbox where it can be accurately measured.