Clustered Index affect on FULLSCAN Update Statistics on Large tables

Question

Post reply

Clustered Index affect on FULLSCAN Update Statistics on Large tables

Wildcat91

SSC Veteran

Points: 299
More actions
April 29, 2009 at 3:34 pm

#130516

We have a 650GB database with 4 tables taking up the majority of the space in that DB. Two of the tables have over 100 million rows and the other two have about 40 million rows. We run sp_UpdateStats once a day M-F to update stats on those tables with a sampling of data. On Sunday's, we run a FULLSCAN against all of our DB's. For these 4 tables, we have never been able to get it to finish. Two of the tables have Clustered Indexes on them. One of the tables with over 100 million rows has a clustered index and it seems to take about 5 minutes for each statistic to be updated. The other table with over 100 million rows is a Heap table and it takes about 15-20 minutes for each statistic to update. Does a clustered index make a big difference in Update Statistic performance even if some of the columns in the statistic are not part of the Clustered Index?
Thanks for any help.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

Hey Brendan,

This is a complex area, so I have some questions, as well as opinions:

Why do you feel the need to full scan? In VLDBs, a 10% sample is usually sufficient (it could be 5% or 50% though - the decision should be based on empirical data).

Are the tables partitioned?

Are any partitions static data - if so are the appropriate filegroups set to read only?

Do any other operation occur at the same time as the stats rebuild?

Is the database using snapshot isolation or read committed snapshot? Turning this off during maintenance might help.

Is the process I/O bound (probably) or limited by another factor such as CPU or memory?

Do the statistics operations use mutliple CPUs? (Use profiler or dynamic views with a smaller sample size to test)

The question of whether scanning a heap or a table with a clustered index is a vexed one and depends hugely on a number of factors. I don't plan to re-ignite the debate here, but the answer is: it depends.

For example: a freshly-built heap will usually scan faster than the equivalent CIX table. Over time, depending on things like the number of forwarded records and holes left by deleted records, a CIX scan may be faster.

Your tables are almost certainly better off with a clustered index. Sure, key lookups may required a little more CPU, but range scans, inserts, updates and deletes will be much faster. There are just too many good reasons to have a CIX to consider running on a heap - unless your situation is very specific and you have tested extensively to show that a heap brings a worthwhile improvement.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Wildcat91 SSC Veteran Points: 299 More actions · Answer 2

Paul,

I think the only reason we do a fullscan on the weekends is because we have always done it for our other systems. The other systems are much smaller and we only update statistics on the weekend and not every day. With this system, we rebuild or reorganize some indexes on a nightly basis so we run sp_updatestats every day. I guess we just figured we would run the fullscan on the weekends like our other systems.

The tables are not partitioned. Since this is our first time dealing with tables this size, we went with what the vendor recommended and no partitionling was involved.

The weekend fullscan is usually run by itself with nothing running at the same time. Occassionally there will be weekend work in the database by developers.

The database does not use snapshot isolation.

The process is I/O bound.

There are 32 CPU's on this server(we had to throw a ton of hardware at it to make up for vendor flaws). The update stats job does use mulitple CPU's.

I agree that all of these tables should have clustered indexes and have raised the issue to the vendor asking why they don't. They claim that the heavy transaction processing causes overhead if a clustered index is present. Of course it causes a little, but I think the positives from the Clustered index far outweigh any overhead.

The thing that is strange to me is that a 100 million row table with a clustered index can update individual statistics of just one column in 5 minutes while the other 100 million row table without a clustered index takes almost 4 times as long per statistic.

Thanks a lot for your input. I appreciate it.