updating statistics for big tables

  • Hi All,

    Need some advice on how to deal with updating statistics on huge tables. Below are some questions, and I am scratching my head for a workaround.

    I have a big staging table in our data warehouse which is 6 TB. Currently AUTO_CREATE_STATISTICS = ON and AUTO_UPDATE_STATISTICS = ON.

    1. UPDATE STATISTICS is taking more than 3 hours. So, my question is: how do I keep the stats up to date for such big, changing tables?

    2. Does updating statistics take locks on the data pages, and can it affect concurrency when update stats is kicked off?

    3. What happens if update stats is cancelled after running for more than 3 hours? Will it roll back all the good work done so far?

    4. Does update stats also use tempdb behind the scenes?

    Kindly suggest.

  • Are you updating the statistics on the table in one command, i.e. update statistics <tablename>, or updating individual statistics one by one, i.e. update statistics <tablename> (<statisticsname>)?

    Are you using fullscan or using a sample for the update?

    Are you doing any index maintenance on the tables? The statistics on the index will have been updated by the index rebuild job but with the SAMPLE option.

    Use sp_helpstats to get a list of the statistics on the table and use sp_autostats to get a list of the statistics with the last updated time and the AUTOSTATS setting.
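
    For reference, the two procedures mentioned above can be called like this (dbo.StagingTable is a placeholder name for your table):

    ```sql
    -- List all statistics on the table, including index statistics
    EXEC sp_helpstats @objname = 'dbo.StagingTable', @results = 'ALL';

    -- Show each statistic's last-updated time and AUTOSTATS setting
    EXEC sp_autostats 'dbo.StagingTable';
    ```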

  • 1) Buy better hardware? How much RAM do you have and what is the throughput you get from your IO subsystem for a large scan?

    2) I usually recommend that my clients DISABLE auto-update stats on data warehouse tables, certainly the fact tables and all stage objects. You simply do not want stats updates firing whenever you cross that 20% + 500-rows boundary during a load (sometimes multiple times per load).

    3) Assuming you are on Enterprise Edition of SQL Server and the table is partitioned, newer versions of SQL Server have MUCH better options here for stats and index maintenance.
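
    A minimal sketch of point 2, assuming a fact table named dbo.FactSales (a hypothetical name):

    ```sql
    -- Turn off auto-update for every statistic on the fact table
    EXEC sp_autostats 'dbo.FactSales', 'OFF';

    -- Equivalent per-table form: refresh once, then opt out of auto-update
    UPDATE STATISTICS dbo.FactSales WITH FULLSCAN, NORECOMPUTE;
    ```

    Either way, you then own the stats refresh schedule, typically as a step at the end of the load.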

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi tripleAxe,

    Answers inline.

    Are you updating the statistics on the table in one command i.e. update statistics <tablename> or updating individual statistics one by one i.e. update statistics <tablename> (<statisticsname>) ?

    Ans : update statistics <tname> with fullscan;

    Are you using fullscan or using a sample for the update?

    Ans : fullscan;

    Are you doing any index maintenance on the tables? The statistics on the index will have been updated by the index rebuild job but with the SAMPLE option.

    Ans : No

  • I'd start by breaking the update statistics down to do individual statistics on the table rather than the whole table. I'd also look at whether you can get away with doing samples instead of fullscans. This might be tricky, and I've known systems to run very slowly when the sample rate has changed from fullscan to a sample. It depends on your data distribution.
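
    A sketch of that approach, assuming a statistic named IX_LoadDate on a table dbo.StagingTable (both names are hypothetical):

    ```sql
    -- Update a single statistic with a sample rather than the whole table with FULLSCAN
    UPDATE STATISTICS dbo.StagingTable (IX_LoadDate) WITH SAMPLE 10 PERCENT;
    ```

    Whether 10 percent is enough depends entirely on your data distribution, so compare plans before and after the change.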

  • tripleAxe (4/15/2016): The statistics on the index will have been updated by the index rebuild job but with the SAMPLE option.

    Rebuilding an index, for example by using the ALTER INDEX … REBUILD statement, will update only index statistics with the equivalent of using WITH FULLSCAN. Rebuilding indexes does not update any column statistics.

    http://www.benjaminnevarez.com/2010/07/rebuilding-indexes-vs-updating-statistics/
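
    Given that, one pattern is to let the index rebuild refresh the index statistics and update only the column statistics separately (the table name below is a placeholder):

    ```sql
    -- The rebuild refreshes index statistics with the equivalent of a full scan
    ALTER INDEX ALL ON dbo.StagingTable REBUILD;

    -- Column statistics are untouched by the rebuild; update just those
    UPDATE STATISTICS dbo.StagingTable WITH FULLSCAN, COLUMNS;
    ```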

    Catch-all queries done right
    Gail Shaw's Performance Blog

  • I was still using DBCC DBREINDEX to carry out the index maintenance when we had issues with statistics being updated. IIRC it was SQL 2000. So the index stats were updated using the sampling method. In my case this wasn't good enough, and until the statistics were updated using a full scan the performance of a number of queries suffered.

  • Consider updating the statistics one by one and including a wait interval to reduce the impact on your system. You can also add a sample size for daily updates. Below is a possible solution for a table called Events. The first two blocks generate the statements required to update all statistics on the desired table.

    USE <DB_NAME()>;
    GO
    SELECT name AS index_name,
           STATS_DATE(object_id, index_id) AS statistics_update_date,
           'update statistics Events (' + name + ');' AS statement
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('Events');
    GO

    -- Weekly
    update statistics Events (IX_NetworkNode);
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (PK_Events);
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (IX_EventTime);
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (IX_NetObjectID);
    GO

    USE <DB_NAME()>;
    GO
    SELECT name AS index_name,
           STATS_DATE(object_id, index_id) AS statistics_update_date,
           'update statistics Events (' + name + ') with sample 25 percent;' AS statement
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('Events');
    GO

    -- Daily
    update statistics Events (IX_NetworkNode) with sample 25 percent;
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (PK_Events) with sample 25 percent;
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (IX_EventTime) with sample 25 percent;
    GO
    WAITFOR DELAY '00:10:00'
    update statistics Events (IX_NetObjectID) with sample 25 percent;
    GO

    Br. Kenneth Igiri
    https://kennethigiri.com
    All nations come to my light, all kings to the brightness of my rising

  • tripleAxe (4/21/2016)


    I was still using DBCC DBREINDEX to carry out the index maintenance when we had issues with statistics being updated. IIRC it was SQL 2000. So the index stats were updated using the sampling method. In my case this wasn't good enough, and until the statistics were updated using a full scan the performance of a number of queries suffered.

    Rebuilding an index will update statistics with the equivalent of a full scan; it doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX … REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while it's doing the rebuild.

    Quoted from Paul Randal - http://www.sqlskills.com/blogs/paul/search-engine-qa-10-rebuilding-indexes-and-updating-statistics/

    Catch-all queries done right
    Gail Shaw's Performance Blog

  • Couple of things to emphasize.

    I too have a large database that is our data warehouse. It stretches into billions of records. Like you, updating statistics with FULLSCAN takes hours depending on the table, indexes, and so forth.

    If you're rebuilding your indexes, then there is no need to duplicate work. As others have mentioned, just remember that rebuilding indexes updates only index statistics, with the equivalent of using WITH FULLSCAN; it does not update any column statistics. So you only need to take care of column statistics, using the COLUMNS option of UPDATE STATISTICS. If you use UPDATE STATISTICS without specifying indexes or columns, it will do both; likewise, if you don't specify a sample or FULLSCAN, it will pick a default sample for you, I believe.

    If you're looking to still manually update, then finding a good sample for specific tables can help to a great degree. Sampling works fine for my large tables. Just remember that letting SQL Server choose that for you with UPDATE STATISTICS is not always the best option. You can test that by first letting SQL decide for you and checking DBCC SHOW_STATISTICS and then doing a FULL SCAN and comparing the results.
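
    That comparison can be done like this, borrowing the Events table and an IX_EventTime statistic from the example earlier in the thread (names are illustrative):

    ```sql
    -- Let SQL Server pick the sample, then inspect rows vs. rows sampled
    UPDATE STATISTICS dbo.Events (IX_EventTime);
    DBCC SHOW_STATISTICS ('dbo.Events', IX_EventTime);

    -- Repeat with a full scan and compare the histograms
    UPDATE STATISTICS dbo.Events (IX_EventTime) WITH FULLSCAN;
    DBCC SHOW_STATISTICS ('dbo.Events', IX_EventTime);
    ```

    If the sampled histogram tracks the full-scan one closely, the sampled rate is probably good enough for that table.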

    As others have recommended, using Ola's script is also a good option to help out.

    I've read a few articles suggesting that smaller sample rates can take longer than larger sample rates or even FULLSCAN, depending on the hardware and setup. I have not personally run into that, but it's a consideration when choosing the right sample.

    Likewise, the RESAMPLE option has also been known to get people in trouble when things go really stale, because the last good sample rate may not be relevant for the sample you need today.
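
    For reference, RESAMPLE reuses each statistic's most recent sampling rate rather than letting SQL Server pick a new one (the table name is a placeholder):

    ```sql
    -- Re-use the previous sampling rate for every statistic on the table
    UPDATE STATISTICS dbo.Events WITH RESAMPLE;
    ```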

  • The script below returns the last updated date, row count, sampled row count, and modification_counter for each statistic on a table. If the modification count is very large, you should update stats with a full scan instead of a sample.

    SELECT OBJECT_NAME(stat.object_id) AS object, sp.stats_id, name, last_updated, rows, rows_sampled, modification_counter
    FROM sys.stats AS stat
    CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
    WHERE stat.object_id = OBJECT_ID('Tablename');

