November 3, 2016 at 12:51 pm
Folks, just wanted to know if anyone has experience maintaining indices and statistics on big tables (600 GB+).
November 3, 2016 at 1:24 pm
There are a bunch of people around here with that kind of experience. What's the question?
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 3, 2016 at 4:53 pm
Can you post the full DDL, including all indices, the typical usage, etc., and particularly the reason for this table being so large? Normally such a large table would be some kind of auditing/logging table where maintaining statistics should be almost irrelevant.
😎
November 3, 2016 at 8:35 pm
curious_sqldba (11/3/2016)
Folks, just wanted to know if anyone has experience maintaining indices and statistics on big tables (600 GB+).
Yes, do you have any experience there?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2016 at 8:40 pm
Apologies, I didn't provide more details.
My original question was how do you maintain statistics: are you using table partitioning or something else?
November 4, 2016 at 7:14 am
Partitioning a table isn't going to help much with stats unless you have 2012 or greater and are willing to accept the default sampling rate, which I'm usually not willing to accept.
Creating partitioned views will probably do better for you because they use separate actual tables (one per partition), but there are caveats in their use, just like there are caveats with partitioned tables.
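If it helps to picture it, here's a bare-bones sketch of a partitioned view; all of the table/column names (SalesArchive_2015, SalesArchive_2016, SaleDate) are hypothetical, and the CHECK constraints on the partitioning column are what let the optimizer skip member tables:
-- Hypothetical member tables, one per "partition", each with a CHECK
-- constraint on the partitioning column.
CREATE TABLE dbo.SalesArchive_2015
(
    SaleID   INT   NOT NULL,
    SaleDate DATE  NOT NULL CHECK (SaleDate >= '20150101' AND SaleDate < '20160101'),
    Amount   MONEY NOT NULL,
    CONSTRAINT PK_SalesArchive_2015 PRIMARY KEY (SaleID, SaleDate)
);
CREATE TABLE dbo.SalesArchive_2016
(
    SaleID   INT   NOT NULL,
    SaleDate DATE  NOT NULL CHECK (SaleDate >= '20160101' AND SaleDate < '20170101'),
    Amount   MONEY NOT NULL,
    CONSTRAINT PK_SalesArchive_2016 PRIMARY KEY (SaleID, SaleDate)
);
GO
-- Each member table keeps its own, much smaller, statistics that can be
-- rebuilt independently of the others.
CREATE VIEW dbo.SalesArchive
AS
SELECT SaleID, SaleDate, Amount FROM dbo.SalesArchive_2015
UNION ALL
SELECT SaleID, SaleDate, Amount FROM dbo.SalesArchive_2016;
(Among the caveats: for the view to be updatable, the partitioning column has to be part of the primary key and the CHECK constraints can't overlap.)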
Rebuilding stats is important, especially for indexes/columns that are "ever increasing" and used as criteria of one form or another. The key is knowing WHEN to rebuild them and, like indexes, keeping their numbers to only those actually used. For statistics, that's not as easy a task as it is for indexes.
There are various articles on the internet that say that you should sometimes just drop all column-based statistics and let the system figure out when to rebuild them. For smaller tables, that can be effective because no statistics should take very long to rebuild. For larger tables, however, it can take some time and the first query that causes such a rebuild on certain columns could take quite a while to execute. So prophylactically dropping all stats may not be a good idea there.
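If you do go down the road of thinning out column stats, a sketch like the following just lists the auto-created and user-created column statistics so you can review them first; the DROP STATISTICS at the end is deliberately commented out, and the names in it are hypothetical:
-- Column statistics that are candidates for review (NOT blind deletion).
SELECT  OBJECT_NAME(s.object_id) AS TableName,
        s.name                   AS StatName,
        s.auto_created,
        s.user_created
FROM    sys.stats AS s
WHERE   (s.auto_created = 1 OR s.user_created = 1)
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY TableName, StatName;

-- Once you're sure a column stat is truly unused:
-- DROP STATISTICS dbo.MyBigTable.[_WA_Sys_00000005_0EA330E9];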
There is an undocumented Trace Flag (8666) that can be used to help over time. It causes the execution plans in cache to include which statistics were actually used to create them. There's a good article on how to use it here: https://blogfabiano.com/2012/07/03/statistics-used-in-a-cached-query-plan/ Since the people before me thought it was a good idea to create stats on every bloody column in every table (it's not), I'm currently using that technique to find only the column stats that are used over the next month or so, and then I'll drop the unused column stats. I'll also use it to help confirm which indexes are being used, because each index also has stats.
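A very rough sketch of the technique from that article follows; note that both the trace flag and the element names in the showplan XML (the "wszStatName" fields) are undocumented, so treat this strictly as a starting point and test it somewhere safe first:
-- Undocumented; adds the extra "InternalInfo" data to plans compiled from now on.
DBCC TRACEON (8666, -1);
GO
-- Pull the statistic names out of the cached plans (can be expensive on a big cache).
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  DB_NAME(qp.dbid) AS DatabaseName,
        stat.node.value('@FieldValue', 'nvarchar(128)') AS StatName
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//Field[@FieldName="wszStatName"]') AS stat(node);
GO
-- Turn it back off when you're done collecting.
DBCC TRACEOFF (8666, -1);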
As of 2012, there's also a handy system function for exploring stats, sys.dm_db_stats_properties, which includes a replacement for RowModCtr. There's a good article on how to use it here: http://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/
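For anyone on a build that has that function, a minimal example of poking at it looks like this (the table name is just a placeholder):
-- One row per statistic: last update time, rows sampled, and the
-- modification counter that replaces RowModCtr.
SELECT  OBJECT_NAME(s.object_id) AS TableName,
        s.name                   AS StatName,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.MyBigTable')   -- hypothetical table name
ORDER BY sp.modification_counter DESC;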
Of course, that doesn't help you much because you're using SQL Server 2008, so you have to use the same thing that MS used... the sys.sysindexes table, which is documented here: https://msdn.microsoft.com/en-us/library/ms190283.aspx Using similar values found there, compared to the newer sys.dm_db_stats_properties function, still works to help decide whether any given statistic needs to be rebuilt. I never do it just by the time that has passed since the last rebuild, because a lot of reference tables (some are a bit large) are totally static and the stats never change.
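For 2008, a rough equivalent against the compatibility view might look like this (rowmodctr is only an approximation, and the table name is a placeholder):
-- Approximate rows modified since each index/stat was last updated.
SELECT  OBJECT_NAME(si.id) AS TableName,
        si.name            AS IndexOrStatName,
        si.rowcnt          AS [RowCnt],
        si.rowmodctr       AS RowsModifiedSinceLastUpdate
FROM    sys.sysindexes AS si
WHERE   si.id = OBJECT_ID('dbo.MyBigTable')   -- hypothetical table name
  AND   si.name IS NOT NULL
ORDER BY si.rowmodctr DESC;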
As for stats rebuilds on a large table like you have, obviously you want to figure out all of the above. You want to identify and remove unused stats. You want to figure out whether a stat has seen additional rows and whether it's enough (in your opinion) to justify rebuilding the stat, which could also depend on how long the rebuild takes... something you won't know until you start keeping track of such things in, possibly, a stats run table that you could build.
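To make the "stats run table" idea a little more concrete, something along these lines would do for a start; every name here is made up, and the UPDATE STATISTICS line is just one example of what you'd wrap with the timing:
-- Hypothetical log table for tracking how long each stats rebuild takes.
CREATE TABLE dbo.StatsMaintLog
(
    LogID       INT IDENTITY(1, 1) PRIMARY KEY,
    TableName   SYSNAME  NOT NULL,
    StatName    SYSNAME  NOT NULL,
    StartTime   DATETIME NOT NULL,
    EndTime     DATETIME NULL,
    DurationSec AS DATEDIFF(SECOND, StartTime, EndTime)
);
GO
DECLARE @Start DATETIME = GETDATE();

UPDATE STATISTICS dbo.MyBigTable [ST_SomeColumn] WITH FULLSCAN;  -- hypothetical stat name

INSERT dbo.StatsMaintLog (TableName, StatName, StartTime, EndTime)
VALUES ('dbo.MyBigTable', 'ST_SomeColumn', @Start, GETDATE());
Over time, the DurationSec column tells you which rebuilds are cheap enough to run whenever you like and which ones need a maintenance window.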
Of course, not rebuilding stats, or letting the system do it possibly not often enough, can also be a real problem. We had such a problem where the addition of 200,000 rows wasn't enough to trigger an automatic stats update. A 2-second query suddenly started taking "forever" (more than an hour and doing no reads, so I killed it after an hour). Once we rebuilt the associated stats on the table, the query (a big batch query) immediately went back to taking 2 seconds. It also took only a couple of minutes to rebuild the stats on the table.
The bottom line is that stats are actually much more important to performance than index fragmentation (fragmentation is NOT considered by the optimizer when it creates plans). I'm getting close to the end of a now 11-month experiment where I've not rebuilt any indexes on one of my production boxes since 2016-01-17 (and performance has only gotten better, and there's less blocking as well). I do, however, rebuild stats on a regular basis. So, when it comes to rebuilding stats on larger tables, you just have to bite the bullet and do it.
I know that's not a "Black'n'White" answer but neither is the problem of stats maintenance. I hope the information helps.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 4, 2016 at 7:21 am
I don't know if you're on 2008 or 2008R2, but if the latter, you should explore using traceflag 2371. It makes a huge difference for lots of people when dealing with larger databases and statistics. It will result in more frequent update of your stats, but, frequently, that's a good thing, not a bad thing. Read more about it in the Books Online.
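For what it's worth, 2371 can be enabled globally on a running instance or, more commonly, as a startup parameter; a quick sketch (test it on a non-production box first):
-- Enable globally until the next restart (add -T2371 as a startup
-- parameter if you want it to stick).
DBCC TRACEON (2371, -1);

-- Verify which trace flags are currently active.
DBCC TRACESTATUS (-1);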
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 4, 2016 at 9:09 am
@Jeff - Completely agree, stats are absolutely necessary. I use the code below to check when the stats were last updated, but I'm not 100% satisfied with the results of this query. Do you know, if a process was updating a stat and got killed, would it still make an entry in the system tables saying it updated the stats?
SELECT OBJECT_NAME(object_id) AS [ObjectName] ,
[name] AS [StatisticName] ,
STATS_DATE([object_id], [stats_id]) AS [StatisticUpdateDate]
FROM sys.stats;
November 4, 2016 at 9:09 am
Grant Fritchey (11/4/2016)
I don't know if you're on 2008 or 2008R2, but if the latter, you should explore using traceflag 2371. It makes a huge difference for lots of people when dealing with larger databases and statistics. It will result in more frequent update of your stats, but, frequently, that's a good thing, not a bad thing. Read more about it in the Books Online.
I did come across trace flag 2371; I will give that a try (I am on 2008 R2). Thanks.
November 5, 2016 at 5:15 pm
curious_sqldba (11/4/2016)
@Jeff - Completely agree, stats are absolutely necessary. I use the code below to check when the stats were last updated, but I'm not 100% satisfied with the results of this query. Do you know, if a process was updating a stat and got killed, would it still make an entry in the system tables saying it updated the stats?
SELECT OBJECT_NAME(object_id) AS [ObjectName] ,
[name] AS [StatisticName] ,
STATS_DATE([object_id], [stats_id]) AS [StatisticUpdateDate]
FROM sys.stats;
I would never update statistics based on the STATS_DATE. Like I said in my previous post, you can have static tables that don't ever need stats updates because the data never changes, or changes so infrequently that it would simply be a total waste of time to update the stats based on when they were last rebuilt. For 2008 or below, use RowModCtr (which is what MS does) from the deprecated sys.sysindexes compatibility view, possibly coupled with other things, to make a better determination.
As for an interrupted update, do what I did. Try it. Run DBCC SHOW_STATISTICS to get the original information, update and interrupt a stats rebuild on your big table, and then run DBCC SHOW_STATISTICS to show that nothing changed. Nothing gets updated in the system tables that you and I can see unless the update successfully completed.
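For reference, that before/after check is just the stats header; the table and statistic names below are hypothetical:
-- Capture the header (including the "Updated" date and rows sampled) before the test.
DBCC SHOW_STATISTICS ('dbo.MyBigTable', 'ST_SomeColumn') WITH STAT_HEADER;

-- Start an UPDATE STATISTICS on that same stat and kill it partway through, e.g.:
-- UPDATE STATISTICS dbo.MyBigTable [ST_SomeColumn] WITH FULLSCAN;

-- Run the header again; it should be unchanged if the update never completed.
DBCC SHOW_STATISTICS ('dbo.MyBigTable', 'ST_SomeColumn') WITH STAT_HEADER;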
--Jeff Moden
Change is inevitable... Change for the better is not.
November 5, 2016 at 6:01 pm
Just to add a bit, you say that you're on 2008 R2. In that case, you can use sys.dm_db_stats_properties instead of sys.sysindexes to get the modified row count. The column you're looking for there is the "modification_counter" column.
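As a rough example of putting that column to work, assuming your build includes the function (the 10% threshold is just a placeholder; pick whatever makes sense for your data):
-- Stats whose modifications exceed 10% of the rows they were built on.
SELECT  OBJECT_NAME(s.object_id) AS TableName,
        s.name                   AS StatName,
        sp.rows,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   sp.modification_counter > 0.10 * sp.rows
ORDER BY sp.modification_counter DESC;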
--Jeff Moden
Change is inevitable... Change for the better is not.
November 5, 2016 at 6:10 pm
Grant Fritchey (11/4/2016)
I don't know if you're on 2008 or 2008R2, but if the latter, you should explore using traceflag 2371. It makes a huge difference for lots of people when dealing with larger databases and statistics. It will result in more frequent update of your stats, but, frequently, that's a good thing, not a bad thing. Read more about it in the Books Online.
Not directed at you, ol' friend. I'm just disappointed in the MS documentation. Unfortunately, MS says the following as to whether or not to use the TF...
You can use the following guidance for enabling the new trace flag in your environment:
1. If you have not observed performance issues due to outdated statistics, there is no need to enable this trace flag.
2. If you are on SAP systems, enable this trace flag. Refer to this blog http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspx for additional information.
3. If you have to rely on a nightly job to update statistics because the current automatic update is not triggered frequently enough, consider enabling trace flag 2371 to reduce the threshold.
For #1, MS never tells you how to determine whether performance issues are due to outdated (I hate that term because people think stats are date sensitive) statistics or not. The only way I know how to tell is to never do #3 or index rebuilds, wait for something to dog out performance-wise, rebuild the stats, and see if that fixes it. It's kind of an "if the dam breaks, then fix it" recommendation instead of being able to "check for leaks". 😛
--Jeff Moden
Change is inevitable... Change for the better is not.