update stats vs rebuilding indexes

Question

update stats vs rebuilding indexes

vsamantha35

SSChampion

Points: 11551
More actions
March 2, 2023 at 8:24 am

#4155848

Hi All,
Sometimes just by updating stats with full scan on the tables involved in the poorly performing query would fix the problem.
How does that magic happen and why many people suggest to just go for updating statistics rather than rebuilding the indexes in certain scenarios ? is that because the rebuild operation is expensive in terms of I/O and the amount of transaction is generated? what is the logic behind that? Does the trick works for both smaller and big tables as well or does it work in only certain scenario's where data is not changing quite often?

Thanks,
Sam

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 1

Rebuilding an index does two things. First, since you're literally rebuilding the index, it reorders the pages and the rows on those pages. This has a benefit in that it can eliminate some pages, making the index smaller and condensing the rows to pages, based on your fill factor. Second, as a part of the rebuild, you get a full statistics update.

What happens is, people rebuild the indexes because they think that it's necessary to constantly and continually defrag the indexes. And they see performance enhancements... usually short term ones. Most of these come from the new set of statistics that were generated, showing more accurate row counts, leading the optimizer to make better choices for the plan. A few may result from queries that are doing massive scans having fewer pages to scan, improving performance, in the short term until the indexes fragment again.

In short, the performance enhancements from index rebuilds are frequently an illusion that it comes from the index rebuild. Instead, it's because of the statistics update. So, many people, just skip the index rebuild and instead update the statistics. More often than not, and I mean, 90%+, this is the biggest contributor to improved performance.

As to smaller/bigger tables, well, you're dealing with a specific issue in your tiny table that's abused, right? For most situations, tiny tables are not updated often, if at all. Index rebuilds are either a waste of time because nothing there is changing, or, the indexes are too small to benefit from a rebuild (less than 8 pages goes on mixed extents which doesn't get defragged much, if at all). Rebuilds and stats updates too, generally are better for larger tables that are active.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

vsamantha35 SSChampion Points: 11551 More actions · Answer 2

Grant Fritchey wrote:

Rebuilding an index does two things. First, since you're literally rebuilding the index, it reorders the pages and the rows on those pages. This has a benefit in that it can eliminate some pages, making the index smaller and condensing the rows to pages, based on your fill factor. Second, as a part of the rebuild, you get a full statistics update.
What happens is, people rebuild the indexes because they think that it's necessary to constantly and continually defrag the indexes. And they see performance enhancements... usually short term ones. Most of these come from the new set of statistics that were generated, showing more accurate row counts, leading the optimizer to make better choices for the plan. A few may result from queries that are doing massive scans having fewer pages to scan, improving performance, in the short term until the indexes fragment again.
In short, the performance enhancements from index rebuilds are frequently an illusion that it comes from the index rebuild. Instead, it's because of the statistics update. So, many people, just skip the index rebuild and instead update the statistics. More often than not, and I mean, 90%+, this is the biggest contributor to improved performance.
As to smaller/bigger tables, well, you're dealing with a specific issue in your tiny table that's abused, right? For most situations, tiny tables are not updated often, if at all. Index rebuilds are either a waste of time because nothing there is changing, or, the indexes are too small to benefit from a rebuild (less than 8 pages goes on mixed extents which doesn't get defragged much, if at all). Rebuilds and stats updates too, generally are better for larger tables that are active.

Thanks for the information. Few follow-up questions which came into my mind.

#1) In terms of I/O and time, which is faster? running Index Rebuilds or Running Updatestats or is it depens on case to case?

#2) I have a database called "testdb" and I want to perform the following,

1. If fragementation <5% do nothing

2. If fragmentation between 5-30%, do a re-org

3. If fragmentation > 30%, do a rebuild.

At the same, I want to update statistics on all tables, on all columns with FULLSCAN, only for those tables which are modified.

In that case, does the below command suit my requirement? Want to use Ola Hallengren's maintenance script.

use dba_utils
go
EXECUTE [dbo].[IndexOptimize]
@Databases = 'testdb',
@FragmentationLow = null,
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 = 5,
@FragmentationLevel2 = 30,
@MaxDOP = 4,
@UpdateStatistics='ALL',
@UpdateStatistics='INDEX',
@UpdateStatistics='COLUMNS',
@StatisticsSample = 100,
@OnlyModifiedStatistics='Y',
@TimeLimit=18000, -- 5 hours in secs -- if it runs more than 5 hours, come out.
@LogToTable = 'Y'
go

#3) If I just wanted to run update stats on all tables, on all columns with FULLSCAN, only for those tables which are modified.

Will this command work or need to modify a bit?

use dba_utils
go
EXECUTE [dbo].[IndexOptimize]
@Databases = 'testdb',
@MaxDOP = 4,
@UpdateStatistics='ALL',
@UpdateStatistics='INDEX',
@UpdateStatistics='COLUMNS',
@StatisticsSample = 100,
@OnlyModifiedStatistics='Y',
@TimeLimit=18000, -- 5 hours in secs -- if it runs more than 5 hours, come out.
@LogToTable = 'Y'
go

Thanks,

Sam

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

I'm sure with some work, I could find exceptions, but the overall rule is easy here, statistics updates are radically less resource intensive than index rebuilds. It's not even close.
I haven't used Ola's scripts in anger in quite some time. I don't have a good answer for you.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 4

vsamantha35 wrote:

#1) In terms of I/O and time, which is faster? running Index Rebuilds or Running Updatestats or is it depens on case to case?
#2) I have a database called "testdb" and I want to perform the following,
1. If fragementation <5% do nothing
2. If fragmentation between 5-30%, do a re-org
3. If fragmentation > 30%, do a rebuild.

For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.

And, on that note, let's talk about Item #2 above. Although 99% of the world has adopted those numbers as a supposed "Best Practice", they were never intended to be a "Best Practice" and, for most of your indexes that fragment, they are, in fact, a very WORST practice. They're the real reason why people think that Random GUIDs are a fragmentation problem when they actually do operate as most people would expect a fragmenting index to act as. I have the proof, if you're interested. See the following video and watch it past the Q'n'A, as well, for a literally BIG surprise. Realize also that, despite the title of "GUIDs vs Fragmentation", it's NOT just about GUIDs. Most of what is in that presentation can be applied to many types of indexes. I used GUIDs because if we can solve that "problem" for fragmentation, we can solve just about anything... and I clearly demonstrate both the cause and the fix.

To summarize, the cause is those bloody "Best Practice" recommendations and, more specifically, the use of REORGANIZE. It works just like MS says it will work but people read it the wrong way and it does NOT work like 99% of the people in the world think it works. It's also NOT less resource intensive than REBUILDs except in some very slight cases. In a lot of cases, it's a whole lot worse, especially for the log file and CPU!

I also show where that God-forsaken supposed set of "Best Practice" numbers came from and how the even the originator of those numbers says to "take those values with a grain of salt and NOT treat them as absolute".

Here's the link... and, remember... it's actually better to do NO index maintenance than it is to do it wrong. If you're using the current supposed "Best Practice" recommendations, then you're flat out doing it wrong!

Here's the link to the 'tube that proves all that with code...

https://www.youtube.com/watch?v=rvZwMNJxqVo

And, a warning... if you're listening with a headset, be aware that they added 3 advertisements at the 15:00, 30:15, and 45:25. The advertisements are important to help support this event but I wanted to let you know that they're sudden and they're loud! They WILL lift your headset!

And, YES! I went almost 4 years without doing any regular index maintenance and performance actually got better all by itself in the first 3 months because I had simply stopped doing it wrong. The only thing I did for those 4 years was to rebuild stats once a week and do the occasional bebuild on some very large and weird indexes that got below even 50% page density.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101123 More actions · Answer 5

Grant Fritchey wrote:

(less than 8 pages goes on mixed extents

For SQL 2017 (or SQL 2016, for that matter), only if for some odd reason you've set MIXED_PAGE_ALLOCATION ON and/or allowed it to remain OFF. You should specify ON for all dbs.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

vsamantha35 SSChampion Points: 11551 More actions · Answer 6

For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.

So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?

Michael L John One Orange Chip Points: 27060 More actions · Answer 7

vsamantha35 wrote:

For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.
So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?

Try this. Stop reindexing. Update only individual statistics that have changed.

Take a look at the output of this query. There is an issue with the modification_counter returned. If the same row gets updated 1000 times, it will show 1000 in that.

SELECT 
DB_NAME(db_id()),   
S.name, 
O.name, 
O.object_id,
STAT.name,
STATPROP.rows,
STATPROP.modification_counter,
STATPROP.last_updated
FROM sys.objects O
INNER JOIN sys.schemas S ON O.schema_id = S.schema_id
INNER JOIN sys.stats STAT ON STAT.object_id = O.object_id  
CROSS APPLY sys.dm_db_stats_properties(STAT.object_id, STAT.stats_id) STATPROP
WHERE S.name <> 'sys'

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 8

Michael L John wrote:

vsamantha35 wrote:
For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.
So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?
Try this. Stop reindexing. Update only individual statistics that have changed.

I agree. Simply stop doing index fragmentation maintenance. Do stats rebuilds, instead.

That being said, there are some times where you actually might want to rebuild an index. Those times are for large indexes that have a seriously low Page Density due to fragmentation and you want to recover 10's of Gigabytes of space every month or two. There are a couple of ways to handle those properly. When you run into one a month or two after you stop doing index fragmentation maintenance, post back we can tell you how to handle them with the least pain for you and your system.

In the meantime, please study the following...

REBUILD [ WITH (<rebuild_index_option> [ ,... n]) ]

In the above, note the part about "minimal logging", if that's possible on your system.

ONLINE = { ON | OFF } <as applies to rebuild_index_option>

In the above, read mostly in the OFF section of that part but also read about the ON part if you absolutely need it. Understand the the ON will make things a whole lot slower and a whole lot more nasty for your log file. Still, it won't be as bad as using REORGANIZE on such indexes would be.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

vsamantha35 SSChampion Points: 11551 More actions · Answer 9

Jeff Moden wrote:

Michael L John wrote:
vsamantha35 wrote:
For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.
So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?
Try this. Stop reindexing. Update only individual statistics that have changed.
I agree. Simply stop doing index fragmentation maintenance. Do stats rebuilds, instead.
That being said, there are some times where you actually might want to rebuild an index. Those times are for large indexes that have a seriously low Page Density due to fragmentation and you want to recover 10's of Gigabytes of space every month or two. There are a couple of ways to handle those properly. When you run into one a month or two after you stop doing index fragmentation maintenance, post back we can tell you how to handle them with the least pain for you and your system.
In the meantime, please study the following...
REBUILD [ WITH (<rebuild_index_option> [ ,... n]) ]
In the above, note the part about "minimal logging", if that's possible on your system.
ONLINE = { ON | OFF } <as applies to rebuild_index_option>
In the above, read mostly in the OFF section of that part but also read about the ON part if you absolutely need it. Understand the the ON will make things a whole lot slower and a whole lot more nasty for your log file. Still, it won't be as bad as using REORGANIZE on such indexes would be.

Sure. Thank you very much Jeff , Micheal and everyone.

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 10

Just in case you need to fight off the folks that still think that regular index maintenance based on logical fragmentation is a good idea, Microsoft finally "got the word" on the 20th of April 2021 and updated the documentation. Anyone still using the old "5/30 method" that 99% of the world is still using is wrong (and they've always been wrong... it just wasn't documented as such before).

Here's the link to the better documentation with the warning that REORGANIZE doesn't work they way most people think and it's almost always better to do no index maintenance than it is to do REORGANIZE.

https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes#what-to-consider-before-performing-index-maintenance

Here's the newest "strategy" they came up with. They still make no commitment to what actually needs to be done but at least they're headed in a better direction. Again, I wouldn't use REORGANIZE on any index... it's NOT the gentle little kitty that they make it out to be.

https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver16#index-maintenance-strategy

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

CreateIndexNonclustered SSChasing Mays Points: 648 More actions · Answer 11

vsamantha35 wrote:

For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.
So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?

The only reason you would rebuild online is if you can't take the index offline/there are limited windows to do the maintenance. The index doesn't necessarily need to be big, it can just be frequently used. In what I have supported, all the indexes I needed to rebuild online did happen to be very large indexes and rebuilding online alone was not enough. I also needed to make it a resumable index rebuild either because the logs were getting hammered and needed to run some backups, or because a slow point of production was ending and I needed to stop using up the resources until the next window.

In your index maintenance script using ola's index optimize, you need to set fragmentation actions to avoid an unplanned index reorg or rebuild. I typically only do stats updates and have a job where FragmentationLow, medium and high are all set to null. Each parameter accepts a string containing a comma separated list, can't specify each parameter multiple times

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 12

CreateIndexNonclustered wrote:

vsamantha35 wrote:
For question #1, just do a test. I think that you'll find that the stats take a lot longer because Index Rebuilds only affect/rebuild the columns they actually use. Stats rebuilds also take on the "column" stats that are automatically created but they don't block like rebuilds do. ONLINE rebuilds are almost as bad as using REORGANIZE when it come to resource usage and they are very much worse on the log file and they take a whole lot longer even if no one is hitting on the table while it runs.
So, is it better to go for OFFLINE INDEX REBUILDS especially when it is large index as it also would cause less blocking?
The only reason you would rebuild online is if you can't take the index offline/there are limited windows to do the maintenance. The index doesn't necessarily need to be big, it can just be frequently used. In what I have supported, all the indexes I needed to rebuild online did happen to be very large indexes and rebuilding online alone was not enough. I also needed to make it a resumable index rebuild either because the logs were getting hammered and needed to run some backups, or because a slow point of production was ending and I needed to stop using up the resources until the next window.
In your index maintenance script using ola's index optimize, you need to set fragmentation actions to avoid an unplanned index reorg or rebuild. I typically only do stats updates and have a job where FragmentationLow, medium and high are all set to null. Each parameter accepts a string containing a comma separated list, can't specify each parameter multiple times

I know this seems to be a strange question according to the current understanding of indexes by the general population but, why are you doing index maintenance to begin with and why are you using logical fragmentation to make a determination as to when to do the maintenance?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

CreateIndexNonclustered SSChasing Mays Points: 648 More actions · Answer 13

changing page verification to checksum, widening a column, after an irregular large delete or update, after a sql version upgrade, controlling growth of a VLDB (even 15% fragmentation on a 20 tb database is a lot of ssd sitting blank), re-enable non-clustered indexes, fixing a small index when someone in the past set it to 10% fill factor, etc

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 14

CreateIndexNonclustered wrote:

changing page verification to checksum, widening a column, after an irregular large delete or update, after a sql version upgrade, controlling growth of a VLDB (even 15% fragmentation on a 20 tb database is a lot of ssd sitting blank), re-enable non-clustered indexes, fixing a small index when someone in the past set it to 10% fill factor, etc

So, you have a large table with 15% logical fragmentation and you want to rebuild it to save space. Have you actually checked page density to see if it's actually worth it? And, since Ola' good code doesn't work by Page Density, why would you use that? And, what is the current fill factor on such a table? If it's "0", then all you're doing is making page splits MUCH worse and you're actually perpetuating the fragmentation.

I can also demonstrate an index that will have 99.9% fragmentation that never has a page density below 99% and will severely and quickly refragment with no wasted space.

My point here isn't to criticize... my point here is that each and every index has its own "personality" and doing index maintenance based on logical fragmentation or even page density is not only a waste of time but can be downright unhealthy for the whole database, especially on the proverbial "morning after" such index maintenance.

A perfect example of this is to ask the question, why do Random GUIDs fragment so much? Almost every one says it's because they're Random in Nature when that's the actual reason why they should almost never fragment. It turns out that people don't actually know how to maintain such indexes. For example, I know how to insert 100,000 rows per day into a clustered index that is keyed on a Random GUID and go for 58 days with less than 1% logical fragmentation and virtually;y no page splits. The reason why most people can't is because they use the supposed "Best Practices" (they 're not "Best Practices" and were never meant to be taken as "Best Practices") of reorging between 5 and 30% and rebuilding only above 30%.

Also, you mentioned a "20 TB" database. Do you actually have one of those? If so, what's the largest index you've had to rebuild and how much empty space did you leave behind in the process of rebuilding it?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)