June 12, 2023 at 9:28 am
Hi
To stop transaction log growth, our index rebuild process changes databases to Bulk Logged from full recovery and then back again at the end of the process.
What we've noticed, and replicated on Dev systems, is that changing the recovery mode clears all the plans for that database from the cache. We've changed the recovery mode without doing any index updates and the same thing happens, so it's not that the plans are being invalidated after a stats update.
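A minimal repro of what we're seeing looks roughly like this (the database name is just a placeholder; it needs VIEW SERVER STATE to run):

-- Plans cached for the database before the change.
SELECT COUNT(*) AS plans_before
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'dbid'
  AND CAST(pa.value AS int) = DB_ID(N'YourDatabase');

ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED;
-- no index maintenance or stats updates in between
ALTER DATABASE YourDatabase SET RECOVERY FULL;

-- Re-run the count: it drops to (near) zero straight after the ALTER DATABASE.
SELECT COUNT(*) AS plans_after
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'dbid'
  AND CAST(pa.value AS int) = DB_ID(N'YourDatabase');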
I can't find any references to this behaviour. Is it just us or do other people have this issue?
Cheers
Alex
June 12, 2023 at 2:55 pm
That seems to be normal SQL Server activity, to clear the plan cache when certain ALTER DATABASE statements are executed. Ref:
https://www.sqlskills.com/blogs/erin/query-store-and-the-plan-cache-flushing/
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 12, 2023 at 3:03 pm
Thanks for that Scott. I'm amazed that this isn't documented better; Microsoft documents changing the recovery mode before defragging indexes, but there's no mention of this!
June 12, 2023 at 3:36 pm
Here's a thought. Stop reindexing.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
June 12, 2023 at 5:31 pm
I agree. Yes, MS might have good reasons for dropping the cache in those cases, but it should document that it will do that.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 12, 2023 at 7:05 pm
Here's a thought. Stop reindexing.
+1,000,000!!!
--Jeff Moden
Change is inevitable... Change for the better is not.
June 12, 2023 at 7:26 pm
Hi
To stop transaction log growth, our index rebuild process changes databases to Bulk Logged from full recovery and then back again at the end of the process.
What we've noticed, and replicated on Dev systems, is that changing the recovery mode clears all the plans for that database from the cache. We've changed the recovery mode without doing any index updates and the same thing happens, so it's not that the plans are being invalidated after a stats update.
I can't find any references to this behaviour. Is it just us or do other people have this issue?
Cheers
Alex
Do you have any substantial proof that all the index maintenance you're doing has improved performance?
I can see surgically doing a rebuild to recover lost space once page density goes below a threshold, but that's not what you're likely measuring. You're probably using Logical Fragmentation to determine whether an index should be maintained and, if you are, you're probably also using the supposed "Best Practice" of "Reorg between 5 and 30% and Rebuild greater than 30%". Is that what you're doing?
Along with that, have you ever looked at blocking levels before and after such index maintenance?
And, finally, have you checked to see which indexes you've de-fragmented that go over 5% fragmentation in less than, say, 2 days?
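If you've never looked at both numbers side by side, here's a rough sketch of the kind of check I mean (the mode, the size cutoff, and the ordering are just examples... adjust to taste):

-- Logical fragmentation vs. page density for every index in the current database.
-- SAMPLED keeps the scan reasonably cheap; LIMITED won't report page density at all.
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
       OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent,    -- what most maintenance scripts key on
       ips.avg_page_space_used_in_percent,  -- page density: what actually costs you memory and I/O
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON  i.object_id = ips.object_id
  AND i.index_id  = ips.index_id
WHERE ips.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND ips.page_count > 1000                -- ignore trivially small indexes
ORDER BY ips.avg_page_space_used_in_percent;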
What most people don't know is that Paul Randal, the guy that wrote all that stuff for Reorg and Rebuild, actually wrote an article way back in 2009 that tells you not to put too much stock in those numbers. Read the article... especially the last sentence.
And I'm here to tell you that people like Michael John and myself have proven that the supposed "Best Practices" are not best practices and were never meant to be "Best Practices". Especially because REORGANIZE doesn't actually work the way most people think it does (it actually perpetuates fragmentation in most cases), following them is actually a "Worst Practice".
It's even the reason why people think random GUIDs fragment so quickly. I've got a demonstration where I insert 100,000 rows a day into a random GUID keyed index for 58 simulated days and end up with <1% logical fragmentation. How'd I do that? It's simple... I stopped following the supposed "Best Practices" and did it the right way.
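Stripped right down, the skeleton of that kind of test looks like the following (the table name, fill factor, and row counts are purely illustrative, and this is NOT the whole demonstration):

-- A clustered index keyed on a random GUID, built with some free space per page.
CREATE TABLE dbo.GuidTest
(
    ID     uniqueidentifier NOT NULL CONSTRAINT DF_GuidTest_ID DEFAULT NEWID(),
    Filler char(100)        NOT NULL CONSTRAINT DF_GuidTest_Filler DEFAULT 'x',
    CONSTRAINT PK_GuidTest PRIMARY KEY CLUSTERED (ID) WITH (FILLFACTOR = 90)
);

-- One simulated "day" of inserts... repeat once per "day".
INSERT INTO dbo.GuidTest (Filler)
SELECT TOP (100000) 'x'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

-- Check logical fragmentation and page density after each "day".
SELECT index_level, avg_fragmentation_in_percent, avg_page_space_used_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.GuidTest'), 1, NULL, 'DETAILED')
WHERE alloc_unit_type_desc = 'IN_ROW_DATA';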
I also went almost 4 years in production without doing any index maintenance because of the "Morning After Blocking" issues I spoke of above and performance actually got a lot better in the first 3 months and then stayed there. I finally (and carefully) rebuilt some indexes to recover disk space.
There is no panacea code (everything being handled the same way) for index maintenance. If you're not going to evaluate and classify every index, you're doing it wrong (and your log file hates you for it). It's far better to do no index maintenance than it is to keep doing it wrong. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2023 at 6:50 am
I'm sure there's a valid argument there Paul, but the amount of energy it'd take to convince my colleagues, plus the fact that removing defragging would be blamed for every performance issue for the next six months, means it doesn't feel like a good investment of my time.
I think what we'll go for is seeing how the defragging goes without changing the recovery mode. I suspect the change to bulk logged was implemented when the organisation was more disk constrained.
When I've got more time I'll definitely read the Paul Randal article though
June 13, 2023 at 12:00 pm
I'm sure there's a valid argument there Paul, but the amount of energy it'd take to convince my colleagues, plus the fact that removing defragging would be blamed for every performance issue for the next six months, means it doesn't feel like a good investment of my time.
I think what we'll go for is seeing how the defragging goes without changing the recovery mode. I suspect the change to bulk logged was implemented when the organisation was more disk constrained.
When I've got more time I'll definitely read the Paul Randal article though
I would take the time and watch Jeff's series on "Black Arts Index Maintenance", as well as show your colleagues that Microsoft no longer recommends reindexing unless you can prove that it makes a difference. They also say that when there are performance gains from reindexing, they are likely from updating statistics.
Seriously. It is worth every minute of your time to do this analysis. My "timeline" for discovering the various performance issues that reindexing actually CAUSED corresponds to the research Jeff has done.
My example? Over a two-year period, there were just under 1,900 trouble tickets logged related to database performance issues. Now, stopping reindexing was not the only step I took, but it did provide the biggest benefit. Over the 5 years after stopping reindexing, the number of trouble tickets related to database performance was reduced to 48. And most of them were not database issues; they were bad code.
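If you want something concrete to try first, a minimal sketch (the table name is a placeholder) is to just update the statistics and see whether you get the same "improvement" you would normally credit to the rebuild:

-- Update stats on one table. FULLSCAN is the heavy-handed option; the default sample is often enough.
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;

-- Or refresh statistics database-wide for objects that have had modifications.
EXEC sys.sp_updatestats;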
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
June 13, 2023 at 12:24 pm
Hi Michael
I can only think of 2 ways defragging indexes could impact performance:
- you're running queries at the same time as the defrag process, in which case you really need to look at your maintenance windows
- defragging decreases the page density, so you're doing more IO, in which case you need to look at your fillfactor
Outside of these two things I can't think how doing a defrag would cause performance issues. They may not have a positive impact, but I find it difficult to see, outside of the maintenance window, how they could cause all the performance issues you're reporting.
Correlation is not causation, so what was the mechanism that caused all the performance issues? What were the biggest hitters?
June 13, 2023 at 1:08 pm
- you're running queries at the same time as the defrag process, in which case you really need to look at your maintenance windows
A 24/7 operation doesn't really have a maintenance window. Activity was very high at 8 AM (North America), 7 PM (Asia, Australia), 2 AM (Eastern Europe) and 4 AM (Western Europe). ETLs executed starting at midnight and were spread across the next 6-8 hours.
Creating an availability group provided some relief by directing selects against the secondaries, but not much.
The original maintenance that I inherited would start at 9 PM, and typically would still be executing at 8 AM. That followed the bad recommendations.
During this period, users around the world experienced many issues. The various ETL processes were victims of deadlocks almost every night.
In this environment, adjusting the maintenance window was an impossible task.
defragging decreases the page density, so you're doing more IO, in which case you need to look at your fillfactor.
How much can you lower the fillfactor before that has an impact? The nature of these systems would create very high fragmentation within a very short period of time. The number of page splits that occurred immediately after reindexing finished was staggering. The overhead of SQL Server scanning a few extra data pages is small compared to the overhead of all those page splits.
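If you want to watch it happen, the counter is exposed through the DMVs (a rough sketch: the value is cumulative since the instance started, so sample it before and after the period you care about and take the difference, and remember it lumps the cheap end-of-index splits in with the expensive mid-index ones):

SELECT cntr_value AS page_splits_since_startup
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Access Methods%'
  AND counter_name = 'Page Splits/sec';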
I strongly suggest that you take the time to google-fu "Jeff Moden Black Arts Index Maintenance" and watch the presentations. The research, with facts, is very eye opening. What you think should be happening is definitely NOT happening.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
June 13, 2023 at 4:16 pm
Outside of these two things I can't think how doing a defrag would cause performance issues.
THAT, my friend, is the very problem with 99.9% of the world. 🙂
As Michael John suggests, watch the following 82 minute tube including the outtake after the Q'n'A section.
https://www.youtube.com/watch?v=rvZwMNJxqVo
The show starts at time 00:01:24
--Jeff Moden
Change is inevitable... Change for the better is not.
June 14, 2023 at 7:40 am
Hi
Michael - sorry your systems don't have a proper maintenance window. IT architects who don't design maintenance windows into their systems should be taken out and shot (too much!). Just as infosec has drummed into us that security needs to be designed in, maintenance windows need to be designed in too.
Jeff - I watched your video, it was really interesting. Going full "Brent Ozar" and doing no index maintenance looks like a bad idea, but dropping reorgs and lowering the threshold for rebuilds seems a good and sellable idea.
Most PCIs are not identity columns or GUIDs though. They'll be dates, order IDs, customer IDs etc. They're going to vary in their sequentiality (made up word?). For many of them you're still going to get a hot edge....
Understandably the video focussed on updates and inserts, but most of the work SQL does is reads. Both page splits and lower fill factors are going to have a negative impact on reads, so there must be a balancing act here. I'm not convinced that the fact that pages created by splits end up out of sequence is that big an issue on flash storage.
All your examples are with fill factors of 70%, 80%, or 90%. Microsoft now recommends 100% fill factor. What are your thoughts on that?
I found it really interesting that fill factor is only actually applied when indexes are rebuilt, and that new pages are always filled to 100% on a sequential PCI.
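For my own notes, I think the mechanics boil down to something like this (object names are made up):

-- Fill factor is only honoured when the index is built or rebuilt...
ALTER INDEX IX_Example ON dbo.SomeTable REBUILD WITH (FILLFACTOR = 90);

-- ...ordinary inserts fill new pages completely, whatever the setting below says
-- (0 means "use the instance default", which is effectively 100).
SELECT i.name, i.fill_factor
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.SomeTable');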
Apologies if I'm holding any sticks the wrong way round. There was a lot to take in.
Cheers
Alex
June 14, 2023 at 3:33 pm
Hi
Michael - sorry your systems don't have a proper maintenance window. IT architects who don't design maintenance windows into their systems should be taken out and shot (too much!). Just as infosec has drummed into us that security needs to be designed in, maintenance windows need to be designed in too.
Yeah. The person who worked on this maintenance window was a complete incompetent. He really didn't consider that ~20k users in the systems 24/7/365 may be a significant challenge. He only captured usage metrics, and ran multiple workload replays over a period of time with different fragmentation, fillfactors, and so forth while observing the differences between the baseline metrics and the metrics captured by the testing so he could make an informed decision. After he stopped reindexing and enhanced updating statistics, the only net effect was that complaints from the users and failures of automated processes decreased to almost nothing. The company was so displeased with his efforts that they awarded him with an attaboy certificate and a monetary bonus.
Jeff - I watched your video, it was really interesting. Going full "Brent Ozar" and doing no index maintenance looks like a bad idea, but dropping reorgs and lowering the threshold for rebuilds seems a good and sellable idea.
Stopping reindexing is such a bad idea that Microsoft changed their documentation partially based upon the findings from Jeff's research.
From Microsoft's documentation, contained here:
Index maintenance decisions should be made after considering multiple factors in the specific context of each workload, including the resource cost of maintenance. They should not be based on fixed fragmentation or page density thresholds alone.
AND
Customers often incorrectly attribute this improvement to the index rebuild itself, taking it to be result of reduced fragmentation and increased page density. In reality, the same benefit can often be achieved at much cheaper resource cost by updating statistics instead of rebuilding indexes.
AND
Do not assume that index maintenance will always noticeably improve your workload.
They have no idea what they are talking about, right? I can't believe that they are promoting something so stupid as perform reindexing only if a noticeable improvement is observed.
I would still blindly reindex day after day, while maybe requiring downtime, losing the procedure cache, and taxing the resources of the server. That makes way more sense to me than to actually provide a measurable benefit.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/