November 9, 2020 at 3:08 am
We often talk about how to optimize SELECT statements in SQL Server, but are there any tips for improving performance (especially for inserts and updates) in a highly concurrent system? If so, how do you optimize? Could you please share some practical experience? Thanks!
November 9, 2020 at 5:49 am
I think the 3 most important things are...
There's a whole lot more but, as you suggest, because inserts and updates don't happen as often as SELECTs, many folks don't pay attention to Inserts, Updates, or Deletes until one of those operations takes a minute or two to work on just one row.
Sorry... didn't mean to ramble on that but, to be honest, that's all just scratching the surface.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2020 at 6:43 am
Thank you Jeff Moden for your kind help!
You said, "Understanding that there are at least 6 different Insert/Update patterns and the 4 of them are seriously affected by "ExpAnsive" updates." Could you please explain what you mean by that? Thanks a lot!
November 9, 2020 at 1:10 pm
Additionally, understand that the WHERE clauses and JOIN criteria (if any) of data manipulation queries are subject to all the same rules as a SELECT query. Something like a function in the WHERE clause that would cause a scan in a SELECT query will also cause a scan in an UPDATE query, hurting performance in each case.
Also, the tools used to understand the behavior of a data modification query are the same as those used to tune SELECT queries. You'll be capturing query metrics with Extended Events or Query Store or the DMVs. You'll be interpreting behaviors using execution plans. All the same stuff.
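For example (a quick sketch; dbo.Orders, OrderDate, and Status are hypothetical names, not anything from this thread), the same sargability rules apply to both:

-- Non-sargable: the function on the indexed column forces a scan,
-- in an UPDATE just as it would in a SELECT.
UPDATE dbo.Orders
   SET Status = 'Archived'
 WHERE YEAR(OrderDate) = 2019;

-- Sargable rewrite: a range predicate on the bare column can use an index seek.
UPDATE dbo.Orders
   SET Status = 'Archived'
 WHERE OrderDate >= '20190101'
   AND OrderDate <  '20200101';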
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 9, 2020 at 2:10 pm
Additionally, understand that the WHERE clauses and JOIN criteria (if any) of data manipulation queries are subject to all the same rules as a SELECT query. Something like a function in the WHERE clause that would cause a scan in a SELECT query will also cause a scan in an UPDATE query, hurting performance in each case.
Also, the tools used to understand the behavior of a data modification query are the same as those used to tune SELECT queries. You'll be capturing query metrics with Extended Events or Query Store or the DMVs. You'll be interpreting behaviors using execution plans. All the same stuff.
Grant Fritchey, thank you for your great and enthusiastic help, and thanks for your time!
November 9, 2020 at 5:54 pm
Thank you Jeff Moden for your kind help!
You said, "Understanding that there are at least 6 different Insert/Update patterns and the 4 of them are seriously affected by "ExpAnsive" updates." Could you please explain what you mean by that? Thanks a lot!
First, Grant is spot on in what he says. For what he said, he's completely accurate. But there's a lot he didn't say (not his fault) about Inserts, Updates, and Deletes. So here's the proverbial "rest of the story"... and this is just the "introduction" to the problems that a lot of people (including at least all those using the current supposed "Best Practice" index maintenance) are having but don't know about.
"Expansive" Updates
Starting with your question above (and many people ARE actually aware of this problem)...
Take, for instance, a table containing only integer and date columns plus two variable-width columns called "Created_By" and "Modified_By". I'm citing these two very specific columns because they frequently occur as a bit of "poor man's" auditing in many people's table designs. Both of these columns are typically VARCHAR(n), with "n" typically being a length of 50 or 100 (or some other value... the value doesn't matter here... it's still a variable-width column).
During an INSERT into such a table, all columns of the table are affected. Typically, there's a default on the "Created_By" column of ORIGINAL_LOGIN() or some other function to automatically populate the row(s) being inserted with who or whatever inserted the rows.
Not so with the Modified_By column. Typically, it has no default.
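A minimal sketch of that table pattern might look like this (all names and widths here are just illustrative):

CREATE TABLE dbo.SomeTable
(
    SomeTableID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_SomeTable PRIMARY KEY CLUSTERED  -- "ever increasing" CI key
   ,SomeValue   INT          NOT NULL
   ,SomeDate    DATETIME     NOT NULL
   ,Created_By  VARCHAR(100) NOT NULL
        CONSTRAINT DF_SomeTable_Created_By DEFAULT (ORIGINAL_LOGIN())  -- populated at INSERT
   ,Modified_By VARCHAR(100) NULL  -- no default... starts NULL and grows on a later UPDATE
);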
Even more typical is the fact that people use an "ever increasing" key column, such as an IDENTITY column or a date column, for their Clustered Index (CI from here on) key.
So, here's what happens with all of that...
The Created_By column will never cause an "ExpAnsive" update because its content goes in during the original insert. The Modified_By column, which starts out empty, is another story.
To summarize, those inserts on the "ever increasing" CI ignore the Fill Factor and pack the pages full. Those are the most recent pages, and they're usually updated soon after they're inserted. Because of the "ExpAnsive" update of the "Modified_By" column (or of other variable-width columns that have had extra data added to them), you end up with relatively massive page splits, which can cause a whole lot of blocking and a huge amount of totally unnecessary log file activity that seriously (by a factor of 40 or more) slows down the updates.
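To make that concrete, here's what the damaging update and the check for the resulting fragmentation might look like (using the illustrative table above):

-- The "ExpAnsive" update: Modified_By goes from NULL to a real value,
-- the row gets wider, and a 100%-full page has to split.
UPDATE dbo.SomeTable
   SET SomeValue   = SomeValue + 1
      ,Modified_By = ORIGINAL_LOGIN()
 WHERE SomeTableID = 42;

-- Watch the logical fragmentation and page density afterwards.
SELECT index_id
      ,avg_fragmentation_in_percent
      ,avg_page_space_used_in_percent
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'SAMPLED');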
And now... "The Rest of the Story" 😀
For the record, I currently have identified the following types of Insert/Update patterns (which also have some sub-patterns), which are also what I set the Fill Factors to. (I have code that assigns the Fill Factors automatically and a few new tools I've built to help automatically figure out what to do to fix some of the indexes, but I'm not ready to release them to the public yet because of the 90/10 rule of developing something; I want to knock that 10% down to 1 or 2%.) A minimal sketch of how these Fill Factors get stamped onto indexes follows the list...
Type 100 - Totally or near totally static. Almost never changes and almost never suffers an "ExpAnsive" update.
Type 99 - "Ever increasing" index that suffers no "ExpAnsive" updates. Marked differently than "100" just to identify the type of index using only the Fill Factor.
Type 98 - "Sequential Silos". These are a strange pattern where the leading column of the CI might be something like "Server Name" and the second column is an ever-increasing column, like a date, or a User ID followed by a last-logged-in date, or an invoice number followed by a status date, or whatever. These indexes very quickly suffer massive logical fragmentation but virtually no physical fragmentation (page density) because they suffer virtually no page splits. It's the only Insert/Update pattern that I know of with this trait. Because of the multiple sequential insert points, they are considered to be "permanently fragmented" even though they typically never suffer "ExpAnsive" updates.
Note that these "Type 98" indexes may have to be changed to "Type 2" (described further below) indexes if they also suffer from "ExpAnsive" updates. If they suffer a lot from "ExpAnsive" updates, they may actually fit the category of being "Random Silos" and may need to be treated as "Type 1" indexes (also described further below).
Type 97 - The "7" here has the same rough shape as a "2", which stands for "to do". These indexes have an ever-increasing key that would otherwise require virtually no index maintenance but suffer from "ExpAnsive" updates through the process I first described at the beginning of this post. A 97 Fill Factor is used on these because lowering the Fill Factor does nothing to fix them and would be a complete waste of data buffer space (memory), disk space, backup space, etc., and we need "to do" something to get rid of the "ExpAnsive" updates that cause the fragmentation.
Type "2" - These indexes are assigned a Fill Factor of 72, 82, or 92, all ending with a "2" as in "to do". The cause of their fragmentation is either unknown or is known but can't be fixed or there isn't time to fix them (yet). The reason why they are rebuilt at a lower Fill Factor is because we don't know or don't have time to fix them but we want to do one or both of two things. The first might be because they've done enough page splits to cause a "very low" page density and we simply want to recover some disk space. The problem is that we don't want to rebuild these at the default Fill Factor of "0" because that's actually the same as "100" and we don't actually know if the fragmentation was due to out of order inserts or "Expansive Updates" or what but we do know they did page splits (or maybe just DELETEs, which also causes logical fragmentation) and so we also want to leave a little headroom for those things.
This helps prevent the "morning after index maintenance" problem that a lot of people are aware of (index maintenance caused performance to go down) but haven't figured out the cause of. The cause is that they Rebuild or (hopefully not) Reorganize indexes that have a "0" Fill Factor, so all "free space" is removed from the index just when it needs it the most, and EVERY out-of-order insert or "ExpAnsive" update causes massive page splitting, which seriously slows down ALL activity having to do with the index.
So, the bottom line for Type "2" indexes is that you want to defragment the index or simply recover disk space, but you want to allow some room for expansion, and you still have something "to do" to fix those problems (including "Random Silos", where just one part or several small parts of an index are suffering from page splits). These might also end up being rather permanently fragmented.
Type 1 - Random GUIDs and other evenly distributed Inserts (in particular) or Random Updates (inherently covered). I assign a Fill Factor ending in "1" for these, which indicates the "threshold" of fragmentation where they need to be rebuilt. (These types must NOT ever be Reorganized, because Reorganize simply removes most free space and does nothing for pages whose page density falls between the Fill Factor and 100%.) Once those bad boys go over 1% fragmentation, they need to be Rebuilt, because that's an early indication that ALL of the pages in the index are getting ready to split (they go from 1% to 30% virtually overnight if active).
While all that sounds really bad and sounds like a really good reason why people say you should never use Random GUIDs for the CI, that's completely false (there are a lot of reasons not to use them, but fragmentation shouldn't be one of them). If you establish the correct Fill Factor AND you REBUILD at 1% fragmentation, the CI can go for weeks and even months (yep... I have a set of lengthy tests that prove it) with absolutely no page splits (not even the supposed "good" ones), and so absolutely no fragmentation (logical or page-density-wise) occurs. Of course, no fragmentation also means no need for index maintenance during those weeks or months. And the memory above the Fill Factor isn't wasted... it slowly fills up, and so, just like pre-allocating disk size (especially for log files), it prevents a huge number of "growth" problems and will get used.
Type "0/5" - These are indexes that have either not been assigned a Fill Factor (indicated by a Fill Factor of "0") or have a non-zero 2 digit Fill Factor assigned by someone else or, perhaps, a 3rd party chunk of software that someone used.
To summarize, NEVER rebuild an index with a "0" Fill Factor without changing the Fill Factor. This is the leading cause of the "morning after index maintenance" slowdowns caused by the underlying massive page splits that occur. Either assign them one of the Fill Factors above or leave them alone, PERIOD.
If you have indexes with 2-digit Fill Factors ending in 0 or 5 that are becoming fragmented, those Fill Factors are other people's ideas of what the Fill Factor should be, and I would change them all to Type "2" Fill Factors just to mark them as "to do". I would be seriously leery of rebuilding any of those that have a Fill Factor of 85 or above (remember... only 2-digit Fill Factors ending in 0 or 5... don't include "100" in this).
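As promised above, here's the minimal sketch of stamping those types into the Fill Factors (all index and table names are placeholders; the only point is that the Fill Factor itself encodes the type):

ALTER INDEX PK_StaticLookup ON dbo.StaticLookup REBUILD WITH (FILLFACTOR = 100); -- Type 100: static
ALTER INDEX PK_AuditLog     ON dbo.AuditLog     REBUILD WITH (FILLFACTOR =  99); -- Type 99: ever increasing, no "ExpAnsive" updates
ALTER INDEX PK_SomeTable    ON dbo.SomeTable    REBUILD WITH (FILLFACTOR =  97); -- Type 97: ever increasing but "ExpAnsive"... still "to do"
ALTER INDEX IX_Mystery      ON dbo.Mystery      REBUILD WITH (FILLFACTOR =  82); -- Type "2": cause unknown, leave headroom, "to do"
ALTER INDEX PK_GuidKeyed    ON dbo.GuidKeyed    REBUILD WITH (FILLFACTOR =  81); -- Type 1: random GUIDs, REBUILD at 1% fragmentation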
REORGANIZE
I'll say it again, as I have on quite a few posts now. Although Books Online (BOL/MS Documentation) correctly states exactly what REORGANIZE does and doesn't do, most people I know have misinterpreted that meaning, and so I'll say that most people are using it the wrong way. The supposed "Best Practice" (and BOL doesn't call it that... read footnote #1 in the documentation, where it says that you need to experiment with and judge every index) of reorganizing between 5 and 30% fragmentation and rebuilding above 30% is actually a WORST Practice in nearly every case.
And, no... there is no insult intended to Paul Randal, the person who wrote that "recommendation". He'll tell you right up front that MS pressured him into stating some sort of general guideline, and so he chose the one that was best at the time. He also wrote footnote #1, which states that you really need to examine every index.
It also turns out that REORGANIZE isn't the tame little kitty that everyone makes it out to be and is the frequent cause of the log file being larger (sometimes almost twice as large... yeah... I've got proof of that, as well) than your largest CI (which can be really tough if you have large CIs), even when logical fragmentation is quite low. Even REBUILDs in the FULL Recovery Model on such indexes have much less of an impact on the log file (there are also some cool tricks to keep them from blowing out the data (MDF/NDF) files, as well).
With that, I'll say stop using REORGANIZE for index maintenance even if you're stuck with the Standard Edition because it's actually setting up almost all of your indexes for perpetual fragmentation. IMHO, the only time you should use REORGANIZE is to compress LOBs that you've deleted from (and there's even a "fix" for that).
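For clarity, that one legitimate use looks something like this (the index and table names are placeholders):

-- Compact LOB space after large deletes... the one job REORGANIZE is good for.
ALTER INDEX PK_Documents ON dbo.Documents REORGANIZE WITH (LOB_COMPACTION = ON);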
REBUILDs
Waiting for 30% or more fragmentation might be (I really mean "usually is") a silly thing to do, depending on the Insert/Update pattern. Most of the "wait" damage due to rampant page splits occurs between 1 and 10% logical fragmentation (again, I have proof in the form of tests). If you're going to wait until an index is 30% fragmented, it might actually be better to not waste resources by doing any index maintenance at all, especially if the index usually only suffers "single row lookups". In that case, you're actually measuring fragmentation the wrong way... you should be more concerned with wasted space, and look at page density instead of logical fragmentation.
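Something like this (the 1,000-page filter is just an arbitrary example threshold) will show you page density rather than just logical fragmentation:

SELECT i.name AS index_name
      ,ps.avg_page_space_used_in_percent  -- page density... the number that matters here
      ,ps.avg_fragmentation_in_percent
      ,ps.page_count
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ps
  JOIN sys.indexes i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
 WHERE ps.page_count > 1000;  -- ignore trivially small indexes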
The Bottom Line
You can do a shedload of damage with index maintenance insofar as the "morning after" goes, especially if you use what people have mistakenly adopted as a "Best Practice". In many cases, their index maintenance is actually setting up the pages for immediate and long-term fragmentation and, like a bad drug, the more you do it, the more you need to do it. In other words, it can and frequently does perpetuate a ton of page splits, especially on the "morning after".
With that, I'll say that it's actually better to do no index maintenance than to do it incorrectly and the current supposed "Best Practices" for index maintenance are doing it incorrectly. There is NO PANACEA when it comes to index maintenance even if the indexes themselves have been perfectly designed.
And, yeah... although my proof there is a bit anecdotal, I went from January 2016 'til about June of this year with virtually no index maintenance and CPU dropped (all by itself) from an average "normal usage" of 22% to 8% (with similar reductions in reads and writes). In other words, performance didn't suffer... it actually got better.
Do I recommend NOT doing index maintenance? No. You need to at least recover disk and memory (buffer) space but you need to do it without causing the "morning after" effect.
Remember that fixing logical fragmentation really only fixes "read aheads", which don't actually matter once the data resides in memory (with a hopefully high PLE). There's a whole lot more to be concerned about like page splits and the resulting blocking and wasted space in memory (buffers) because of the other type of fragmentation, low page density.
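(For reference, PLE can be checked with something like the following; the counter name is standard, but what counts as "high" depends on how much memory you have.)

-- Returns the overall Buffer Manager value plus one row per NUMA node (Buffer Node).
SELECT [object_name], counter_name, cntr_value AS ple_seconds
  FROM sys.dm_os_performance_counters
 WHERE counter_name = 'Page life expectancy';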
--Jeff Moden
Change is inevitable... Change for the better is not.
November 10, 2020 at 6:27 am
Jeff Moden, thank you for your help and guidance!