Updating Statistics

  • Comments posted to this topic are about the item Updating Statistics

  • I a m developer (SSAS/SSIS) and familiar with the concepts of clusted and nonclustered indexes.

    How much more about indexing/statistics/... should i have knowledge of?

    What is "nice to have" and what is "must have"?

     

    EDIT: asking because i am one of the people that answered wrong here

    • This reply was modified 5 years ago by  ktflash.
  • Even after several minutes of thinking, I can't really anything I know about indexing flag as only nice-to-have. Ok, maybe you could just skip the whole in-memory-table stuff with its hash buckets for start, since in memory tables have several limitations and are not very popular for this reason.

    But nearly everything else allows you to build better indexes, better queries and designing your tables / DB better.

    You could start on the pages of Brent Ozar, he has some very nice guides about a lot of topics including indexing.

    BTW: the most underused feature in my opinion is the filtered index. I don't know, how often I found an index on a NULLable column, where only 2 % of the rows contains a value and this index is not filtered (CREATE INDEX idx_tbl__ref_nr on tbl (ref_nr) WHERE ref_nr IS NOT NULL).

    The most important point about filtered indexes is, that the filtered value has always to be included in the index (if it is not part of the index, which often makes less sense, particularly on flags).

    Example:

    CREATE INDEX idx_customer__active on customer (customer_nr) INCLUDE (name, flag_active) WHERE flag_active = 1

    without including the flag_active it would always do a key lookup...

    God is real, unless declared integer.

  • Thanks for this question, Steve. Let me add a link to an interesting article: Does rebuild index update statistics?

  • ktflash wrote:

    I a m developer (SSAS/SSIS) and familiar with the concepts of clusted and nonclustered indexes.

    How much more about indexing/statistics/... should i have knowledge of?

    What is "nice to have" and what is "must have"?

    EDIT: asking because i am one of the people that answered wrong here

    Funny that you should ask that question.  I've found out (the hard way) that even "experts" on the subject are making recommendations and calling certain things "Best Practices" when the supposed "Best Practices" they're recommending are actually causing the system to generate massive amounts of page splits, which is the usual cause for what they're calling "fragmentation" (there are scenarios where no page split occur but you have 99.9% fragmentation appear even after a rebuild virtually overnight).

    Most of the world follows the supposed "Best Practice" of doing nothing up until 5 or 10% fragmentation (we'll call it 5%), using REORGANIZE for "fragmentation" between 5% and 30%, and REBUILD for anything > 30% "fragmentation".

    I know where those numbers came from (Paul Randal) but even he said they were a general recommendation that MS forced him to make.  Further, it says right in Books Online ( https://docs.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver15 ) right below that "recommendation" that states...

    Read the important parts in that where it says that "However, the actual values may vary from case to case.  It is important that you experiment to determine the best threshold for your environment".

    And, yeah... Paul Randal wrote that, as well, and he's completely correct there.

    Almost everyone ignores that.

    It turns out that REORGANIZE isn NOT your friend.  For example, almost everyone says that the best reason to NOT use Random GUIDs is because of the massive number of page splits (quickly causing huge amounts of fragmentation).  While there are a lot of other reasons to not use Random GUIDs, fragmentation isn't actually one of them.  I have tests that prove that you can actually go for months with absolutely ZERO page splits or fragmentation and, again, Paul Randal actually mentions such a thing ( https://docs.microsoft.com/en-us/sql/relational-databases/indexes/specify-fill-factor-for-an-index?view=sql-server-ver15 )...

    "A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. "

    Sounds like a bloody Random GUID to me. 😉  It's especially good for high INSERT rates because it has NO HOT SPOTS in the table.

    The problem has nothing to do with the fact that you're using Random GUIDs.  The problem is actually caused by the way most people do index maintenance.  In this case, the use of REORGANIZE is the actual killer here because people don't actually consider the fact that REORGANIZE DOES NOT CREATE ANY NEW PAGES.  So, REORGANIZE packs the data tighter up to the assigned Fill Factor but does NOTHING to bring pages fuller than the Fill Factor down to the Fill Factor.  As a result, REORGANIZE actually guarantees, perpetuates, and makes page splits MUCH worse for such Random GUID indexes.

    Then there's the other supposed "Best Practice" of reducing the Fill Factor if you have too much fragmentation between index maintenance runs.  If you don't know what "kind" of insert and update pattern you have, then that's one of the worst recommendations there is because it won't only NOT stop the fragmentation, you end up totally wasting huge amounts of disk space and memory.

    I'm slowly but surely working on a "Stairway" that identifies all of this type of stuff with indexes and also provides verifiable proof in the form of repeatable, demonstrable code on the subject.  I've already taught the basics of what I'm talking about in my "Black Arts" Index Maintenance 1, 2, and 3 presentations at several SQL Saturdays. I even demonstrate that NOT rebuilding indexes and just religiously maintaining stats is better than maintaining indexes blindly and, so, incorrectly.

    So, yeah... there's a whole lot to learn.  It took me 20 years to figure out a lot of this stuff.  The first thing to do is to learn ALL of the basics, while understanding that what most people recommend as "Best Practices" actually aren't.  Then, keep learning from there.

     

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply