Best Clustered Index Configuration for Partitioned Multi-Tenant Table with RLS

  • Jeff,

    Have you got any recommendations for the fill factor when I create the new clustered primary keys on (TENANTID, ID)?

    Where ID is an identity column on the table.

  • post to get new page

  • Jonathan AC Roberts wrote:

    Jeff,

    Have you got any recommendations for the fill factor when I create the new clustered primary keys on (TENANTID, ID)?

    Where ID is an identity column on the table.

    Yes.  And "It Depends" on how it will be populated (of course).  Will it be ...

    1. Insert-only (or Inserts followed by non-ExpAnsive Updates in any position other than the 2 key columns)
    2. Insert followed by ExpAnsive updates to only the newly inserted rows
    3. Inserts followed by ExpAnsive updates to any rows anywhere in the index sometime in the future

    For #1 above, logical fragmentation will get to more than 99% in very short order BUT the Page Density will remain very close to 100%.  I call this type of index "Sequential Silos" and there will be no fragmentation withing each Tenant_ID silo.  It will not need index maintenance even though the numbers will say it's "fragmented forever".  The number "8" looks like an infinity symbol standing up and you don't want to waste Page Density on something that will not benefit from a low reduced Fill Factor so create this index with a Fill Factor of 98 (I also call these indexes a "Type 98 Fragmentation Pattern) to remind you of what type of index it is and that the only index maintenance it will need is to rebuild statistics. Don't forget to rebuild statistics on a regular basis.

    For #2 above, new rows will go in at the end of each Tenant_ID silo, creating a nearly 100% page density for each silo.  Updating these rows with an "ExpAsive" update will cause serious logical fragmentation and Page Density issues. The best thing to do would be to make some adjustments so that the Updates are not "ExpAnsive" so that they can be treated like #1 (Type 98) above.  If you can't do that, then build the index with a Fill Factor of 96 to remind you that the table Non-Sequential Silos or Multiple Hot Spots.  It will occasionally need a REBUILD to recover disk and memory space.  I'd wait for the Page Density to get below the Fill Factor-5 or Fill Factor-10 (or 91 or 86 percent page densisty, in this case) before I'd do a rebuild.  If you make the nasty mistake of doing a REORGANIZE on this type of index, it will have log file activity that will generate huge amounts of log file entries that could easily be equal to more than twice the size of the index, not to mention possibly taking days (not a misprint) to execute when it gets "big".   Don't forget to rebuild statistics on a regular basis.

    For #3 above, the updates are following a pattern not unlike Random GUIDs in that they're mostly "evenly distributed".  Depending on the fragmentation rate, I'd build these with a Fill Factor of 81 or 91,  The "1" in each of those reminds you to rebuild the index when it reached just 1% logical fragmentation.  We look at logical fragmentation here, not because logical fragmentation makes any significant degradation of of performance but because it's an indication that the area above the Fill Factor is really full and Page Splits are near the point of exploding across the index because of the even distribution. Don't forget to rebuild statistics on a regular basis.

    That's for non-partitioned indexes.  If you have the same keys for a partitioned table, the Type 98 index can be treated as a Type 99 (see below) because each partition of the index will act like a Type 99 index.  Personally, I'd leave it as a Type 98 to remind me of what will happen to it if I departition the index and it costs almost nothing to do so.

    In case you're wondering...

    Type 100 (Fill Factor) is for static reference/lookup tables.  You shouldn't have to rebuild these except may after updating or inserting new rows as even reference table do.

    Type 99 (Fill Factor) is for ever-increasing indexes that have only 1 hot spot at the end of the index and either have no updates or the updates are non-ExpAnsive.  These are different than the Type 98 above because these only have 1 hot spot and it's at the logical end of the index.  They should almost never need a rebuild except maybe to increase the number of pages for read ahead fragments.

    Type 97 (Fill Factor).  Nearly identical to type 96 except they only have a hot spot at the end of the index and ExpAnsive Updates only occur on the newly inserted row in the hot spot.  Same rule for REBUILD applies... when Page Density gets below Fill Factor-1 or Fill Factor-10 (depends on how often you want to rebuild), that's the time to REBUILD.  This is one place where you could use REORGANIZE but it's still going to beat on your logfile a lot and it won't do as good a job as a rebuild.

    Type 1 (Fill Factor 71, 81, or 91) are for evenly distributed inserts and evenly distributed ExpAnsive Updates and Random Silos.  These types are for Random GUIDs and things like them.  There's nothing you can to prevent the pages from filling up but you can prevent page splits by rebuilding these indexes as soon as they start fragmenting, which is to rebuild them when the reach or go over 1% logical fragmentation.

    Type 2 (Fill Factor 72, 82, or 92). These are indexes that you haven't figured out yet but you need to recover disk space.  Since you don't know the fragmentation pattern, be on the safe side and use a Fill Factor ending in "2" to remind you that you have "something TO do".  The reason for the fill factor might be to help reduce page splits right after you rebuild them.  Of  course, lowering the Fill Factor that much won't work for types 96, 97, or 98 but you just want to recover some disk space.  Remember that type 98 won't have a disk space problem even when very high amounts of "fake" logical fragmentation are present.

    Sidebar:  Online Rebuilds don't do as good a job as Offline.  Both can seriously benefit in duration and logfile usage if you can temporarily shift from FULL to Bulk_Logged Recovery.  The only two things the messes up is (supposedly) Database Mirroring and Log Shipping.  Note that I have not tested what that messes up because I don't use either of those nor anything like AG. 

    And, of course, these only apply to RowStore indexes.  Also, be aware of compressed indexes.  Row compression makes a whole lot of "fixed width" datatypes change to "ExpAnsive" Updates and Page compression includes Row compression.  Compression can change a nice Type 99 index that never fragments into a fragmentation monster that will cause massive Page Density issues in very short order.  It's a sharp reminder that "It Depends" and every "feature" in SQL Server comes at a cost.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Thank you for the detailed explanation and advice on Fill Factor settings. It is incredibly helpful.

    In our case, the clustered index (TENANTID, ID) will likely see updates to rows due to the nature of the application. While these updates are not extremely frequent, they could still lead to page splits over time. Disk space is not a concern since we are running on AWS RDS with ample capacity (3TB disk, 800GB database). The disk is much larger than the database because, with AWS, the more disk you provision, the higher the IOPS you get, and it is more cost-effective to do this than pay directly for higher IOPS.

    Based on your guidance, I am leaning towards starting with a Fill Factor of 80 to minimise page splits, given the update pattern. I realise this leaves more space on each page, which could slightly impact read performance, but with our high write workload, it seems like a reasonable trade-off.

    Would you agree that Fill Factor 80 is appropriate, or do you think a higher value like 90 or 96 would better balance updates and page density? Would you also suggest any additional metrics beyond fragmentation rates and page splits to monitor the index’s performance over time?

    Thanks again

  • Since you say the updates are not "extremely frequent", it's mostly an "Insert" index by TenantID.  It's a Type 98 Index and since there aren't a large number of ExpAnsive Updates, I'd build it with a Fill Factor of 98 (Sequential Silos).  It will logically fragment like crazy but that's "fake" fragmentation because there will be nearly zero bad page splits and the Sequential Silos themselves won't actually have much true fragmentation to them.  You may never have to rebuild this index even with such a high Fill Factor.

    Also, if I understand the clouds correctly, you don't pay so much for storage but you do pay for "Reads"... even the parts of the pages that are empty because you can't read a part of a page.  The whole thing gets loaded even if you read just one row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Since you say the updates are not "extremely frequent", it's mostly an "Insert" index by TenantID.  It's a Type 98 Index and since there aren't a large number of ExpAnsive Updates, I'd build it with a Fill Factor of 98 (Sequential Silos).  It will logically fragment like crazy but that's "fake" fragmentation because there will be nearly zero bad page splits and the Sequential Silos themselves won't actually have much true fragmentation to them.  You may never have to rebuild this index even with such a high Fill Factor.

    Also, if I understand the clouds correctly, you don't pay so much for storage but you do pay for "Reads"... even the parts of the pages that are empty because you can't read a part of a page.  The whole thing gets loaded even if you read just one row.

    The index itself only handles inserts, and the primary key values are never updated. However, since this is the clustered index, doesn’t it effectively include all the other columns on the table? Those columns are frequently updated, which could lead to page splits or other performance issues.

    Regarding AWS pricing, there are indeed various models. The one we’re using provides increased provisioned IOPS as you buy more disk space, up to a maximum of 3TB, which gives 16,000 provisioned IOPS. Interestingly, it’s actually more cost-effective to buy additional disk space than to go for a smaller disk and separately pay for higher IOPS. It’s not cheap, but I guess we’re also funding Jeff Bezos’ trips to outer space!

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply