Thoughts about Table Partitioning...

  • Hi.

    Background:

    I have two large tables. Both are clustered on a BIGINT IDENTITY column, and the clustered index presently uses a fill factor of 90.

    One is a narrow table with a ridiculous number of rows; the other has a much lower row count, but every row stores BLOB data.

    Both tables are part of a peer-to-peer transactional replication topology. To make this work, each node is assigned a range of identity values, which applies across all replicated tables on that server. That way, an insert on one node won't overlap the identity range of an insert on another node (of course 😉 ).

    My concern is page splits, which are inevitable in PPTR situations. However, for these two large tables, I feel the need to consider alternatives. Partitioning comes to mind.

    However, I have very minimal exposure to partitioning.

    The range of IDENTITY values by server is known. One node is the OLTP node and is responsible for about 78% of the insert velocity on the narrow table and about 75% of the insert velocity on the BLOB table (a PDF document archive). The narrow table gains more than 300,000 rows per day; the BLOB table about 2,200 per day. Delete velocity on these tables is presently zero (no archive/purge, because of business rules).

    Therefore, I thought about partitioning at least the narrow table along the identity ranges, so that each partition would append pages at its own end rather than repeatedly page-splitting one monolithic table as the non-OLTP nodes insert rows.

    Thus my questions:

    1. Am I correct in my thoughts re page-splitting and partitions; i.e., will each node be adding rows to the end of its own partition and avoiding page splits (perhaps allowing me to go to fill factor 100)?

    2. Does the BLOB table even merit consideration, given how the varbinary(max) column data is stored?

    Thoughts?

    Thanks

    John.

    There are a HUGE number of caveats to partitioning, especially for partitioned tables (as opposed to partitioned views, which have their own set of caveats and workarounds). One of the biggest caveats is that for "aligned" indexes (and, trust me, you most likely WANT aligned indexes), all unique indexes will have the partitioning column added to them. If you stop and think about that for a minute, it means the index is no longer truly unique unless the partitioning column is also the single-column PK or unique column.

    That also means that you need to carefully and deeply consider just exactly what will happen to any FKs pointing to the table. It may mean that you want to skip the idea of aligning indexes, but then you also have to understand that you won't be able to maintain partitions of those indexes (because there won't be any partitions) and you can't SWITCH data in or out of the table unless all indexes are aligned.
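
    For example, here's a quick sketch of that caveat in action (all object names are hypothetical):

        -- RANGE RIGHT means each boundary value begins a new partition.
        CREATE PARTITION FUNCTION pfIdRange (bigint)
            AS RANGE RIGHT FOR VALUES (10000000001, 20000000001, 30000000001);
        GO
        CREATE PARTITION SCHEME psIdRange
            AS PARTITION pfIdRange ALL TO ([PRIMARY]);
        GO
        CREATE TABLE dbo.Narrow
        (
            ID         bigint IDENTITY(1,1) NOT NULL,
            NaturalKey varchar(20)          NOT NULL,
            CONSTRAINT PK_Narrow PRIMARY KEY CLUSTERED (ID)
        ) ON psIdRange (ID);
        GO
        -- This fails: a unique aligned index must include the partitioning
        -- column in its key.
        --CREATE UNIQUE INDEX UX_Narrow_NaturalKey ON dbo.Narrow (NaturalKey);

        -- This works (indexes align to the table's scheme by default), but
        -- "unique" now means unique per (NaturalKey, ID), which guarantees
        -- nothing about NaturalKey by itself.
        CREATE UNIQUE INDEX UX_Narrow_NaturalKey ON dbo.Narrow (NaturalKey, ID);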

    It'll take you a month to do it, but you need to really (as in deep-dive) study partitioning in Books Online before you write a single line of code toward it. Like a really good movie, once you finish it, study it again and find all the stuff you missed the first time around.

    That notwithstanding, partitioning of WORM (all rows are written just once and never updated) tables is absolutely worth it, especially if you use one file per filegroup and one filegroup per partition. I haven't finished it yet, but the system I'm writing for the 300GB telephone-recording table is really going to streamline my backup processes. As a monolith, it takes close to 5 hours to back up because of all the other night work that needs to be done at the same time. After partitioning, I'll only need to back up the active month, and that will take 0 to 10 minutes depending on where we are in the month.
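
    For example, once the older filegroups are read-only and already backed up, the nightly job might shrink to a single filegroup backup, something like this (database and filegroup names are hypothetical):

        -- Back up only the filegroup holding the active month's partition.
        BACKUP DATABASE CallRecordings
            FILEGROUP = N'FG_2014_01'
            TO DISK = N'X:\Backup\CallRecordings_FG_2014_01.bak'
            WITH INIT;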

    Even partitioning of OLTP tables can be well worth it for backup, index maintenance, and piecemeal restore capabilities, but you have to be really careful about FKs that point to the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    To answer your original questions: if you maintain a clustered index on an IDENTITY column, then whether the table is partitioned or not, the data will always be placed at the "logical" end of the table, and the clustered index will almost never suffer fragmentation just because of inserts (the story changes for updates to variable-width columns), even if the fill factor is 100%. To wit, partitioning won't help here, but it also won't hurt, depending on what the partitioning column is.
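
    If you want to verify that on your own tables, something like this (hypothetical table name) shows per-partition fragmentation; on an ever-increasing clustered key it should stay near zero even at a 100% fill factor:

        -- Index_id 1 is the clustered index; 'LIMITED' reads only the
        -- upper B-Tree levels, so it's cheap to run.
        SELECT  ips.partition_number,
                ips.avg_fragmentation_in_percent,
                ips.page_count
        FROM    sys.dm_db_index_physical_stats
                    (DB_ID(), OBJECT_ID(N'dbo.Narrow'), 1, NULL, 'LIMITED') AS ips;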

    As far as the blob table goes, yes. It should absolutely be considered for partitioning for the sake of backups. That 300GB table I mentioned has a blob in it that contains the .WAV file of the telephone call.

    --Jeff Moden

  • Hi. A few more details...

    1. Both tables are clustered on a BIGINT identity, and that identity is the value upon which I would partition. Each node in the PPTR topology inserts rows using this identity, within a range determined by which server inserts the row. I would partition along those ranges so that each "table" would insert rows at its end, causing the fewest page splits. This is not for read performance, but for insert performance.

    2. The BLOB-containing table is completely WORM. The tall, narrow table is WORM in about 80% of its rows. Don't ask why; it is a design decision that was made before I joined the company and one about which I can do nothing. However, the PK on both tables is the clustered BIGINT identity.

    3. Neither table contains FK references to other tables, but each is referred to by FKs from other tables. However, due to the "brilliant" application design, the FKs in question are not defined in SQL Server. :crazy:

    4. The purpose of the partitioning in this case is not to permit SWITCH of old partition data (i.e., archiving). It is to reduce or eliminate page splits where the tables are receiving inserts in four (soon to be five) ranges of IDENTITY values in the clustered index. I also think that piecemeal backups don't help me here.

    5. I would align the indexes on the tables.

    Hope that this gives perspective.

    Thanks

    John.

  • JohnFTamburo (1/30/2014)


    Hi. A few more details...

    1. Both tables are clustered on a BIGINT identity, and that identity is the value upon which I would partition. Each node in the PPTR topology inserts rows using this identity, within a range determined by which server inserts the row. I would partition along those ranges so that each "table" would insert rows at its end, causing the fewest page splits. This is not for read performance, but for insert performance.

    If by "identity", you mean a column that has the IDENTITY property, there wouldn't be any performance increase for such partitioning if you partition based on IDENTITY column values. The "current" partition would be treated as if you had a monolithic table for the inserts.

    There could also be a heavy penalty to pay when you decide to add a new partition unless you've pre-created not one but two empty partitions already. Otherwise, there could be some heavy data movement when creating the new partition if your new partition boundary doesn't create an empty partition. For IDENTITY columns, that would require you to predict the IDENTITY values for the new partitions.
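
    The usual pattern to avoid that (names hypothetical) is to always SPLIT an empty partition at the top end, which is then a metadata-only operation; if rows already exist above the new boundary, they get physically moved, and that's the heavy data movement I'm talking about:

        -- Name the filegroup the next partition will live on, then split.
        ALTER PARTITION SCHEME psIdRange NEXT USED [PRIMARY];
        ALTER PARTITION FUNCTION pfIdRange() SPLIT RANGE (50000000001);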

    2. The BLOB-containing table is completely WORM. The tall, narrow table is WORM in about 80% of its rows. Don't ask why; it is a design decision that was made before I joined the company and one about which I can do nothing. However, the PK on both tables is the clustered BIGINT identity.

    Understood.

    3. Neither table contains FK references to other tables, but each is referred to by FKs from other tables. However, due to the "brilliant" application design, the FKs in question are not defined in SQL Server. :crazy:

    If they ever decide to add FKs that point to the IDENTITY column, which you intend to use as the partitioning column, you should be OK insofar as FKs go. Any other column will be virtually impossible to reference as an FK because the partitioning column will automatically be added to any aligned unique index.

    4. The purpose of the partitioning in this case is not to permit SWITCH of old partition data (i.e., archiving). It is to reduce or eliminate page splits where the tables are receiving inserts in four (soon to be five) ranges of IDENTITY values in the clustered index. I also think that piecemeal backups don't help me here.

    Everyone says that but, for mostly WORM tables, the requirement to delete really old data always seems to crop up. For WORM tables such as audit tables and even old "order" tables, I'd always plan on SWITCHing being a future compelling requirement.

    5. I would align the indexes on the tables.

    Agreed. That's normally the best idea because non-aligned indexes will prevent the ability to use SWITCH. Sure, you could drop the non-aligned indexes, do the SWITCH, and rebuild the non-aligned indexes (usually built only to support foreign keys on columns other than the partitioning column), but life will be easier if you can avoid that.

    You should also be aware that partitioning a table frequently requires a bit more space because of the individual B-Trees on each partition, and that partitioning usually does nothing for performance even when "partition elimination" comes into play. In fact, contrary to much advertisement, partitioning will frequently make code run a bit slower because of the split B-Trees. A well-designed and properly indexed monolithic table will usually outperform a partitioned table except when truly crap code is played against it and partition elimination kicks in.

    So far as I'm concerned, the only compelling reasons to partition tables are to make nightly maintenance and backups shorter. They won't actually be easier, though, if you do it right: doing it right means checking whether anything in the individual partitions changed, so that you don't reindex partitions that didn't fragment or back up partitions that suffered no changes.
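
    "Doing it right" might look something like this (hypothetical names and threshold): find the partitions that actually fragmented, then rebuild only those:

        -- List only the partitions of the clustered index that fragmented.
        SELECT  ips.partition_number,
                ips.avg_fragmentation_in_percent
        FROM    sys.dm_db_index_physical_stats
                    (DB_ID(), OBJECT_ID(N'dbo.Narrow'), 1, NULL, 'LIMITED') AS ips
        WHERE   ips.avg_fragmentation_in_percent > 10;

        -- Rebuild a single offending partition instead of the whole index.
        ALTER INDEX PK_Narrow ON dbo.Narrow REBUILD PARTITION = 3;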

    Piecemeal restores are a pleasant side effect of partitioning (if there's one file per partition) that can actually save your bacon in the case of isolated corruption. It IS possible that the system can continue to be available during an "online restore" of a file group.
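
    A rough sketch of such an online restore of one damaged filegroup (hypothetical names; online restore requires Enterprise Edition):

        -- Restore just the damaged filegroup; the rest of the database
        -- remains online.
        RESTORE DATABASE CallRecordings
            FILEGROUP = N'FG_2013_11'
            FROM DISK = N'X:\Backup\CallRecordings_FG_2013_11.bak'
            WITH NORECOVERY;
        -- Roll the filegroup forward with later log backups, then recover.
        RESTORE LOG CallRecordings
            FROM DISK = N'X:\Backup\CallRecordings_log.trn'
            WITH RECOVERY;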

    Please see the following videos for a great introduction and demonstrations of partitioning by someone in whom I have extreme trust. Combined with some extra reading in Books Online, she explains partitioning far better than I ever could, and I believe the videos will answer many of the questions you haven't thought to ask yet.

    http://technet.microsoft.com/en-us/sqlserver/gg545008.aspx

    http://technet.microsoft.com/en-us/sqlserver/gg545009.aspx

    --Jeff Moden

  • About the only way that I can think of to increase the performance of inserts when an IDENTITY column is the clustered index (and with the understanding that I haven't actually tried it) would be to create a FileGroup with multiple files in it and each file is on a GUARANTEED separate set of spindles (most SAN guys hate it) which would also bring more read/write heads into play. You don't actually need partitioning for such a thing.
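
    Something like this is what I mean (hypothetical names and paths); SQL Server's proportional-fill algorithm then spreads the writes across all the files in the filegroup:

        ALTER DATABASE MyDb ADD FILEGROUP FG_Hot;
        ALTER DATABASE MyDb ADD FILE
            (NAME = FG_Hot_1, FILENAME = N'E:\Data\MyDb_Hot1.ndf', SIZE = 10GB),
            (NAME = FG_Hot_2, FILENAME = N'F:\Data\MyDb_Hot2.ndf', SIZE = 10GB),
            (NAME = FG_Hot_3, FILENAME = N'G:\Data\MyDb_Hot3.ndf', SIZE = 10GB)
        TO FILEGROUP FG_Hot;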

    --Jeff Moden

    Jeff Moden (1/31/2014)

    If by "identity", you mean a column that has the IDENTITY property, there wouldn't be any performance increase for such partitioning if you partition based on IDENTITY column values. The "current" partition would be treated as if you had a monolithic table for the inserts.

    There could also be a heavy penalty to pay when you decide to add a new partition unless you've pre-created not one but two empty partitions already. Otherwise, there could be some heavy data movement when creating the new partition if your new partition boundary doesn't create an empty partition. For IDENTITY columns, that would require you to predict the IDENTITY values for the new partitions.

    I think this may have answered the final question. Picture four nodes in peer-to-peer transactional replication on four servers. Table A has a PK clustered on ID, a BIGINT identity column; it is WORM with high insert velocity.

    Server1 uses IDENTITY values of 1-10,000,000,000

    Server2 uses IDENTITY values of 10,000,000,001-20,000,000,000

    Server3 uses IDENTITY values of 20,000,000,001-30,000,000,000

    Server4 uses IDENTITY values of 30,000,000,001-40,000,000,000

    40,000,000,001 and above is available for future servers.

    If the table is partitioned on the identity ranges above, and server 1 inserts a row that (in current numbering) has ID of 100,345,789, server 2 inserts a row ID of 10,005,111,756, Server 3 inserts ID 20,003,444,333 and server 4 inserts 40,222,333,444, are you saying that there will still be page splits (on the clustered index) for all but the last one?

    That is what I am trying to avoid -- massive page splits on a high-velocity essentially WORM table that inserts clustered identity values along multiple ranges. We have developed sophisticated range management tools to ensure that we don't overlap and screw up the data. Now my concern is speedy inserts as the velocity in this table builds.

    My idea was that the partitioning would in essence create 4 or more "tables" behind the scenes, each of which would simply add pages at its own end as inserts arrive in its identity range. If that is what happens, then partitioning just may be worth my time. I could change the fill factor (presently 90) to 100, knowing that I'd never have a page split.

    And, setting up 2 or more empty partitions would not be a hassle.
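
    For what it's worth, the layout I have in mind would look something like this (names hypothetical); RANGE RIGHT makes each boundary value the first ID of the next server's partition, and the top boundary leaves an empty partition ready for the fifth node:

        CREATE PARTITION FUNCTION pfServerRange (bigint)
            AS RANGE RIGHT FOR VALUES
               (10000000001,     -- Server2's range starts here
                20000000001,     -- Server3
                30000000001,     -- Server4
                40000000001);    -- future servers
        GO
        CREATE PARTITION SCHEME psServerRange
            AS PARTITION pfServerRange ALL TO ([PRIMARY]);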

    I will watch the videos.

    Thanks

    John.

  • Jeff Moden (1/31/2014)


    About the only way that I can think of to increase the performance of inserts when an IDENTITY column is the clustered index (and with the understanding that I haven't actually tried it) would be to create a FileGroup with multiple files in it and each file is on a GUARANTEED separate set of spindles (most SAN guys hate it) which would also bring more read/write heads into play. You don't actually need partitioning for such a thing.

    Don't hate me... The entire PRIMARY and INDEX filegroups are on SSD. 😀
