How to reduce maintenance window impact of very large table

  • I have a database where one of the tables – TRANSACTION_DETAILS – accounts for about 80% of the size of the database. Data has to be retained in this table for 7 years. I estimate that the table will reach a steady-state size of about 1.5 billion rows with space usage in the neighborhood of 2 terabytes.

    The table has a bigint identity column so I am not worried about running out of identity column values when they are being consumed at the rate of 200 million per year (unless I did the math wrong, I can add rows at this rate for 46 billion years). This column is also the clustered index.

    The usage of a table is such that data is increasingly read-only as it ages so that data more than a year old is rarely updated and data more than two years old is essentially never updated. In fact, data more than two years old is practically never accessed at all. Data access patterns align with the identity column so that most recently added rows are also the most frequently accessed.

    The PROBLEM is that a table of this size adds substantially to database maintenance windows (backup, recovery, index maintenance) even on powerful servers.

    My QUESTION is what is the best approach for reducing the maintenance burden of this large table.

    CHOICES as I see them are to:

    1.Partition the table on the identity column with a new partition and data file for each additional year of data. Actually, on thinking about it, partitioning is probably a good idea for size reasons alone.

    2.Set up archiving where older data can be moved off to an archive database. This is a little more work initially and, for a couple of reasons, I would like to not do this, but if it is the best option, then I will do it.

    So, what is the recommended option and why. My gut tells me that the option of archiving data to separate archive database that is not part of the nightly and weekly database maintenance cycle is the way to go. Can I achieve the same benefits with partitioning the table within the primary database.

    Thank you.

  • While the data may need to be retained for 7 years there is no reason it needs to be retained in that table.. I would consider archiving the data.

    CEWII

  • As Elliot suggested, one of the best things you could do to ease the maintenance load is to divide the table up a bit. HOW you do such a thing can make all the difference in the world for the sake of performance.

    IF you have the Enterprise Edition of SQL Server, then you could partition the table based on the transaction date. For such a large table, you might consider partitioning by month which would give you 7*12 or 84 partitions which is well within the 1000 partition maximum of SQL Server. Please see "partition functions [SQL Server]" in Books Online for how to setup such a thing and for how to automate the process in a monthly run.

    IF you [font="Arial Black"]don't[/font] have the Enterprise Edition of SQL Server, you can still pull off a minor miracle using "partitioned views" athough it's a bit more work to initially setup because you have to build the separate tables yourself. The key to "partitioned views" is to have a constraint on the transaction date column that limits what can go into each table. Done properly, queries that only refer to one or two months will only select from the appropriate one or two tables. Of course, since the older tables won't suffer many updates and they have their own indexes (another bit of monthly work that can easily be automated for the single new table created each month), index maintenance becomes a breeze because most of the tables won't actually need reindexing. Only the latest 2 or 3 might. Lookup "views [SQL Server], partitioned" in Books Online for how those work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • With regard to partioned tables, does the partitioning key have to be the first column of the clustering index? It seems that way to me.

    Also, if you have a unique index consisting of two columns where one column contains more unique values than the other AND you are doing lookups where both column values in are provided in the WHERE clause, does the order of the columns matter in the index definition. The reason I ask is that the column that makes more sense for partitioning is not the column with the greatest cardinality.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply