Is it better to archive data into a single table or multiple tables?

Question

Post reply

Is it better to archive data into a single table or multiple tables?

myoda

Mr or Mrs. 500

Points: 550
More actions
August 29, 2013 at 3:59 pm

#276744

Hi,
I currently have a main table that keeps 7 days worth of records and anything older than that is put in an archive table in the same database. The archive table currently has about 40 million records in it, and we also have 4 other archive tables with data from the same main table in a different database seperated by year. Thoes 4 other archive databases are about another 40 or 50 million records.
So I am wondering if it makes any sense to have 5 archive tables for the same data, I was just thinking about combining the 5 archive tables on a seperate database from the main table.
Does anybody have any thoughts on that.
Thanks,

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

homebrew01 SSC Guru Points: 55567 More actions · Answer 1

homebrew01

SSC Guru

Points: 55567

August 29, 2013 at 7:24 pm

#1646206

How is the archived data used ?

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 2

You're probably going to get a lot of recommendations to put it all in one table because "SQL Server can handle it" and I absolutely agree that SQL Server can handle it. [font="Arial Black"]BUT[/font], to use an example that one of my favorite DBAs used, when you do your laundry, do you take all the clean clothes you have off the hangers and shelves and wash them all along with the dirty clothes or do you normally just wash the dirty clothes (you bachelors that can't actually tell the difference should probably NOT answer that question).

The most likely answer is that you wash only the dirty clothes.

Having years of archivable data (data that is both historical and static) in a single table seems to work quite nicely for properly written queries especially in the presence of properly designed tables and indexes. So, no problem, right?

Well, not quite. There's a whole lot behind the scenes that you might not appreciate if you're the one that has to maintain indexes and statistics, do backups to disk, do disk to tape backups, or the one that needs to change the tapes and either mail them or drive them to the offsite tape warehouse. You also might not be the person that has to make sure that there's enough room on the high performance SAN drives or the person to write the check for those drives even if you're not the one actually paying for them.

If all the data for years of archivable data is in a single table (sometimes, even if it IS a Partitioned Table), then you're washing all the clean clothes with the dirty clothes every time you need to rebuild an index and every time you do a backup and the costs, both monetary and duration wise, can become quite expensive. It becomes even more expensive if even just one byte on one page becomes corrupt and you need to restore the database. If you've done nothing special, you'll most likely have to restore the entire database. If that takes 4 hours and the company relies on the database for thousands of dollars per minute, per hour, or even per day, that'll cost the company tons in tangible profit loss and tons more in the intangible losses such as phone calls from angry customers and a loss of reputation that could also cost the company customers.

Ironically, having an archive database for each year is one of the simplest forms of "partitioning" there is. It puts an individual file in different PRIMARY file group than what the bread'n'butter database is in. Even people who use Table Partitioning (Enterprise Edition) or Partitioned Views (Standard or Enterprise Edition) frequently make the mistake of putting the partitions or individual tables all on the PRIMARY file group of the main money-maker database. Spreading each year of archivable data over 1 database per year helps avoid that problem and can make DR restores a lot faster. You might not even need to do a full restore. You might be able to get away with a "piece-meal" restore where the main database doesn't even go down (Enterprise Edition only but also a whole lot better in Standard Edition).

I'm actually in the process of converting the archivable tables for my company from the monolithic behemoths they've become to a combination of Partitioned Tables in yearly databases (1 FileGroup/File per month) and then "joining" that data with the newest data in the main database using Partitioned Views. Don't get me wrong, the first conversion is a real chore especially if there's already lots of archivable data (a lot of databases can easily have 50% or more of purely archivable data, one of mine is 95% of 250GB) but imagine the joy that the NetOps team will have when I tell them that I've removed a half TeraByte of backups from the nightly backups we do. Imagine the joy the C-Level managers will have when I tell them that I might not even have to take down a database if there's corruption in the database or that I can do a restore of the money-maker part of the database in just 10-15 minutes instead of 4-6 hours.

The bottom line here is that I don't recommend that you combine the archivable data you currently have spread out in different databases into one monolithic table. In fact, you should look into how to partition further.

For one of the absolute best introductions as to what partitioning can do for your company, please see the following URL. Actually, partitioning existing large tables isn't always as easy as they make it seem (LOTS of restrictions concerning the Clustered Index), but it's definitely worth it.

Here's the URL. It's long but worth every minute of listening.

http://technet.microsoft.com/en-US/sqlserver/gg545009.aspx

After you watch that one to wet your appetite, then watch the following for more information…

http://technet.microsoft.com/en-US/sqlserver/gg545008.aspx

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)