December 19, 2008 at 2:23 am
I run an online game out of a SQL Server 2005 database, and over time, our database has grown to over 300 GB, which is now large enough that a backup or restore operation takes about 80 minutes. Given that we have live users 24/7, this is unacceptable, and I've been given a mandate to get that down to 10 minutes.
About 95% of the data in our database is data that gets inserted once and never updated. Of that data, typically only the rows generated in the past 10-15 weeks are queried, but our users can request data from any point in our history whenever they want, and they expect to be able to do so. These tables are generally heavily indexed, because that has proven optimal at peak load, which is really the main limitation on our design right now.
I think the best solution is to create an archive database of read-only data that can be accessed during gameplay and that will have the same indexing etc. as the main database, but that will contain all data which is more than 10-15 weeks old. I first tried the following plan:
1) Create the new database with tables and indexes.
2) INSERT INTO archiveDB.table SELECT * FROM mainDB.table WITH (NOLOCK) WHERE timestamp<GETDATE()-75
3) DELETE FROM mainDB.table WHERE timestamp<GETDATE()-75 [since these don't quite run concurrently, I obviously picked a fixed date]
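For one sample table, the statements looked roughly like this (object names are simplified here, and the cutoff date is just an illustration of the fixed date I actually used):

-- Step 1 (done beforehand): archiveDB.dbo.gameData was created with the
-- same schema and the same indexes as mainDB.dbo.gameData.

DECLARE @cutoff DATETIME;
SET @cutoff = '20081005';   -- fixed cutoff roughly 75 days before the run (illustrative)

-- Step 2: copy the old rows into the archive database
INSERT INTO archiveDB.dbo.gameData
SELECT *
FROM mainDB.dbo.gameData WITH (NOLOCK)
WHERE [timestamp] < @cutoff;

-- Step 3: remove the same rows from the main database
DELETE FROM mainDB.dbo.gameData
WHERE [timestamp] < @cutoff;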
Step 2 was incredibly slow, however, and was on pace for one sample table (200M rows, 10 GB, 8 indexes) to take four days.
Step 3, similarly, was on pace to take about 60 days. This was also true on a copy of the DB that I restored from a backup, so there were no locking issues. I ran this test under the simple recovery model, so logging should not have been an issue. Basically, it's so slow as to be completely unworkable.
I next tried SQL Server's built-in option for copying a table from one database to another. This would have taken about a day, and then there would still be the 60 days of deleting. Further, there was no option to copy rows from one database and INSERT them into an existing table in another; it could only overwrite the destination table.
I suspect partitioning would be the better solution, but we are using SQL Server 2005 Standard rather than Enterprise.
Does anybody have a good idea for an archiving strategy we might use? All help would be greatly appreciated, and if you happen to be an online basketball game fan, I'll gladly throw in a premium subscription. 🙂
Thanks again,
-Charles
December 19, 2008 at 6:13 am
If you can't partition, you could try simulating it. Use a view that you'll need to rebuild regularly, and split the data up into multiple physical tables, separated by day, week, month, whatever works. It won't be as simple as partitioning, but the concepts should be the same.
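Something along these lines is the general idea (the table and column names here are made up):

-- Split the data into per-quarter tables, each with a CHECK constraint
-- on the date column so the optimizer can skip tables that can't match.
CREATE TABLE dbo.GameEvents_2008Q3 (
    EventId   BIGINT   NOT NULL,
    EventTime DATETIME NOT NULL,
    CONSTRAINT CK_GameEvents_2008Q3
        CHECK (EventTime >= '20080701' AND EventTime < '20081001')
);

CREATE TABLE dbo.GameEvents_2008Q4 (
    EventId   BIGINT   NOT NULL,
    EventTime DATETIME NOT NULL,
    CONSTRAINT CK_GameEvents_2008Q4
        CHECK (EventTime >= '20081001' AND EventTime < '20090101')
);
GO

-- The view hides the physical split; rebuild it whenever a new table is added.
CREATE VIEW dbo.GameEvents
AS
SELECT EventId, EventTime FROM dbo.GameEvents_2008Q3
UNION ALL
SELECT EventId, EventTime FROM dbo.GameEvents_2008Q4;

With the CHECK constraints in place, a query against the view that filters on the date column should only touch the member tables whose ranges can match.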
Also, you might just need to tune the delete query. Is it using a good index? This one is grasping at straws a bit because I suspect you're moving so much data that an index wouldn't help anyway.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 19, 2008 at 7:08 am
If your issue is "just" needing to reduce backup times, perhaps you could move your 95% static data onto one filegroup and the larger tables onto another filegroup (or filegroups) and use filegroup backups?
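Very roughly (the database, filegroup, file, and path names below are just placeholders):

-- Add a filegroup for the static/archive tables and give it a file.
ALTER DATABASE GameDB ADD FILEGROUP Archive;
ALTER DATABASE GameDB ADD FILE
    (NAME = GameDB_Archive1,
     FILENAME = 'D:\Data\GameDB_Archive1.ndf')
TO FILEGROUP Archive;

-- Moving an existing table onto the new filegroup means rebuilding its
-- clustered index (or the heap) on that filegroup.

-- Then back the filegroups up separately: the static one rarely,
-- the active one often.
BACKUP DATABASE GameDB FILEGROUP = 'Archive'
    TO DISK = 'E:\Backup\GameDB_Archive.bak';
BACKUP DATABASE GameDB FILEGROUP = 'PRIMARY'
    TO DISK = 'E:\Backup\GameDB_Primary.bak';

Piecemeal restores of filegroup backups have their own requirements, so make sure you test the restore path as well as the backups.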
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 19, 2008 at 11:41 am
A few suggestions:
1) use differential backups - they could help you meet your goal by themselves if your 95% figure is accurate (there's a rough sketch after this list). Schedule periodic full backups at quiet times so that the differentials don't become too large. It will add a little to restore time (full + differential), but I'm guessing that restore time is less of an issue at this point.
2) split the backup over multiple physical drives.
3) use backup software with inbuilt compression (my personal favourite is Hyperbac). It will require a little more CPU power, but the time saving in disk writes is huge.
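A minimal sketch of 1) and 2), assuming a database called GameDB and two backup drives (names and paths are placeholders):

-- Periodic full backup at a quiet time, striped across two drives
-- so the write load is shared.
BACKUP DATABASE GameDB
TO DISK = 'E:\Backup\GameDB_full_1.bak',
   DISK = 'F:\Backup\GameDB_full_2.bak'
WITH INIT, STATS = 10;

-- Frequent differential backups: only extents changed since the last
-- full backup are written, which should stay small if ~95% of the data
-- really is static.
BACKUP DATABASE GameDB
TO DISK = 'E:\Backup\GameDB_diff_1.bak',
   DISK = 'F:\Backup\GameDB_diff_2.bak'
WITH DIFFERENTIAL, INIT, STATS = 10;

To restore, you'd apply the most recent full backup followed by the most recent differential.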
If those can't be used, you could try the filegroup strategy Jack suggested. You could also split that table into multiple files. Both involve database changes, though.
These will all help with backup duration; if you need to archive for other reasons then I'd have other suggestions, but partitioning via views is possibly the cleanest method.