January 7, 2009 at 10:54 pm
Well here's my problem. I have a database that has lots of tables with lots of data that I don't really need to keep in my production db and which is slowing performance. Some tables have 10 million plus records in them.
Table partitioning would go a long way towards solving this problem; however, my clients are not willing to upgrade to Enterprise SQL Server, so I'm stuck with Standard. So I'm looking to move data from the production database to an archive db.
This is where things get tricky. The databases are 24/7, so going off-line for any long period of time is not an option. I've been using SSIS to move data to the archive and then purge it from the production tables. Deleting 10 million plus records from a table with several indexes is slow in itself (there is some relatively complex logic in deciding what to archive), never mind the problem of taking out table locks, which basically breaks the system that is accessing this database. I've tried looping and deleting in smaller chunks, and while this lets my system keep functioning, it pushes the time for the archiving process to complete into days (maybe even weeks if my calculations are correct). I've also tried copying the data I want to keep (since this is a small percentage of what is actually in the tables) into a copy of the table and then renaming the tables and indexes; I probably don't need to elaborate too much on the problems that causes on a live system.
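For reference, the chunked delete I tried looks roughly like this (the table name and archive criteria here are simplified stand-ins for the real logic):

-- Rough sketch of the batched delete approach (illustrative names only;
-- the real archive criteria are considerably more complex than this flag).
DECLARE @BatchSize INT;
DECLARE @RowsDeleted INT;
SET @BatchSize = 10000;
SET @RowsDeleted = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.TransactionLog
    WHERE ArchiveFlag = 1;          -- stand-in for the real archive criteria

    SET @RowsDeleted = @@ROWCOUNT;

    -- Pause between batches so other sessions can get their locks in.
    WAITFOR DELAY '00:00:01';
END

It keeps the system responsive, but at 10 million plus rows per table the total run time blows out badly.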
So basically I'm looking for any other possible methods to approach this problem. Any suggestions will be much appreciated.
January 7, 2009 at 11:16 pm
2005 standard
January 7, 2009 at 11:24 pm
One possibility is to make a copy of the database and export into it only the data your application still requires. You would need to export small chunks of data at a time using SSIS, but this gives you a fresh database for your application to run off, and your current database then becomes your archive.
This probably isn't the optimal solution, as it adds another database for you to keep track of, but it does address the 24/7 uptime requirement and should give you better performance.
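As a rough illustration of pulling across only the current rows (invented table and column names, assuming the rows you still need can be identified by a date column and the key is not an identity column):

-- Illustration only: copy just the rows the application still needs into the
-- fresh database, in chunks, in a way that is safe to re-run while live.
INSERT INTO NewLiveDB.dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
SELECT TOP (50000) o.OrderID, o.CustomerID, o.OrderDate, o.Amount
FROM   OldLiveDB.dbo.Orders AS o
WHERE  o.OrderDate >= DATEADD(month, -3, GETDATE())      -- arbitrary "current" cut-off
  AND  NOT EXISTS (SELECT 1 FROM NewLiveDB.dbo.Orders AS n
                   WHERE n.OrderID = o.OrderID);          -- skip rows already copied

Repeat (or schedule) that until the new database has caught up with everything the application needs.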
Hope this helps
January 8, 2009 at 12:19 am
Not a bad idea, duplicating the database and switching over once it's cleaned up. The only problem is that there will be new data in the original database by the time it comes to switching over to the new one; new records are inserted every few seconds. I guess I could try to grab these records from the old database once the new one is online, but there is still the danger of losing data.
Or maybe I misinterpreted your suggestion?
I should also mention that this database is used by a multi-tiered system that has several services that read/write to it constantly. The particular archiving task I'm looking at is only related to one of these services and therefore only a subset of the tables in the entire database. Archiving the entire database is an even bigger problem.
January 8, 2009 at 12:27 am
Yes, you need to make the distinction between legacy and current data. The current data would go into the new database, with perhaps an hour or so of downtime, if you can afford that, for your application to switch databases. Just prior to the switch, the newest records in the old DB would be pulled across, the application switched to the new DB, and then it's business as usual. This limits the loss of data; alternatively, the switch-over would need to occur in a period of least usage.
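As a rough sketch of that final catch-up step (made-up table and column names, assuming each table has an ever-increasing identity key so the rows added since the initial copy can be identified):

-- Final catch-up just before the switch-over: copy anything added to the old
-- database since the initial bulk copy.
DECLARE @LastCopiedID INT;
SELECT @LastCopiedID = ISNULL(MAX(OrderID), 0) FROM NewLiveDB.dbo.Orders;

SET IDENTITY_INSERT NewLiveDB.dbo.Orders ON;   -- because OrderID is assumed to be an identity column

INSERT INTO NewLiveDB.dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
SELECT OrderID, CustomerID, OrderDate, Amount
FROM   OldLiveDB.dbo.Orders
WHERE  OrderID > @LastCopiedID;

SET IDENTITY_INSERT NewLiveDB.dbo.Orders OFF;

Run that during the brief outage, then point the application at the new database.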
January 8, 2009 at 12:32 am
Yeah I think I will try this out to see what the total downtime would be.
January 8, 2009 at 12:37 am
Good Luck. Let me know how it pans out.
January 8, 2009 at 12:41 am
Yeah, cheers for the help.
January 13, 2009 at 12:15 am
I ended up with a pretty reasonable solution to this problem. Basically, for large tables I copy the data I want to keep into a temporary table and then truncate the original table. Of course, I have to lock the table before I start selecting the records to keep. My T-SQL code checks how many records are to be deleted, and if there are fewer than, say, one million it just deletes them rather than copying, since when it would otherwise have to copy several million records to keep, it's quicker just to delete. My initial downtime came to about 20 minutes, which is pretty good. Subsequent archiving runs (daily) take around 6 minutes to execute and have no negative impact on the system.
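Simplified, the decision logic looks something like this (the table, column, and cut-off are invented stand-ins; the real criteria are more involved, and identity columns need extra handling):

-- Simplified sketch of the delete-vs-copy decision.
DECLARE @ToDelete INT;
DECLARE @Threshold INT;
DECLARE @Cutoff DATETIME;
SET @Threshold = 1000000;
SET @Cutoff = DATEADD(month, -6, GETDATE());   -- stand-in for the real archive criteria

BEGIN TRANSACTION;

-- Take an exclusive table lock up front and hold it for the whole operation.
SELECT @ToDelete = COUNT(*)
FROM   dbo.EventLog WITH (TABLOCKX, HOLDLOCK)
WHERE  EventDate < @Cutoff;

IF @ToDelete < @Threshold
BEGIN
    -- Only a relatively small number of rows to remove: a straight delete is
    -- cheaper than copying the survivors out and back.
    DELETE FROM dbo.EventLog WHERE EventDate < @Cutoff;
END
ELSE
BEGIN
    -- Millions of rows to remove: keep the (small) survivor set, truncate, re-insert.
    SELECT * INTO #Keep FROM dbo.EventLog WHERE EventDate >= @Cutoff;
    TRUNCATE TABLE dbo.EventLog;
    INSERT INTO dbo.EventLog SELECT * FROM #Keep;   -- an identity column would need SET IDENTITY_INSERT here
    DROP TABLE #Keep;
END

COMMIT TRANSACTION;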
The tricky thing about this problem is that I have to build the SSIS packages dynamically with C# code, as I have to re-use existing logic built into the system. My app seems to work well, but I guess I'll have to wait and see what the testers think :0)
January 13, 2009 at 1:05 am
I need to know more about archiving. I have more than one database with the same schema, one per client. I have created an archive database for each client, named after the client database; say if the client db is ClientDB1, my archive db name is ClientDB1_Archive, and so on...
Now here is my question: for archiving/purging I have split the work into various functional areas, and then archive/purge from the source db into the archive db.
The purpose behind this is to maintain the referential integrity of the data. I have written stored procedures which run via a scheduled job at specified times.
Any suggestions on the above scenario for archiving/purging?
Abhijit - http://abhijitmore.wordpress.com
January 13, 2009 at 1:29 am
Personally, I did not keep the keys and indexes in my archive db, since I wanted to minimize the time taken to push data into the archive. As far as maintaining the referential integrity in your production database goes, that can only be determined from the table relationships. SSIS is a good way to do this, since you can create data flows for pushing the data from the prod db to the archive and then link the data flows in an order that honours the constraints between your tables. I guess you could do it all with T-SQL, but the more complex the archiving task is, the more difficult it will be to debug. With SSIS you can break the process down into its individual data flows and SQL tasks and execute each step individually, and when you execute the entire package, if something breaks you can see instantly where in the sequence it broke.
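If you do end up expressing any of it in T-SQL, the important bit is just the order of operations; roughly along these lines (made-up parent/child tables, both databases on the same server, and archive tables without identity columns or constraints, as in my case):

-- Illustration only: copy parents before children into the archive, then
-- delete children before parents from production, so foreign keys are never
-- violated on either side.
DECLARE @Cutoff DATETIME;
SET @Cutoff = DATEADD(year, -1, GETDATE());

BEGIN TRANSACTION;

INSERT INTO ArchiveDB.dbo.OrderHeader
SELECT * FROM ProdDB.dbo.OrderHeader WHERE OrderDate < @Cutoff;

INSERT INTO ArchiveDB.dbo.OrderDetail
SELECT d.*
FROM   ProdDB.dbo.OrderDetail AS d
JOIN   ProdDB.dbo.OrderHeader AS h ON h.OrderID = d.OrderID
WHERE  h.OrderDate < @Cutoff;

DELETE d
FROM   ProdDB.dbo.OrderDetail AS d
JOIN   ProdDB.dbo.OrderHeader AS h ON h.OrderID = d.OrderID
WHERE  h.OrderDate < @Cutoff;

DELETE FROM ProdDB.dbo.OrderHeader WHERE OrderDate < @Cutoff;

COMMIT TRANSACTION;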
January 13, 2009 at 1:32 am
Great, it's good to hear things worked out.