August 22, 2008 at 10:27 am
Hi members,
I have some tables in my database, and every month fresh data gets added to each table on top of the previous data. The approximate size of the data that gets added is 1.5 - 2 million records, and my Project Manager wants to archive the data. She wants only 6 months of data in the main table; the rest of the data should be archived to respective archive tables.
Can anyone explain how to archive the data in a table and what methods are available for archiving data? Please include an example, and also explain how to access the archived data at a later point in time, given that all my procedures currently refer to the main table.
Thanks in advance
sai
August 22, 2008 at 6:31 pm
This sounds a bit like a generic homework assignment... so here's a generic answer:
1) Copy records older than 6 months into archive tables.
INSERT INTO Archive_table SELECT * FROM Prod_table
WHERE Archive_date < DATEADD(MONTH, -6, GETDATE())
2) Delete records older than 6 months from production tables.
DELETE FROM Prod_table
WHERE Archive_date < DATEADD(MONTH, -6, GETDATE())
3) Change your stored procedures to point to the archive tables.
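For what it's worth, on SQL Server 2005 and later the copy and delete steps can be combined into a single statement with an OUTPUT clause, so no rows can slip in between the two steps. A minimal sketch, assuming the same Prod_table / Archive_table / Archive_date names as above and identical column lists in both tables:

-- copy and delete in one statement; rows land in the archive as they are removed
DELETE FROM Prod_table
OUTPUT deleted.* INTO Archive_table
WHERE Archive_date < DATEADD(MONTH, -6, GETDATE())

Note that the OUTPUT INTO target can't have enabled triggers or participate in foreign keys, so a plain archive table works best here.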
August 24, 2008 at 11:58 am
You may want to research table partitioning.
Table partitioning allows for the fastest and cleanest way to achieve data archiving and purging.
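For reference, a minimal sketch of what a partitioned setup on a date column might look like; the function, scheme, and table names below are made up purely for illustration:

-- monthly partitions on an Archive_date column (illustrative names and boundaries)
CREATE PARTITION FUNCTION pf_ArchiveDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301')

CREATE PARTITION SCHEME ps_ArchiveDate
AS PARTITION pf_ArchiveDate ALL TO ([PRIMARY])

CREATE TABLE Prod_table_partitioned
(
    Id           INT IDENTITY(1,1) NOT NULL,
    Archive_date DATETIME NOT NULL,
    CONSTRAINT PK_Prod_table_partitioned
        PRIMARY KEY CLUSTERED (Archive_date, Id)
) ON ps_ArchiveDate (Archive_date)

With the data spread over date-based partitions, switching out the oldest partition replaces the big DELETE entirely.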
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
August 24, 2008 at 4:41 pm
Sai,
Once you get the initial archive done, how many rows would need to be archived on a daily basis? Also, please post the CREATE TABLE statement for the table involved as well as what the indexes are... you'll get a much better answer. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2008 at 7:52 am
PaulB (8/24/2008)
You may want to research table partitioning. Table partitioning allows for the fastest and cleanest way to achieve data archiving and purging.
Sorry, I am curious how table partitioning affects the database if the database is running 24/7. All the examples I found on MSDN are cold archiving. I once tried hot archiving (dynamic queries for table partitioning so it runs weekly/bi-weekly/monthly), but the table still gets locked, though the partition switching is pretty fast.
August 25, 2008 at 7:58 am
You can go for sliding window partitioning. See the whitepaper by Kimberly Tripp on this - http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx
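For the 24/7 case, the sliding-window pattern in that whitepaper boils down to switching the oldest partition out to an empty staging table (a metadata-only operation), archiving from the staging table at leisure, and then sliding the boundaries. A rough sketch with illustrative names, assuming the staging table sits on the same filegroup and has the same structure and indexes as the partitioned table:

-- move the oldest partition out of the production table; metadata only, near-instant
ALTER TABLE Prod_table SWITCH PARTITION 1 TO Prod_table_staging

-- remove the now-empty boundary at the trailing edge
ALTER PARTITION FUNCTION pf_ArchiveDate() MERGE RANGE ('20080101')

-- add a new boundary at the leading edge for the next period
ALTER PARTITION SCHEME ps_ArchiveDate NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION pf_ArchiveDate() SPLIT RANGE ('20080401')

The SWITCH still needs a brief schema-modification lock on both tables, which is usually where the blocking on a busy system shows up, but it holds that lock only for an instant rather than for the duration of a large delete.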
August 25, 2008 at 8:10 am
Yes, I have gone through that. That's what I meant by cold archiving (predefined values). I have tried to do something like this:
SET @SQLString = N'ALTER PARTITION SCHEME [' + @archiveScheme + '] NEXT USED [FileGroup_MDPTSP]'
...
SET @SQLString = N'ALTER PARTITION FUNCTION ' + @archiveFunction + '()
SPLIT RANGE ('''
SET @strTEMP = CONVERT(NCHAR(8), DATEADD(day, 7, @dtStart), 112) -- +7 for the next partition boundary
SET @SQLString = @SQLString + @strTEMP + ''')'
...
SET @SQLString = N'ALTER TABLE ' + @factTable + ' SWITCH PARTITION 1
TO ' + @archiveTable + ' PARTITION 2 '
...
SET @SQLString = N'ALTER PARTITION FUNCTION ' + @factFunction + '()
SPLIT RANGE ('''
SET @strTEMP = CONVERT(NCHAR(8), DATEADD(day, (7 + 1), @dtEnd), 112) -- +1 because this runs on the second-to-last day
SET @SQLString = @SQLString + @strTEMP + ''')'
in order to let it work dynamically. Is that the right way to do it?
August 25, 2008 at 10:19 pm
If the clustered key is set by date or, perhaps, identity and the identity column is indicative of entry order, then sectional deletes (or insert to archive, delete from source) of, say, 20,000 rows at a time will work just fine.
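A rough sketch of what that kind of sectional insert/delete might look like, reusing the hypothetical Prod_table / Archive_table / Archive_date names from earlier in the thread and a 20,000-row batch size:

-- move rows older than the cutoff, 20,000 at a time, until none are left
DECLARE @cutoff DATETIME
SET @cutoff = DATEADD(MONTH, -6, GETDATE())

WHILE 1 = 1
BEGIN
    DELETE TOP (20000) FROM Prod_table
    OUTPUT deleted.* INTO Archive_table
    WHERE Archive_date < @cutoff

    IF @@ROWCOUNT = 0 BREAK

    -- short breather so other work can get at the table between batches
    WAITFOR DELAY '00:00:02'
END

Each batch commits on its own, so locks stay short-lived and a failure part-way through loses at most one batch of work.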
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2008 at 9:20 am
Jeff Moden (8/25/2008)
If the clustered key is set by date or, perhaps, identity and the identity column is indicative of entry order, then sectional deletes (or insert to archive, delete from source) of, say, 20,000 rows at a time will work just fine.
Hi Jeff, is that an answer to my question? 20,000 rows? What about 200,000 or even 2,000,000 rows?
August 26, 2008 at 6:21 pm
gan (8/26/2008)
Jeff Moden (8/25/2008)
If the clustered key is set by date or, perhaps, identity and the identity column is indicative of entry order, then sectional deletes (or insert to archive, delete from source) of, say, 20,000 rows at a time will work just fine. Hi Jeff, is that an answer to my question? 20,000 rows? What about 200,000 or even 2,000,000 rows?
Yep... answering your question and the OP's question... if the system is a 24/7 operation and the table you want to archive is in use, then trust me and do it 20,000 rows at a time. Or not... a simple delete with very few indexes runs very fast and will probably delete 2 million rows, no problem. Oh, but there's that 24-hour thingy getting in the way, and since the table is probably indexed for use and maybe has a clustered index that isn't exactly chronological, it could take several hours to delete just 2 million rows. Only way to know for sure is to try it... run the delete for 2 million rows... if it takes more than about 10 minutes, you should probably stop it and roll back. If it takes less than 10 minutes, then problem solved. By the way, there will be some serious blocking on the table during the 2 million row delete... I wouldn't try it on a real busy 24/7 system that has short SLA requirements. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2008 at 11:26 pm
That means I need to go through trial and error? Thanks anyway.
August 27, 2008 at 6:35 pm
Heh... no... trust me... I just got done reducing/archiving a 131 million row table to a 44 million row table using the 20k method I spoke of... on a 24/7 system. 😉 I don't make recommendations unless I've actually done them (and the related trial and error).
--Jeff Moden
Change is inevitable... Change for the better is not.
August 27, 2008 at 10:42 pm
Great, you sound confident! How do you do the partition switching? Dynamic queries? Manually? How often? Weekly? Monthly?
Using the 20k method? What 20k method?
August 28, 2008 at 6:57 pm
Depends on what your partitions are based on. Mine are based on a "CampaignID"... any campaign that hasn't had a new row added to it in 90 days gets archived. I just do a select/group by on the CampaignID and the Max LoadDate vs GETDATE() and loop through the campaigns to do the insert/deletes. It's actually a double loop... the inside loop does the archival insert/delete 20K rows at a time, with the number of batches determined beforehand by dividing the number of rows present for that campaign by 20K.
I can't use a partitioned view to do it because I need to log which underlying table the campaign was in for possible restore processes. As you suggest, a bit of dynamic SQL takes care of all that.
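In rough outline, with the CampaignID / LoadDate column names from the description above treated as hypothetical and the logging of the underlying table omitted, such a double loop might look something like this:

DECLARE @CampaignID INT

-- outer loop: campaigns with no new rows in the last 90 days
DECLARE campaign_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CampaignID
    FROM Prod_table
    GROUP BY CampaignID
    HAVING MAX(LoadDate) < DATEADD(DAY, -90, GETDATE())

OPEN campaign_cursor
FETCH NEXT FROM campaign_cursor INTO @CampaignID

WHILE @@FETCH_STATUS = 0
BEGIN
    -- inner loop: move that campaign's rows to the archive 20,000 at a time
    WHILE 1 = 1
    BEGIN
        DELETE TOP (20000) FROM Prod_table
        OUTPUT deleted.* INTO Archive_table
        WHERE CampaignID = @CampaignID

        IF @@ROWCOUNT = 0 BREAK
    END

    FETCH NEXT FROM campaign_cursor INTO @CampaignID
END

CLOSE campaign_cursor
DEALLOCATE campaign_cursor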
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2008 at 7:02 pm
Almost forgot... at my previous job, the archive process ran nightly. At my current job, it runs on Monday and Thursday nights. The first run may take a day or more to complete, but it's online all the time. After that, the daily jobs (same as the first job) are a piece of cake and may only take a couple of seconds to execute depending, of course, on how many rows slip past your cutoff date on any given day.
Some folks have suggested that such a thing be built into a trigger... past experience tells me that will eventually turn out to be a source of deadlocks on high volume systems.
--Jeff Moden
Change is inevitable... Change for the better is not.