Archiving Data

Question

Post reply

Archiving Data

durai nagarajan

SSCoach

Points: 15778
More actions
February 2, 2014 at 10:14 pm

#287887

Hello Experts,
we have a situation he a table grown to 180+ Gigs with several millions of records. this affects Re Indexing - online maintanence job weekly due to load everytime. we are planning to move old data to a archive table.
Suggest a best method to proceed. we are planning to move based on date which is not in any of the index (and stored as int column )and we cant modify as it is a third party application.
we have come up with this idea
plan 1 - Rename the table and move one month data from archive to production table.
plan 2 - Move one month data in several chunks to archive table.
post this we are planning to schedule weekly archive of the table.
if you have better plan also please suggest.
Regards
Durai Nagarajan

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 1

I have lots of suggestions but "It Depends" on what's what.

1. How many months/GB of data do you want to keep active at any time.

2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?

3. Is the table referenced by FKs from any other table?

4. Does the table have an indexed view on it?

5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.

6. Are there any other unique indexes that actually have to be unique?

7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?

8. How long does it take to delete a month of information from the table or haven't you tried that yet?

9. How many rows are in the largest month, so far?

10. Do the older months need to be updateable or is this truly and audit or history table of sorts?

11. Can we change the indexes on the table?

I need answers to all of those questions before I can make a recommendation.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

durai nagarajan SSCoach Points: 15778 More actions · Answer 2

1. How many months/GB of data do you want to keep active at any time.

1 month

2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?

like this DATEDIFF(SECOND,{d '1970-01-01'}, '04/1/2013') with '04/1/2013' beign today's date

3. Is the table referenced by FKs from any other table?

No

4. Does the table have an indexed view on it?

No

5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.

ID an int column but not identity

6. Are there any other unique indexes that actually have to be unique?

No

7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?

Enterprise

8. How long does it take to delete a month of information from the table or haven't you tried that yet?

Haven't tried yet , we have requested for test server.

9. How many rows are in the largest month, so far?

more than 1.8 million

10. Do the older months need to be updateable or is this truly and audit or history table of sorts?

This is a mail audit

11. Can we change the indexes on the table?

No

Regards
Durai Nagarajan

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 3

Based upon your answers to Jeff's (excellent) questions, your first option sounds sufficient: Rename the table and move 1 month of data into the production table. Once complete. You can drop all the non clustered indexes from the archive table, making sure you've created them on the new table first.

Once you've verified you have the proper records in the production table, you can remove them from the archive table, then create a simple script to run every hour/day/week/etc - as an example:DECLARE @ArchiveDate datetime,

@BatchSize int,

@Error int = 0,

@Err varchar(15),

@ErrMsg varchar(250),

@Env varchar(65),

@EmailBody nvarchar(750),

@EmailSubject nvarchar(140)

BEGIN

SELECT @BatchSize = 1000, @Env = @@SERVERNAME, @ArchiveDate = (SELECT DATEADD(MONTH , 1,GETDATE()))

END

BEGIN

BEGIN TRY

BEGIN TRANSACTION

/* Create holding Table */

IF EXISTS(SELECT * FROM tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'tempdb..#Temp'))

BEGIN

DROP TABLE [dbo].[#Temp]

END

/* Get X number of records to Archive */

CREATE TABLE #Temp (ID bigint)

INSERT INTO #Temp

SELECT TOP (@BatchSize) ID

FROM [ProductionDatabase].[schema].[TableName] WITH(READUNCOMMITTED) --In OLTP Environment, avoid potential table lock

WHERE <<Column to filter on>> < @ArchiveDate

/* Archive the data to the storage table */

INSERT INTO [ArchiveDatabase].[schema].[TableName] (col1, col2, col3...)

SELECT TOP (@BatchSize) col1, col2, col3...

FROM [ProductionDatabase].[schema].[TableName] MyData WITH(READUNCOMMITTED) --In OLTP Environment, avoid potential table lock

INNER JOIN #Temp T ON MyData.ID = T.ID

SELECT @Error = @Error + @@ERROR

/* Delete the data from the Production table */

DELETE MyData FROM [ProductionDatabase].[schema].[TableName] MyData

INNER JOIN #Temp T ON MyData.ID = T.ID

SELECT @Error = @Error + @@ERROR

IF @Error = 0

BEGIN

COMMIT TRANSACTION

END

ELSE

BEGIN

ROLLBACK TRANSACTION

END

END TRY

BEGIN CATCH

SELECT @Err = ERROR_NUMBER(), @ErrMsg = ERROR_MESSAGE()

BEGIN

ROLLBACK TRANSACTION

SET @EmailSubject = '' + @Env + ' :: DAILY DATA PRUNING FAILURE (DATABASE) dbo.MyTable : '+ CONVERT(VARCHAR(19),GETDATE(),121)

SET @EmailBody =

'********************************************************************** ' + CHAR(13)

+ 'ERROR ' + @ErrMsg + CHAR(13)

+ '**********************************************************************' + CHAR(13)

/* Send Email */

EXECUTE msdb.dbo.sp_send_dbmail @recipients = 'someonewhoneedstoknow@company.com'

,@subject=@EmailSubject

,@body = @emailBody

,@body_format = 'TEXT'

,@sensitivity = 'Confidential'

,@importance='High'

END

END CATCH

END

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 4

Note: I threw that code together as a quick example only and most like will need tweaking so I don't advise running it "as-is" without verifying it beforehand

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 5

durai nagarajan (2/3/2014)
1. How many months/GB of data do you want to keep active at any time.
1 month
2. What is the format of the date in the INT column? Is it (hopefully) YYYYMMDD?
like this DATEDIFF(SECOND,{d '1970-01-01'}, '04/1/2013') with '04/1/2013' beign today's date
3. Is the table referenced by FKs from any other table?
No
4. Does the table have an indexed view on it?
No
5. What is the PK of this table and does it really have to be unique or is this more like an audit table (which you seem to indicate)? Yeah, I know. Strange question but it's actually very important.
ID an int column but not identity
6. Are there any other unique indexes that actually have to be unique?
No
7. Which edition of SQL Server do you have? Standard Edition or Enterprise Edition?
Enterprise
8. How long does it take to delete a month of information from the table or haven't you tried that yet?
Haven't tried yet , we have requested for test server.
9. How many rows are in the largest month, so far?
more than 1.8 million
10. Do the older months need to be updateable or is this truly and audit or history table of sorts?
This is a mail audit
11. Can we change the indexes on the table?
No

Based (especially) on the first question, I don't see any need for special tricks here.

For the first take, I'd just copy the last 30 days of data into another table (should take less than a minute especially if you use SELECT/INTO to both create and populate the new table), rename the original table as an archive table and rename the new table as the original name. The SELECT/INTO should have an ORDER BY on it for the date column and use OPTION(MAXDOP 1) if you have no intention of adding a clustered index by date.

Once a day, run a job that copies the rows that are 30 days or older to your archive table (making sure you don't insert dupes and then delete those same rows from the active table. That won't take long because, even in a HEAP, the rows to be deleted will mostly all be together at the logical beginning of the table and shouldn't cause a full table lock. That means that inserts will be able to continue at the logical end of the table. On the average, you'd only be deleting something less than 60,000 rows per day. That should take about 6 seconds because you have no FK's and, I assume, relatively few indexes on the table.

Some will suggest the ardure of partitioning for this particular table but I don't believe it'll be worth all that because you're only going to keep 1.8 million rows, which is comparatively small for today's machines and databases.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

durai nagarajan SSCoach Points: 15778 More actions · Answer 6

Thanks guys for your input

They have decided to keep 3 months data so we are planning to move the current 3 months data to new table and drop the existing and rename the new table.

sorrry for the delay.

Regards
Durai Nagarajan

free_mascot One Orange Chip Points: 27169 More actions · Answer 7

Jeff, love to read the questioner you have asked? This reflect the experience you are carrying. Simply superb!!!

Cheers,

FM

---------------------------------------------------
"Thare are only 10 types of people in the world:
Those who understand binary, and those who don't."

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 8

durai nagarajan (2/11/2014)
Thanks guys for your input
They have decided to keep 3 months data so we are planning to move the current 3 months data to new table and drop the existing and rename the new table.
sorrry for the delay.

Believe it or not, that's should be just the first step. If you're always going to keep 3 months worth of data online, you have to start asking questions like whether or not that means just the last 90 so days with nightly rolloffs (archive/delete) data older than that or if you want to keep 3 months previous plus the current active month, etc. If you want to rolloff entire months instead of just a day at a time because of "Plan B", then you might want to consider Partitioned Tables and the "3 month sliding" window if your DELETEs take too long.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 9

free_mascot (2/12/2014)
Jeff, love to read the questioner you have asked? This reflect the experience you are carrying. Simply superb!!!
Cheers,
FM

Thanks for the feedback, FM. I'm actually smack dab in the middle of setting up partitioning on a 27GB table and a 300GB table and just completed building a presentation on the subject for my local PASS group. The questions I asked are based on all the workarounds I had to do to do it all the right way instead of just throwing partitioning at the table. LOTS of caveats if you want to do it the "right way" for Audit tables of that size especially trying to figure out how to NOT make a whole bunch of file space "Read Only" for each partition.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)