Sooner or later every DBA needs to purge old data from database tables. And it often happens that such tables are large enough. This can be done using table partitioning. Such approach is used rarely but in case of really large table it is very useful. In this article we`ll show an example of data purging based on partitioning from real life experience.
So we had a table with application logs in one of our client`s production databases. It was about 80 millions rows in it. We needed to leave only actual data in this table and moreover this table had to be available for inserts during purging. Size of actual data was 10 millions rows. Of course we could have setup a job which would delete rows from the table in small chunks in a loop. But it can be time consuming and would place locks on the table. So we decided to go with partitioning.
Here is a high level plan:
1. Rename current logs table.
2. Create a new table named like the old one.
3. Create partitions with old data and actual data in renamed table.
4. Add partition with actual data to the log table.
Here are some challenges we had to overcome:
- You can add partitions to the table only if the table is empty.
- Creating partitions on the table with millions of rows may take some time and table may get filled.
Here is the workaround. We renamed log table two times. We did it second time after creating the partitions. So here is how it looks like.
- Log table is renamed
- New log table is created
- Partitions are added to the table
- The data is copied from second table which was renamed to the new table. There were collected a few thousands records in it during the time of partition creation.
Let`s get to the code.
Assume we have a table listed below
CREATE TABLE [dbo].[AppLogs](
[LogEvent] [nvarchar](50) NULL,
127.0.0.1 [nvarchar](50) NULL,
[the_date] [datetime] NULL,
[Quantity] [nvarchar](100) NULL,
[Id] [nvarchar](100) NULL,
[ppc] [nvarchar](100) NULL
)
1. Rename the table
GO
EXEC sp_rename 'dbo.AppLogs', 'AppLogs1';
GO
2. Create new table with the same structure
CREATE TABLE [dbo].[AppLogs](
[LogEvent] [nvarchar](50) NULL,
127.0.0.1 [nvarchar](50) NULL,
[the_date] [datetime] NULL,
[Quantity] [nvarchar](100) NULL,
[Id] [nvarchar](100) NULL,
[ppc] [nvarchar](100) NULL
)
3. Create partition function, partition schema and partitions.
BEGIN TRAN
CREATE PARTITION FUNCTION[PartitionFuncAppLogs](datetime)
AS RANGE RIGHT FOR VALUES (N'2013-01-01T00:00:00.001')
CREATE PARTITION SCHEME [PartitionSchemaAppLogs]
AS PARTITION[PartitionFuncAppLogs] TO ([PRIMARY], [PRIMARY])
CREATE CLUSTERED INDEX[CX_PartiionSchema] ON[dbo].[AppLogs1]
(
[the_date]
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PartitionSchemaAppLogs]([the_date])
DROP INDEX[CX_PartiionSchema] ON[dbo].[AppLogs1] WITH ( ONLINE = OFF )
COMMIT TRAN
4. Add partition to the new logs table.
ALTER TABLE dbo.AppLogs1 SWITCH PARTITION 2 TO dbo.AppLogs
In current code sample we omit the steps of second rename of the table and recreation of the log table. They are similar to points 1 and 2. Also it is interesting to mention that there were transactions waiting to be inserted to the logs table at that moment. After creation of the tables there were about 100 rows of new data. Creation and adding of partition takes a few milliseconds.
Here is how to check which partition stores the data
SELECT $PARTITION.PartitionFuncAppLogs ('2013-01-01T08:22:07.345')
Using this approach we get following benefits
– Table was available, no down time..
– Data is separated
– Time taken is not much
A few notes
– Partitioning is available only in Enterprise edition of SQL Server
– At the moment of creation of a new table and adding a partition there can data insertion running on the table. In such case it will not be possible to add a partition. You will have to repeat this step again.
– This approach may not succeed in case of some specific data schema.
Good luck with your table partitioning.