As DBAs, we are often tasked with maintenance work on our databases. Often this includes making sure that old data is purged from the tables. This is especially true with logging tables: sometimes multiple applications write to the same logging tables, and they can grow very quickly.
Let’s say we have a logging table that has not been pruned recently, or maybe ever. We need to delete any data that is older than 60 days. We analyze the table and find there are 30 million records that need to be deleted. What is the best way? One way is to just write a delete statement like this:
DELETE FROM Table1 WHERE MyDate < GETDATE() - 60;
This will do the job, but there are some big issues with this method.
- The delete will take a long time. SQL Server wraps the operation in a single transaction so it can roll back if there are errors. That means you are deleting from the table and writing every change to the transaction log, which will grow the log and might cause an out-of-disk-space error (a quick way to watch log growth is shown after this list). And if something does go wrong and SQL Server has to roll back, the rollback can take longer than the original operation ran. If the error happens 45 minutes after the delete starts, it could take up to 90 minutes to roll back (depending on how your system is set up).
- During the delete, there is a good chance you will lock the table. And if a rollback occurs then that chance goes up. If you lock the table that all the applications log to, you could bring the entire system to a standstill.
- Large deletes use a lot of resources, especially IO. This can slow down the system and cause lag issues with the application, which may be unacceptable.
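If the transaction log is your main worry, you can keep an eye on it while any of these deletes run. One easy way is DBCC SQLPERF(LOGSPACE), which reports the log size and the percentage of log space used for every database on the instance:

-- Show log size (MB) and percent of log space used for each database.
DBCC SQLPERF(LOGSPACE);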
So, how should we delete large amounts of data? I find the best way is to delete it in manageable chunks.
First, look at the column you are using in the WHERE clause for the delete (let’s say it is named CreatedDate). Is it indexed? If not, you will most likely end up doing table scans, which we all want to avoid. Indexing the column would be best, but many times you won’t be able to for various reasons. I would then try a small delete based on that column and look at the execution plan to see what is going on and to get a feel for how long it takes.
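If you are not sure whether CreatedDate is indexed at all, one quick way to check is the system catalog views. This is only a sketch; it assumes the table is dbo.TableA and uses the column name from the examples:

-- List any indexes on dbo.TableA that reference the CreatedDate column (key or included).
SELECT i.name AS IndexName, i.type_desc, ic.key_ordinal, ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.TableA')
  AND c.name = 'CreatedDate';

The small test delete itself looks like this: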
DELETE TOP (10000) FROM TableA WHERE CreatedDate < GETDATE() - 60;
Looking at the plan, we see it does a table scan, and the delete takes 45 seconds to run. That is not acceptable. So, we find the table’s primary key; in this case let’s say it is ID. Try using that in conjunction with the date column:
DELETE FROM TableA WHERE ID IN ( SELECT TOP (10000) ID FROM TableA WHERE CreatedDate < GETDATE() - 60 );
This allows you to delete using the primary key which will be more efficient. This method takes 18 seconds to run. A lot faster than 45 seconds but still not quick for 10,000 rows.
Now I am going to do some testing with various row counts.
- 20,000 rows take 35 seconds.
- 10,000 rows take 18 seconds.
- 5,000 rows take 4 seconds.
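One way to take these kinds of measurements yourself is SET STATISTICS TIME, which prints the elapsed time for each statement (in Management Studio it shows up on the Messages tab). A quick sketch using the primary-key method from above; change the TOP value to try other batch sizes:

-- Time one batch at a given size; adjust TOP and compare.
SET STATISTICS TIME ON;

DELETE FROM TableA
WHERE ID IN (
    SELECT TOP (10000) ID
    FROM TableA
    WHERE CreatedDate < GETDATE() - 60
);

SET STATISTICS TIME OFF;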
5,000 seems to be the sweet spot. If I delete 10,000 rows it takes 18 seconds, but if I delete 10,000 rows as two 5,000-row deletes it takes 8 seconds. In this case, doing a smaller delete multiple times is faster than doing one larger delete.
At this rate, it will take about 6.6 hours to delete 30,000,000 rows. That is not great, but it is not horrible either. You can set up a loop to run the 5,000-row delete until it is complete. You could also set up a cursor, but I hate cursors and don’t believe they should be used.
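Here is a minimal sketch of that loop, using the 5,000-row batch and the TableA, ID, and CreatedDate names from the examples above. The WAITFOR is optional; it just takes a little pressure off between batches:

-- Delete in 5,000-row batches until no qualifying rows remain.
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE FROM TableA
    WHERE ID IN (
        SELECT TOP (5000) ID
        FROM TableA
        WHERE CreatedDate < GETDATE() - 60
    );

    -- @@ROWCOUNT is the number of rows the DELETE just removed;
    -- when it hits zero there is nothing left to purge.
    SET @RowsDeleted = @@ROWCOUNT;

    -- Optional: a short pause so other sessions get a turn at the table.
    WAITFOR DELAY '00:00:01';
END;

Each batch commits on its own, so if something fails partway through, only the current 5,000 rows roll back instead of the whole run.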
This lets you avoid the large transaction (and the long rollback if something fails), it should not lock the table, and it spreads the IO out over the six or so hours it takes to run instead of hitting the system all at once. It also means you can put the delete process you just created into a SQL Agent job that runs daily, so going forward you always have just 60 days of data.
When you must delete a large amount of data, there are multiple ways to do it. You just want to be sure that whatever method you use does not cause more harm than good. Avoid large transactions that can lock the table and take forever to complete, and don’t blow up the transaction log and cause an out-of-disk-space outage.