In my experience, first as a developer and then as a database administrator, one thing that routinely gets overlooked when designing a new application or database is a data retention policy. I have seen this with multiple DBMSs (SQL Server, DB2, and MySQL). Most people are more concerned with getting data into the database than with removing it at some future date. This leads to data bloat and all the issues associated with it:
- Slow, oversized backups
- Slow restores
- Application lag
When I look at a database with this kind of problem, I normally find a few tables used for logging or error messages that hold many years' worth of data. That data is completely useless due to its age, but it has never been cleaned out. In one case, the database was 105 GB, and a single logging table accounted for 95 GB of it, meaning roughly 90% of the data in that database was dead weight. By purging all but the last three months of data from that table, we shrank the database to about 12 GB. I then created a job that deleted data more than 90 days old and scheduled it to run weekly. Backups and restores sped up dramatically, and the application became noticeably more responsive because it no longer had to wade through all that stale data.
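A weekly cleanup job like that can be as simple as a batched delete. Here is a minimal T-SQL sketch, assuming a hypothetical dbo.AppLog table with a CreatedDate column (both names are placeholders for whatever your logging table actually looks like); deleting in small batches keeps each transaction, and the locks it takes, short-lived on a large table.

```sql
-- Minimal sketch of a retention job: delete log rows older than
-- 90 days in batches of 5,000. Table and column names are
-- hypothetical; adjust the batch size and cutoff to your needs.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.AppLog
    WHERE CreatedDate < DATEADD(DAY, -90, GETDATE());

    SET @rows = @@ROWCOUNT;
END;
```

On SQL Server this would typically be scheduled as a SQL Server Agent job; MySQL's Event Scheduler and DB2's Administrative Task Scheduler can fill the same role.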
When you are planning and designing a new application or database, think about what data you will need and what data you will not. Core data such as purchase orders or transactions might need to be kept for a long time due to legal requirements or company policy. In that case, think about how you can archive the data so that it will not bog down your application a few years from now (one possible pattern is sketched below). Flag the tables used for logging or errors and have a plan in place from day one to keep only a set number of days' or months' worth of data.
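One common archiving pattern is to move aged rows into an archive table in the same statement that removes them from the live table. This sketch assumes hypothetical dbo.PurchaseOrder and dbo.PurchaseOrderArchive tables with identical column layouts, and a seven-year retention period chosen purely for illustration:

```sql
-- Hypothetical archive step: the OUTPUT ... INTO clause copies the
-- deleted rows into the archive table, so the copy and the delete
-- happen as one atomic statement. The retention period, table names,
-- and Status filter are all assumptions for this example.
DELETE FROM dbo.PurchaseOrder
OUTPUT DELETED.*
INTO dbo.PurchaseOrderArchive
WHERE OrderDate < DATEADD(YEAR, -7, GETDATE())
  AND Status = 'Closed';
```

The archive table can then live in a separate filegroup or database on cheaper storage, so the data stays available for audits without slowing down the application.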
The moral of this story is that at some point you will have to decide what to keep and what to throw away. That decision should be part of the design process from the start, not an emergency call at 4:00 a.m.