This post is part of the SQL Community Project #DBAJumpStart by John Sansom. As I mentioned previously, this post has been split into 3 more manageable chunks as I know you’re all busy people. Today heralds the third and final instalment of the mini series, you can find part one here and part two here.
“If you could give a DBA just one piece of advice, what would it be?”
Last year John asked 19 successful and experienced SQL Server professionals and me (for some baffling reason) this exact question. Due to some technical problems and a number of time constraints this post is a smidgen later than some of the other collaborators’, but better late than never. In the post below I share my own thoughts with you; don’t forget that you can find all of our collective wisdom inside DBA JumpStart, a unique collection of inspiring content just for SQL Server DBAs. Be sure to get your free copy of DBA JumpStart today!
Define your DR plan.
Your company’s data is its lifeblood. Don’t let HR fool you: people aren’t what makes a company successful, protecting its data is. Microsoft and Apple are both still here despite the departures of Bill Gates and Steve Jobs, but if one of these companies had lost the database with all their financial information and had no disaster recovery process, it would be a very different story.
Nearly every day I hear customers talking about their backup strategy and I have to correct their thinking. Having a backup strategy is not the same as having a restore strategy. If you put the process of backing up your data ahead of how you are going to retrieve that data in a disaster scenario, you will come up short. At that point you’ll wish you had read my section on creating documentation and having an up to date CV/resume. As a vigilant DBA it’s really important that you think about how you are going to restore your databases, how long that will take and what point in time you will be able to recover to, rather than how much data you can afford to store on your servers.
The week before the PASS Summit I recorded a session called “I got 99 problems but my backups ain’t one” for the PASS Virtual DBA Chapter. This section covers a few of the core messages from that webcast.
When planning your restore strategy there are two main things to consider:
- Recovery Point Objective (RPO)
- Recovery Time Objective (RTO)
Recovery Point Objective (RPO) is the amount of data you can afford to lose. As a vigilant DBA I ask the business owners of the databases I look after exactly that. This is an important distinction: I do not own these databases, I maintain and protect them for other people. As a vigilant DBA I am providing a service to other people within, and potentially outside of, my organisation. When I receive a new database I need to know who owns it from a business perspective. Once I know that, I will book a meeting with that person to ask them about their data and how important it is. Only at that point can I decide which types of backups to take and how frequently to take them.
It is not uncommon for databases to be backed up using only a full backup in the simple recovery model, but what if your business owner cannot afford to lose up to 24 hours’ worth of data? Are they even aware of that risk?
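To illustrate that risk, here is a rough sketch (the query is just one way of doing this) that compares each database’s recovery model with its most recent backup, using the standard sys.databases and msdb.dbo.backupset catalogues:

```sql
-- Sketch only: how far back would we be going if we had to restore right now?
-- Compare each database's recovery model with the finish time of its most
-- recent backup, then compare that gap against the RPO the business agreed to.
SELECT  d.name,
        d.recovery_model_desc,
        MAX(b.backup_finish_date) AS last_backup_finish
FROM    sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
GROUP BY d.name, d.recovery_model_desc
ORDER BY last_backup_finish;
```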
Recovery Time Objective (RTO) is the amount of time you have to restore your data to a usable state. Again, I have seen some horrible “backup plans”; these include monthly full backups with transaction log backups every 30 minutes. The reply I hear is, “but we have a 30 minute RPO, we’ll smash that.” That may well be true (as long as your log chain hasn’t been broken), but how long could it take to redo and undo transactions in the recovery process for up to one month’s worth of data? I don’t mind saying that is not a job I would fancy doing!
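To put that in perspective, here is a minimal sketch of a point-in-time restore (the database name, file paths and STOPAT time are all made up): every log backup taken since the last full has to be restored in sequence, which is exactly where a monthly full backup hurts your RTO.

```sql
-- Illustrative restore sequence under the full recovery model.
-- With monthly fulls and 30-minute log backups this could mean well over
-- a thousand RESTORE LOG statements before the database is usable again.
RESTORE DATABASE Sales
FROM DISK = N'X:\Backups\Sales_Full.bak'
WITH NORECOVERY, REPLACE;

RESTORE LOG Sales
FROM DISK = N'X:\Backups\Sales_Log_0001.trn'
WITH NORECOVERY;

-- ...one RESTORE LOG per log backup taken since that full backup...

RESTORE LOG Sales
FROM DISK = N'X:\Backups\Sales_Log_1440.trn'
WITH STOPAT = '2014-01-31T09:00:00', RECOVERY;
```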
We as a collective unit of database professionals need to be smarter and take a wider view. Yes, I am covered by backing things up in this fashion, but is that really the right solution?
There are options you can use to make your backups, and subsequently your restores, faster. These include, but aren’t limited to:
- Writing to multiple backup devices
- Increasing the BufferCount parameter
- Increasing the MaxTransferSize parameter
- Using compression
By writing to multiple backup devices at once you can write data away in parallel; this means we can write to multiple blocks via multiple disk spindles at the same time, which makes the write process much faster. To aid this further you can use compression (available in 2008 Enterprise edition, and from 2008 R2 onwards in Standard edition too; N.B. there are third party applications that support earlier versions and provide more than one level of compression), so you are writing less information away, further reducing the traditional write bottleneck. Interestingly, compression isn’t enabled by default, so make sure you test that the extra CPU usage won’t cause you any problems and then enable it.
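As a rough example (the database name and file paths are made up), striping a backup across multiple devices and compressing it looks like this:

```sql
-- Sketch: stripe the backup across three devices, ideally on separate
-- spindles, and compress it so less data has to be written to each one.
BACKUP DATABASE Sales
TO  DISK = N'F:\Backups\Sales_1.bak',
    DISK = N'G:\Backups\Sales_2.bak',
    DISK = N'H:\Backups\Sales_3.bak'
WITH COMPRESSION, STATS = 10;
```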
BOL defines BufferCount as “Specifies the total number of I/O buffers to be used for the backup operation.” and MaxTransferSize as “Specifies the largest unit of transfer in bytes to be used between SQL Server and the backup media.” In layman’s terms, you can tune these parameters to make your backups and restores much, much faster. There is, of course, a trade-off.
Increasing these values will consume extra memory during the backup/restore process. The calculation is pretty straightforward though: it’s BufferCount * MaxTransferSize. One thing you may not be aware of is that this area of memory is outside of the memory you can configure in SQL Server; we used to call this area MemToLeave. If you haven’t already set your maximum memory configuration setting then you really should look at it. I typically recommend that people read the memory chapter from SQL Server 2008 Internals and Troubleshooting by Christian Bolton. Christian is the lead author of the book and the author of that particular chapter; it will show you how to work out how you should set the maximum memory option for your environments.
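As a hedged example of that trade-off (the values here are purely illustrative, so test them in your own environment), the extra memory consumed is simply the two options multiplied together:

```sql
-- Sketch only: 64 buffers * 4 MB per transfer = roughly 256 MB of memory
-- used by this backup, taken from outside the buffer pool (the old MemToLeave area).
BACKUP DATABASE Sales
TO DISK = N'F:\Backups\Sales.bak'
WITH COMPRESSION,
     BUFFERCOUNT = 64,
     MAXTRANSFERSIZE = 4194304;  -- 4 MB, the largest value SQL Server accepts
```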
Baselining, you thought we had already covered this didn’t you? Well yes I had, but we are performance tuning and we can’t tune anything without a baseline! We need to know how things are working now so we can tune them. Microsoft doesn’t display the parameter values it uses for BufferCount and MaxTransferSize by default. You can however see this information if you use a couple of trace flags. Trace flag 3213 will provide the values used; it needs to be used in tandem with either trace flag 3604 (output to screen) or 3605 (output to the error log).
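Here is a minimal sketch of how you might capture those values during a baseline run (again, the database name and path are invented):

```sql
-- Enable trace flag 3213 (report backup/restore settings) together with
-- 3604 so the output is returned to the client session, run a backup,
-- then read the BufferCount and MaxTransferSize values from the output.
DBCC TRACEON (3213, 3604);

BACKUP DATABASE Sales
TO DISK = N'F:\Backups\Sales_baseline.bak';

DBCC TRACEOFF (3213, 3604);
```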
Summary
All of the points that I wanted to bring up:
- Automating regular tasks
- Baselining
- Documenting
- Defining your DR plan
are all ways of bringing to light some of the skills you need to be successful in your job as guardian of your company’s data. Being vigilant means knowing that you need these skills and using them to cater for all eventualities.
Hopefully you have found this series useful, I’d like to thank you for putting aside the time to read it and would love to hear your comments.