August 25, 2017 at 3:59 am
Hello,
Like many organisations, we run DBCC CHECKDB on a weekly basis. However, it is such a resource drain that it can and does impact some of our applications. After reading the top resources on optimising the performance of DBCC CHECKDB (Paul Randal, Aaron Bertrand, Bob Ward, Jonathan Kehayias), I find the idea of restoring a full backup to another server and running CHECKDB there very appealing. However, we have a couple of hundred production servers spread over several global regions, so a request to build an array of servers just to restore backups to and run DBCC CHECKDB on would crash and burn with our infrastructure team.
I need to be able to do this on as few servers as possible. How would I go about determining how much capacity (# of servers, CPU, memory, storage) I would require?
Another obstacle is the way we perform full backups. We have a mixture of direct-to-tape and direct-to-disk. Another team look after this, and they are the ones who control the movement of those files.
On a simple high level, I envisage the process would work as follows:
Backup copied to CHECKDB server. (how to automate this I do not know)
DB restored.
Backup deleted.
DBCC CHECKDB executed WITH TABLOCK and with trace flags 2549 and 2562
Errors found generate alerts
DB is dropped
Next backup is copied and the cycle repeats...
Theoretically this process would run continuously and would take a week or so to get through every database. In T-SQL terms, I imagine the core of it (restore, check, drop) would look something like the sketch below.
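Something like this per database, I imagine. The database name, paths and logical file names are placeholders, and I'm assuming one data file and one log file:

-- Trace flags recommended in those articles:
--   2549 treats each database file as if it sits on its own drive
--   2562 runs the check as a single batch (at the cost of heavier tempdb use)
DBCC TRACEON (2549, -1);
DBCC TRACEON (2562, -1);

-- Restore the copied backup
RESTORE DATABASE CheckDB_Target
FROM DISK = N'D:\CheckDB\Staging\MyDatabase.bak'
WITH MOVE N'MyDatabase'     TO N'E:\CheckDB\Data\CheckDB_Target.mdf',
     MOVE N'MyDatabase_log' TO N'F:\CheckDB\Log\CheckDB_Target.ldf',
     REPLACE, STATS = 10;

-- Consistency check; TABLOCK avoids the internal snapshot
DBCC CHECKDB (CheckDB_Target) WITH TABLOCK, NO_INFOMSGS, ALL_ERRORMSGS;

-- Drop to make room for the next database
DROP DATABASE CheckDB_Target;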
When you consider how many databases are involved and that the backups are either on local disk or a tape somewhere, this now feels like a logistical nightmare and I have no idea where to start.
I'm hoping that someone has overcome this before and can point me in the right direction. Maybe a case study, documentation, etc. I couldn't find any articles or blogs online.
Thank you in advance for your helpful responses.
kindest regards
Paul
August 25, 2017 at 4:14 am
Paul
Do you already have something set up that goes to all servers and gathers information about the most recent backup of each database? If not, that would be a good place to start. It would mean information on each backup is all in one place, making it much easier to automate the restores.

Once you've got that in place, choose ten databases (say) at random, or the top ten from the list that haven't been done yet. Then, assuming you have enough room on your server, restore them one at a time. As soon as the first restore has finished, start the DBCC check on it. That means you're running DBCC checks in parallel with restores, which should make the whole thing run a bit faster.

I've set something like this up myself before. It's a lot of work, but it's worth it, since you're testing that you can restore your backups and taking load off your live servers for DBCC checks. It also means you can check that all databases have been backed up, which is much more thorough than just checking for job failures. Good luck!
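If you don't have that yet, a query like this against msdb on each instance (results pulled back to a central table) would be a reasonable starting point. Adjust the columns to whatever you need:

-- Most recent full backup per database, from msdb history on this instance
SELECT  d.name AS database_name,
        bs.backup_finish_date,
        bmf.physical_device_name,
        bs.backup_size / 1048576.0 AS backup_size_mb
FROM    sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS bs
       ON  bs.database_name = d.name
       AND bs.type = 'D'  -- full backups only
       AND bs.backup_finish_date = (SELECT MAX(b2.backup_finish_date)
                                    FROM   msdb.dbo.backupset AS b2
                                    WHERE  b2.database_name = d.name
                                    AND    b2.type = 'D')
LEFT JOIN msdb.dbo.backupmediafamily AS bmf
       ON  bmf.media_set_id = bs.media_set_id
WHERE   d.database_id > 4;  -- skip the system databases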
John
August 25, 2017 at 4:38 am
Thanks John!
Yes, I have the ability to establish the most recent backup details of all the databases in a particular region.
Testing the restores was also at the forefront of my mind. A huge benefit.
The first issue is that all those backup files are remote, residing on each SQL Server or, worse, on a tape somewhere. Sadly the majority are on tape. For the server ones, I'm thinking I could build some robocopy commands based on the file directory locations, then go from there. For the tape backups, I could request that they are restored (not copied onto the server for me to restore). But I have a feeling that if I ask them to restore every database (oh, and by the way, do them one at a time and wait for my signal to start the next), they will tell me to go away in a very impolite manner. Don't blame them really!
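For the disk ones, msdb already stores the full file path, so I could probably generate the copy commands from it. Something along these lines, where \\CheckDBServer\Staging is a made-up destination:

-- Generate a robocopy command for each recent full backup taken to disk
SELECT 'robocopy "'
       + LEFT(bmf.physical_device_name,
              LEN(bmf.physical_device_name)
              - CHARINDEX('\', REVERSE(bmf.physical_device_name)))
       + '" "\\CheckDBServer\Staging" "'
       + RIGHT(bmf.physical_device_name,
               CHARINDEX('\', REVERSE(bmf.physical_device_name)) - 1)
       + '"' AS robocopy_cmd
FROM   msdb.dbo.backupset AS bs
JOIN   msdb.dbo.backupmediafamily AS bmf
       ON bmf.media_set_id = bs.media_set_id
WHERE  bs.type = 'D'
AND    bmf.device_type = 2  -- disk, not tape
AND    bs.backup_finish_date >= DATEADD(DAY, -7, GETDATE());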
Maybe this is just too impractical in the real world.
August 25, 2017 at 4:46 am
I would just restore the databases from where the backups are. You may need to tweak some NTFS permissions, and maybe change some drive letters to UNCs, but it'll cut one step out of your process. I've never worked with tape backups, but surely you can see the tape backups made on one server from another server?
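One thing to watch: if you relocate the files with MOVE, you'll need each database's logical file names, and you can read those straight out of the backup before you write the RESTORE. The UNC path here is just an example:

-- Pull the logical file names out of a backup before building the RESTORE
RESTORE FILELISTONLY
FROM DISK = N'\\ProdServer01\SQLBackups\MyDatabase.bak';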
John
August 25, 2017 at 6:42 am
Considering the amount of mixed technology you're dealing with, I don't have a simple solution for you. The one thing I can add to the conversation is that I would absolutely do all of this with PowerShell if I had to do it. Just thinking about dealing with the direct-to-tape backups (and there's a technology that can't die fast enough), you'll need a lot of control mechanisms that are way beyond the capability of T-SQL. Unfortunately, that's about all I can add. Other than that, approach it like eating the elephant: one bite at a time.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 25, 2017 at 7:25 am
Thank you for the helpful responses.
I am now reading through the tape backup software documentation on how to perform a restore directly from tape using the command line.
Will update here with anything interesting or indeed the solution.
August 25, 2017 at 9:00 am
I did something similar to this for one of my previous employers and managed it for about 4 years, especially for SharePoint and some financial apps, albeit supported by a good infrastructure budget. It was not this complex, as all the backups were initially written to storage, from where batch programs copied them (before purge) to the restore servers; the restore jobs would then pick them up, complete the actual restore, and remove the old DB on each. I used xcopy and it worked well. Configuring Database Mail on all of them would make it more manageable and organized.
I'd advise looking for an SQL-aware tool in your case.