Database Backup and Corruption Check Jobs

  • Hi,

    Please advise on the scenarios below.

    1Q) I am trying to reduce the running time of a SQL Agent backup job. The job runs an SSIS package that takes full backups of all user databases, and it is taking 7+ hours to complete (two of the databases are 4 TB and 17 TB in size). What are some of the things you would do, or advise, to reduce the time? I have already checked for obvious waste, such as tempdb or unnecessary databases being backed up, but the databases keep growing and the job will only take longer, so I am planning ahead for the future.

    2Q) There is no corruption check at the database level, i.e., no DBCC CHECKDB job running; the only time anything is checked is at backup time, via the option to verify backup integrity (a kind of corruption check). I am worried about whether this practice is sound. We avoid running DBCC CHECKDB to prevent blocking and locking issues, given that these are multi-TB databases that are busy with reads and writes most of the time. Please share your thoughts. One of my own thoughts was that corruption spreads, so it is better to catch it as soon as possible, but there are different opinions on the table.

    Thanks!

    1. There are a few things. First, you can parallelize the operations so that backups of different databases run concurrently. Second, you can parallelize the writes by striping each backup across multiple files, ideally on separate physical devices, to speed up the backup (see the sketch after this list). Note that you need all of the files for a restore. Third, improve the I/O subsystem that the backups write to.
    2. You misunderstand the VERIFYONLY option. It ensures the backup file is intact; it does not check for corruption inside the pages of the database. You can have corruption in the database being written to the backup file for months and only find out when you access a corrupt page. This can certainly be a problem in large databases, where some pages might not be accessed often.
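
    A minimal sketch of the multi-file approach, with illustrative database and path names:

        -- Sketch (names and paths are illustrative): stripe one backup across
        -- four files, ideally on separate physical devices.
        -- All four files are required for a restore.
        BACKUP DATABASE BigDB
            TO  DISK = N'G:\Backups\BigDB_1.bak',
                DISK = N'H:\Backups\BigDB_2.bak',
                DISK = N'I:\Backups\BigDB_3.bak',
                DISK = N'J:\Backups\BigDB_4.bak'
            WITH STATS = 5;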

    To catch corruption, you need to run DBCC CHECKDB. Since corruption flows through into backup files, you can run the check on another system: restore the database there and run DBCC CHECKDB against the copy (see the sketch below). Not perfect, but this tests two things at once: your restores and your consistency checks. If you do this regularly, you can catch corruption sooner.
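
    A minimal sketch of that restore-then-check pattern, assuming illustrative names, paths, and logical file names:

        -- On a separate server (names, paths, and logical file names are
        -- illustrative): restore the latest backup, then run the consistency
        -- check there, keeping the load off production.
        RESTORE DATABASE BigDB_Check
            FROM DISK = N'\\backupshare\BigDB.bak'
            WITH MOVE N'BigDB'     TO N'D:\Data\BigDB_Check.mdf',
                 MOVE N'BigDB_log' TO N'L:\Log\BigDB_Check.ldf',
                 REPLACE;

        DBCC CHECKDB (BigDB_Check) WITH NO_INFOMSGS, ALL_ERRORMSGS;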


  • Backup compression is something else you may want to consider.
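
    For example (illustrative names; compression can also be set as a server-wide default via sp_configure 'backup compression default'):

        -- Sketch: a compressed backup trades some CPU for a smaller and
        -- usually faster backup. Names and paths are illustrative.
        BACKUP DATABASE BigDB
            TO DISK = N'G:\Backups\BigDB.bak'
            WITH COMPRESSION, STATS = 5;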

    Sue

  • Thank you for the guidance. It helps.

  • The actual backup may not be taking 7+ hours. If you have the Verify backup option enabled, SQL Server does a "logical" restore, reading through the whole backup and checking each page. If you run sp_who2 during the verify phase, you will see a database restore running; yes, I got a fright when I saw this for the first time. I recommend disabling the verify option and instead doing automated test restores to another server on some sort of schedule. This alone may cut your backup time in half, and it reduces the load on your production server and on that server's I/O infrastructure.
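
    If you still want some in-line protection after dropping the verify step, one lighter-weight option is backing up WITH CHECKSUM, which re-validates each page's checksum as it is read and adds a checksum over the backup itself, at a fraction of the cost of the verify pass. A sketch with illustrative names:

        -- Sketch (names illustrative): CHECKSUM validates page checksums as
        -- pages are read (assuming PAGE_VERIFY is CHECKSUM) and stores a
        -- backup checksum, without the "logical restore" the verify option does.
        BACKUP DATABASE BigDB
            TO DISK = N'G:\Backups\BigDB.bak'
            WITH CHECKSUM, STATS = 5;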

    As Sue has said, make sure you are using backup compression, in which case there is no benefit in using multiple files, because compression uses multiple files under the hood. It is also worthwhile looking at a combination of FULL and DIFFERENTIAL backups; a sketch of the schedule follows below. Our clients with big databases do a full backup over the weekend, then differentials Monday through Saturday. (I'm assuming you have log backups in between, or the databases are in SIMPLE recovery.) A bit of coordination may be required: say the 4 TB database gets its full backup on Saturday and the 17 TB on Sunday, with differentials the rest of the time. You can get smart if you have multiple filegroups and do filegroup backups. This is especially handy if some of the filegroups are read-only, but it may well need a lot of reconfiguration of the database. You would need to do a bit of research and testing to get the most out of this.
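
    A sketch of that weekly rhythm, with illustrative names (in practice each statement would live in its own scheduled Agent job, with log backups in between):

        -- Saturday job: full backup of the 4 TB database (names illustrative).
        BACKUP DATABASE BigDB4TB
            TO DISK = N'G:\Backups\BigDB4TB_full.bak'
            WITH COMPRESSION, CHECKSUM;

        -- Sunday job: full backup of the 17 TB database.
        BACKUP DATABASE BigDB17TB
            TO DISK = N'G:\Backups\BigDB17TB_full.bak'
            WITH COMPRESSION, CHECKSUM;

        -- Every other night: a differential only captures extents changed
        -- since the last full backup, so it is much smaller and faster.
        BACKUP DATABASE BigDB4TB
            TO DISK = N'G:\Backups\BigDB4TB_diff.bak'
            WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;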

    Consider splitting DBCC CHECKDB into its component checks: DBCC CHECKTABLE on each table (perhaps in subsets across the week), then the allocation and catalog checks over the weekend. This way you cover everything CHECKDB does at least once a week; a sketch follows below. Alternatively, you can run CHECKDB on the copy of the database produced by the regular restores that verify your backups. Just be careful: you may be subject to licensing on the SQL Server instance you are using for the restores and CHECKDB.
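
    A minimal sketch of the split (how you partition the table list across nights is left to you):

        -- Sketch: spread the work of DBCC CHECKDB across the week.
        -- Weeknights: CHECKTABLE over (a subset of) the tables.
        DECLARE @schema sysname, @table sysname, @sql nvarchar(max);

        DECLARE table_cursor CURSOR FOR
            SELECT s.name, t.name
            FROM sys.tables AS t
            JOIN sys.schemas AS s ON s.schema_id = t.schema_id;
            -- In practice, filter this so each night covers a different subset.

        OPEN table_cursor;
        FETCH NEXT FROM table_cursor INTO @schema, @table;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @sql = N'DBCC CHECKTABLE (''' + @schema + N'.' + @table
                     + N''') WITH NO_INFOMSGS;';
            EXEC sys.sp_executesql @sql;
            FETCH NEXT FROM table_cursor INTO @schema, @table;
        END
        CLOSE table_cursor;
        DEALLOCATE table_cursor;

        -- Weekend: the allocation and catalog checks that CHECKTABLE skips.
        DBCC CHECKALLOC WITH NO_INFOMSGS;
        DBCC CHECKCATALOG WITH NO_INFOMSGS;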

    Another idea is to use an Always On Availability Group and do some of this against your readable secondary; again, this will incur a license cost. You can take backups and run DBCC checks against the secondary, and Always On also gives you Automatic Page Repair, so it is possible a corrupt page will never cause a problem. Be aware that a clean DBCC check on the secondary DOESN'T mean the primary isn't corrupt; it only tells you that you have a healthy secondary you can fail over to in the event of a DBCC error on the primary.
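
    A sketch of offloading backups to the secondary (illustrative names; note that on a secondary replica full backups must be COPY_ONLY, log backups are supported, and differential backups are not):

        -- Run on the readable secondary. Names and paths are illustrative.
        BACKUP DATABASE BigDB
            TO DISK = N'\\backupshare\BigDB_copyonly.bak'
            WITH COPY_ONLY, COMPRESSION, CHECKSUM;

        BACKUP LOG BigDB
            TO DISK = N'\\backupshare\BigDB_log.trn'
            WITH COMPRESSION, CHECKSUM;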

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

