What does your backup process look like?

  • Hi,

    I'd be interested to hear how most of you have your backups set up. I know a lot of it depends on RPO/RTO etc. but I've just implemented Always On Availability Groups and I think it's a good time to re-visit everything related to backups. I currently have three backup jobs running on the primary (using Ola Hallengren's scripts):

    1. Database Maintenance - HOURLY - Transaction Log backups.

    2. Database Maintenance - DAILY - Full backup of system databases, differential backup of user databases.

    3. Database Maintenance - WEEKLY - Cleanup, user and system database integrity checks, index and statistics maintenance, and full backups.
    StepNo StepName
    1    sp_delete_backuphistory
    2    sp_purge_jobhistory
    3    CommandLog Cleanup
    4    Output File Cleanup
    5    DatabaseIntegrityCheck - USER_DATABASES
    6    IndexOptimize - USER_DATABASES
    7    DatabaseBackup - USER_DATABASES - FULL
    8    DatabaseIntegrityCheck - SYSTEM_DATABASES
    9    DatabaseBackup - SYSTEM_DATABASES - FULL

    I know with moving the backups to the secondary, we won't be doing DIFFs anymore. So, aside from removing that step from the daily backups, are there any other changes that you would make to this configuration? Also, do most of you compress your backups?

    Thanks in advance. I look forward to hearing your thoughts.

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Hi Mike,

    I personally would increase integrity checks for all databases to daily if possible and add restore validation.  By restore validation I mean automating the recovery of every database with the integrity checks.  I have a process that does this every night for all databases from all servers.  This video from Grant Fritchey always stands out in my mind - https://www.youtube.com/watch?v=Ah0jabU9G8o

    Good job refining your process.  It's great to be re-thinking things regularly.

  • heb1014 - Monday, July 30, 2018 2:34 PM

    Hi Mike,

    I personally would increase integrity checks for all databases to daily if possible and add restore validation.  By restore validation I mean automating the recovery of every database with the integrity checks.  I have a process that does this every night for all databases from all servers.  This video from Grant Fritchey always stands out in my mind - https://www.youtube.com/watch?v=Ah0jabU9G8o

    Good job refining your process.  It's great to be re-thinking things regularly.

    Tom,

    Thank you for the response and for sharing Gran'ts video. He is definitely a proponent of frequent DBCC checks (+ checksum + verify + restores). I'll keep all of that in mind as I recreate the jobs.

    Again, thanks.

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Also, if you're doing restores of the databases to non-production environments, you can offload the logical consistency checks to those databases. You then can just do the physical_only checks on your prod servers which radically reduces the load there. Another thing to look into for load reduction is doing all the log backups on secondary databases where you're using availability groups.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Tuesday, July 31, 2018 5:58 AM

    Also, if you're doing restores of the databases to non-production environments, you can offload the logical consistency checks to those databases. You then can just do the physical_only checks on your prod servers which radically reduces the load there. Another thing to look into for load reduction is doing all the log backups on secondary databases where you're using availability groups.

    The man himself!

    Ok, Grant. Let me make sure I understand what you're suggesting (Note: I would be doing restores to a non-production instance.)

    So, are you saying that I would run DBCC with physical_only on my database on the secondary server, then back up the database, then restore it to the non-production instance of SQL Server, THEN do a comprehensive DBCC on that non-production instance?

    I get that running DBCC can be taxing to production. My original vision was to do DBCC (full) on the secondary (which isn't getting hit with the load of the primary), then take a backup, thus offloading the primary. Assuming I'm understanding correctly (based on what I wrote above), what's the advantage of doing the backup/restore first, then DBCC on a third, non-production server. Is it just the philosophy of moving as much off of production (including AOAG secondaries--which are technically production) as possible?

    It's possible I'm missing something here--if you could clarify, that'd be great. Thanks for helping me understand this better!

    Mike

    P.S. Yes, definitely planning to do t-log backups on the secondary!

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Tuesday, July 31, 2018 7:10 AM

    Grant Fritchey - Tuesday, July 31, 2018 5:58 AM

    Also, if you're doing restores of the databases to non-production environments, you can offload the logical consistency checks to those databases. You then can just do the physical_only checks on your prod servers which radically reduces the load there. Another thing to look into for load reduction is doing all the log backups on secondary databases where you're using availability groups.

    The man himself!

    Ok, Grant. Let me make sure I understand what you're suggesting (Note: I would be doing restores to a non-production instance.)

    So, are you saying that I would run DBCC with physical_only on my database on the secondary server, then back up the database, then restore it to the non-production instance of SQL Server, THEN do a comprehensive DBCC on that non-production instance?

    I get that running DBCC can be taxing to production. My original vision was to do DBCC (full) on the secondary (which isn't getting hit with the load of the primary), then take a backup, thus offloading the primary. Assuming I'm understanding correctly (based on what I wrote above), what's the advantage of doing the backup/restore first, then DBCC on a third, non-production server. Is it just the philosophy of moving as much off of production (including AOAG secondaries--which are technically production) as possible?

    It's possible I'm missing something here--if you could clarify, that'd be great. Thanks for helping me understand this better!

    Mike

    P.S. Yes, definitely planning to do t-log backups on the secondary!

    After we get the backup strategy all straightened out, I think Grant should hook us up with some free copies of his new 2017 performance tuning book 😉

  • Mike Scalise - Tuesday, July 31, 2018 7:10 AM

    The man himself!

    Ok, Grant. Let me make sure I understand what you're suggesting (Note: I would be doing restores to a non-production instance.)

    So, are you saying that I would run DBCC with physical_only on my database on the secondary server, then back up the database, then restore it to the non-production instance of SQL Server, THEN do a comprehensive DBCC on that non-production instance?

    I get that running DBCC can be taxing to production. My original vision was to do DBCC (full) on the secondary (which isn't getting hit with the load of the primary), then take a backup, thus offloading the primary. Assuming I'm understanding correctly (based on what I wrote above), what's the advantage of doing the backup/restore first, then DBCC on a third, non-production server. Is it just the philosophy of moving as much off of production (including AOAG secondaries--which are technically production) as possible?

    It's possible I'm missing something here--if you could clarify, that'd be great. Thanks for helping me understand this better!

    Mike

    P.S. Yes, definitely planning to do t-log backups on the secondary!

    I had to go look this up to be sure, but yeah, you can't run the logical DBCC on a secondary and then know that the the primary is logically consistent. Same goes for the physical checks. Backups allow for the logical checks to be offloaded to other servers, but availability groups only allows for the log backups to be moved. So, any server that could be the primary server, should have physical consistency checks run on it. Then, for the logical checks, you can offload those to a secondary box by restoring the database, but only through the restore operation. It's because of how availability groups runs off of transactions, not pages modified. So a transaction on one server could lead to a consistency issue because of reasons, while that same transaction on the secondary ran fine.

    Does that help?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • heb1014 - Tuesday, July 31, 2018 7:34 AM

    After we get the backup strategy all straightened out, I think Grant should hook us up with some free copies of his new 2017 performance tuning book 😉

    HA!

    No.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Mike Scalise - Tuesday, July 31, 2018 7:10 AM

    So, are you saying that I would run DBCC with physical_only on my database on the secondary server, then back up the database, then restore it to the non-production instance of SQL Server, THEN do a comprehensive DBCC on that non-production instance?

    NO!
    The only thing that tells you is that the secondary is intact. It tells you nothing at all about the primary (which is probably the DB you care about the most)

    My original vision was to do DBCC (full) on the secondary (which isn't getting hit with the load of the primary), then take a backup, thus offloading the primary.

    NO!
    Same problem. That tells you that the secondary is fine.  Nothing whatsoever about the primary.

    You need to either checkDB the primary, or take a full backup from the primary, restore it elsewhere and checkDB that.
    The secondary DBs in an AG setup don't share the same IO subsystem, and corruption will not get transferred from one to another via the log replay, so a checkDB on one node tells you only what the state of the DB on that node is.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Grant Fritchey - Tuesday, July 31, 2018 7:44 AM

     Then, for the logical checks, you can offload those to a secondary box by restoring the database, but only through the restore operation.

    And only if the backup was taken from the primary.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Tuesday, July 31, 2018 7:47 AM

    Grant Fritchey - Tuesday, July 31, 2018 7:44 AM

     Then, for the logical checks, you can offload those to a secondary box by restoring the database, but only through the restore operation.

    And only if the backup was taken from the primary.

    Oh yes. Didn't mean to leave that out.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I think I get it now. Based on what you've all said, does this sound reasonable:

    NODE1 (primary), NODE2 (secondary), NPS1 (non-production server)

    - Weekly: run checkdb (physical_only) on NODE1, take a full backup of my database from NODE1, restore it to NPS1, run checkdb (logical) on NPS1.
    - Daily: run checkdb (physical_only) on NODE1, restore weekly full backup + t-log backups to NPS1, run checkdb (logical) on NPS1.
    - Hourly: backup t-logs from NODE2.

    If so, great! If not, am I getting closer?

    Gail, you mentioned:

    You need to either checkDB the primary, or take a full backup from the primary, restore it elsewhere and checkDB that.
    The secondary DBs in an AG setup don't share the same IO subsystem, and corruption will not get transferred from one to another via the log replay, so a checkDB on one node tells you only what the state of the DB on that node is.

    Since checkdb is only checking one node, if my understanding above makes sense (the weekly, daily, hourly processes), then should I also on a weekly and daily basis be restoring the database from NODE2 to NPS1, running checkdb (physical_only) on NODE2, and then running checkdb (logical) on NPS1, that way my secondary goes through all of the same consistency checks as the primary day in and day out? If so, the only difference I'd see between the two is that the t-logs for both are being backed up on NODE2.

    Also, Grant--thanks for the explanation. I'm not sure what you mean by "but only through the restore operation" in this sentence. Are you saying run checkdb as a secondary step following the restore to the non-production box?

    Then, for the logical checks, you can offload those to a secondary box by restoring the database, but only through the restore operation.

    Again, thank you.

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Tuesday, July 31, 2018 9:48 AM

    - Daily: run checkdb (physical_only) on NODE1, restore weekly full backup + t-log backups to NPS1, run checkdb (logical) on NPS1.

    You don't need to do the second part. You did the checkDB on the weekly backup when it was taken. If it was clean then, it was clean.

    Since checkdb is only checking one node, if my understanding above makes sense (the weekly, daily, hourly processes), then should I also on a weekly and daily basis be restoring the database from NODE2 to NPS1, running checkdb (physical_only) on NODE2, and then running checkdb (logical) on NPS1, that way my secondary goes through all of the same consistency checks as the primary day in and day out? If so, the only difference I'd see between the two is that the t-logs for both are being backed up on NODE2.

    t.b.h. I'd probably just run full checkDB on the secondary ever week and be done with it. It's not serving a production load, so it can probably absorb the impact from CheckDB without notice.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    Thank you. I've taken into account all of the advice. Would you mind taking a quick look to see if you notice anything I'm missing (or doing that I shouldn't be)?

    NODE1 (primary), NODE2 (secondary), NPS1 (non-production server)

    - Weekly:
        > Take a full backup of my database from NODE1
        > Restore it to NPS1.
        > Run checkdb (physical_only) on NODE1.
        > Run checkdb (full) on NPS1.
        > Take a full backup of my database from NODE2.
        > Run checkdb (full) on NODE2.
    - Daily:
        > Run checkdb (physical_only) on NODE1
        > Run checkdb (physical_only) on NODE2
        > Restore full backup + t-log backups to NPS1?
    - Hourly:
        > Backup t-logs from NODE2.

    Also, I wasn't sure exactly which part of the daily run you meant when you said "You don't need to do the second part." Do you mean I don't need to restore the full backup + t-log backups to NPS1 or that I don't need to run the full checkdb on NPS1 daily.

    Mike Scalise - Tuesday, July 31, 2018 9:48 AM
    - Daily: run checkdb (physical_only) on NODE1, restore weekly full backup + t-log backups to NPS1, run checkdb (logical) on NPS1.

    You don't need to do the second part. You did the checkDB on the weekly backup when it was taken. If it was clean then, it was clean.

    As always, thank you.

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise - Tuesday, July 31, 2018 9:48 AM

    I think I get it now. Based on what you've all said, does this sound reasonable:

    NODE1 (primary), NODE2 (secondary), NPS1 (non-production server)

    - Weekly: run checkdb (physical_only) on NODE1, take a full backup of my database from NODE1, restore it to NPS1, run checkdb (logical) on NPS1.
    - Daily: run checkdb (physical_only) on NODE1, restore weekly full backup + t-log backups to NPS1, run checkdb (logical) on NPS1.
    - Hourly: backup t-logs from NODE2.

    If so, great! If not, am I getting closer?

    Gail, you mentioned:

    You need to either checkDB the primary, or take a full backup from the primary, restore it elsewhere and checkDB that.
    The secondary DBs in an AG setup don't share the same IO subsystem, and corruption will not get transferred from one to another via the log replay, so a checkDB on one node tells you only what the state of the DB on that node is.

    Since checkdb is only checking one node, if my understanding above makes sense (the weekly, daily, hourly processes), then should I also on a weekly and daily basis be restoring the database from NODE2 to NPS1, running checkdb (physical_only) on NODE2, and then running checkdb (logical) on NPS1, that way my secondary goes through all of the same consistency checks as the primary day in and day out? If so, the only difference I'd see between the two is that the t-logs for both are being backed up on NODE2.

    Also, Grant--thanks for the explanation. I'm not sure what you mean by "but only through the restore operation" in this sentence. Are you saying run checkdb as a secondary step following the restore to the non-production box?

    Then, for the logical checks, you can offload those to a secondary box by restoring the database, but only through the restore operation.

    Again, thank you.

    Mike

    Sorry I wasn't clear. Only through the restore process can you offload the DBCC checks in that manner.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 43 total)

You must be logged in to reply to this topic. Login to reply