Backing up and restoring VLDBs

  • Since you have NetApp (and right after I complained about SnapManager), do you have access to it?

    My issues with SnapManager, I believe, were primarily that it appears to have a very narrow window of compatibility between the target application versions, the OS versions hosting those applications, the ONTAP versions, and the SnapManager version. If that window can be maintained, it works quite well.

    I do not remember whether it was possible to get a .bak file out of SnapManager if you needed one. I would consider it a critical omission if it can't provide that...

  • second page bump

  • I used VMware + NetApp storage-level instant backups.

    They are great as long as your databases are small (< 200-500 GB).

    With our current workload, the servers are bare metal: local SSDs for data and NVMe for tempdb. No amount of money can buy you storage that matches the performance of local SSDs. Powerful storage arrays have decent throughput, but latency can't be less than about 0.5 ms, which is the round trip between your server and the storage.

    Local SSDs can have a latency of 0.001 ms and achieve 250K IOPS. Of course, local SSDs can fail even in RAID, so AlwaysOn is a must.
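    The latency argument can be sanity-checked with Little's law: I/Os in flight = IOPS x latency. The sketch below is a back-of-the-envelope model only; it assumes every I/O costs exactly the quoted latency and ignores bandwidth, controller, and CPU limits. The function names are illustrative.

```python
def max_iops(latency_ms: float, queue_depth: int = 1) -> float:
    """Upper bound on IOPS when each I/O takes latency_ms and queue_depth I/Os are in flight.

    Little's law: in_flight = iops * latency, so iops = in_flight / latency.
    """
    if latency_ms <= 0 or queue_depth < 1:
        raise ValueError("latency_ms must be positive and queue_depth at least 1")
    return queue_depth / (latency_ms / 1000.0)


def required_queue_depth(target_iops: float, latency_ms: float) -> float:
    """Number of outstanding I/Os needed to sustain target_iops at a given per-I/O latency."""
    return target_iops * (latency_ms / 1000.0)


if __name__ == "__main__":
    # Figures from the post: 0.5 ms round trip to the array, 250K IOPS on local SSD.
    print(f"SAN, one I/O in flight: {max_iops(0.5):,.0f} IOPS")
    print(f"Queue depth for 250K IOPS at 0.5 ms: {required_queue_depth(250_000, 0.5):.0f}")
```

    Under these assumptions a single serial I/O stream over a 0.5 ms path tops out around 2,000 IOPS, and reaching 250K IOPS would need roughly 125 I/Os outstanding at all times; latency-bound, serial work (log flushes, for example) feels the round trip directly.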

  • When I mentioned NetApp, it was at a previous job where I had hundreds of servers with small databases.

    At my new job, with VLDBs and 80 TB, we use NetApp just as a primitive file share: a \\UNC target for the backups.

    • This reply was modified 1 year, 1 month ago by  tzimie.
  • SnapManager for SQL lets you take database-level backups and avoid the OS freeze. We implemented it to back up a 3-4 TB SharePoint site database in a 24x7 organization. Even if we had managed to reduce the backup time to an hour, it would have been too long.

    Snap Protect from Commvault is partially storage-agnostic, but it was fairly complicated to implement with application consistency, and it requires a big infrastructure footprint.

    You mentioned you are using a bridged 2 x 10 Gbps connection. Have you considered adding NICs to the server so you can back up over a dedicated backup network? It has been several years since I have done any server work, but at that time I found that any kind of NIC bonding tended to parallelize network streams poorly. Adding two more NICs, each configured as its own separate interface, and then using SMB Multichannel might not provide any additional network performance, but the capacity would be there and you would still have redundancy. Note that SMB Multichannel can be CPU-intensive and takes some configuration to get working.
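    The observation about bonding can be illustrated with a toy model of hash-based link aggregation: the team picks one member link per flow by hashing header fields, so a single TCP stream never spans two links, and if the hash uses only IP addresses, every session between the same two hosts lands on the same link. This is a sketch with a stand-in hash (SHA-256 over the fields); real teaming and LACP implementations use their own hash functions and policies, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_links: int, use_ports: bool = True) -> int:
    """Return the member link (0..n_links-1) a hash-based team would use for this flow.

    Illustrative only: SHA-256 stands in for whatever hash a real switch or NIC team uses.
    """
    fields = [src_ip, dst_ip]
    if use_ports:
        fields += [str(src_port), str(dst_port)]
    digest = hashlib.sha256("|".join(fields).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def link_usage(flows, n_links: int, use_ports: bool) -> Counter:
    """Count how many flows land on each member link."""
    return Counter(pick_link(*flow, n_links, use_ports=use_ports) for flow in flows)


if __name__ == "__main__":
    # Hypothetical: eight backup sessions from one SQL Server host to one SMB share (port 445).
    flows = [("10.0.0.5", "10.0.1.20", 50000 + i, 445) for i in range(8)]
    print("IP-only hash:", dict(link_usage(flows, 2, use_ports=False)))
    print("IP+port hash:", dict(link_usage(flows, 2, use_ports=True)))
```

    With the IP-only hash all eight sessions report the same link; with ports included they usually spread, but each session still rides a single 10 Gbps member. That per-flow pinning is what the suggestion of separate, un-teamed NICs with SMB Multichannel is trying to sidestep.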

  • CreateIndexNonclustered wrote:

    [...]

    One question. You are only talking about backup methods.

    Have you regularly tested the restores?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    CreateIndexNonclustered wrote:

    [...]

    One question. You are only talking about backup methods.

    Have you regularly tested the restores?

    Yes.

    Application-consistent means application-consistent, if that is what you were getting at.

  • Yes, we have a dedicated server to test the restores.
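    One way to script that check on a dedicated test server is to restore the latest full backup under a different name, relocating its files, and then run DBCC CHECKDB against the copy. This is a minimal sketch that only builds the T-SQL text; the database name, backup path, and logical file names in the example are hypothetical (the real logical names can be read with RESTORE FILELISTONLY), and a real job would also apply differential and log backups.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Unicode string literal with embedded single quotes doubled."""
    return "N'" + text.replace("'", "''") + "'"


def restore_test_script(db: str, backup_path: str, file_moves: dict,
                        test_db: str = "") -> str:
    """Build T-SQL that restores a full backup as a copy, then checks its consistency.

    file_moves maps each logical file name in the backup to a physical path on the test server.
    """
    target = test_db or f"{db}_restoretest"
    moves = ",\n     ".join(
        f"MOVE {quote_literal(logical)} TO {quote_literal(physical)}"
        for logical, physical in file_moves.items()
    )
    return (
        f"RESTORE DATABASE {quote_ident(target)}\n"
        f"FROM DISK = {quote_literal(backup_path)}\n"
        f"WITH {moves},\n"
        f"     RECOVERY, STATS = 5;\n"
        f"DBCC CHECKDB ({quote_literal(target)}) WITH NO_INFOMSGS, ALL_ERRORMSGS;\n"
    )


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    print(restore_test_script(
        "Sales",
        r"\\backupshare\sql\Sales_full.bak",
        {"Sales_Data": r"E:\Data\Sales_rt.mdf", "Sales_Log": r"F:\Log\Sales_rt.ldf"},
    ))
```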

  • CreateIndexNonclustered wrote:

    [...]

    1. Is it possible to use Snap Protect from Commvault with local SSDs? Would I need up to twice as much additional space for copy-on-write? Can it handle 250K IOPS? How much latency does that additional layer add? Frankly, I doubt it can sustain the workloads we have.
    2. I don't know whether it is possible to add NICs. I believe we could install them in the biggest servers, but the inbound side of our cold storage for backups (NetApp) is already saturated, so there is no point. I believe NIC teaming of 2 x 10 Gbps gives us 17 Gbps at peak, but I am not a network expert.
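    A quick way to see what those numbers mean for a full backup is to convert size and link rate into wall-clock time. This sketch takes the 80 TB and 17 Gbps figures from the thread and assumes decimal units, uncompressed bytes on the wire, and a sustained rate equal to the quoted peak; it ignores source read speed and the target's ingest limit. The function name is illustrative.

```python
def transfer_hours(size_tb: float, throughput_gbps: float) -> float:
    """Hours to move size_tb decimal terabytes over a link sustaining throughput_gbps (decimal Gbit/s)."""
    if throughput_gbps <= 0:
        raise ValueError("throughput_gbps must be positive")
    bits = size_tb * 1e12 * 8
    return bits / (throughput_gbps * 1e9) / 3600


if __name__ == "__main__":
    for gbps in (10, 17, 20):
        print(f"80 TB at {gbps} Gbps: {transfer_hours(80, gbps):.1f} h")
```

    At a sustained 17 Gbps an 80 TB copy takes roughly 10.5 hours, so any saturation on the NetApp side stretches the window directly. Backup compression would shrink the bytes on the wire, which this sketch does not model.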
  • tzimie wrote:

    [...]

    1. It is possible to use local storage, yes. I thought about it a bit more last night, however, and on a physical box it is probably not practical to implement after the hardware has been purchased; it would need to be planned for while the server is being specced out. For all but the very smallest deployments, you need another disk dedicated to the snaps, with fault tolerance and performance similar to the primary disks. There are also considerations around volume layout. The guidance was pretty confusing about whether each database or each SQL file should have its own volume. We determined pretty conclusively that no log should share a volume with another log. That would be difficult to manage without Storage Spaces, but again, that is something that would have to be decided and planned for before the server hardware is delivered. Whether it works as promised at your scale, I don't know, but well below that scale it worked extremely well.
    2. If the SAN is already saturated, it is a moot point. What I have always observed with various types of link aggregation is that when network traffic flows between a single pair of hosts, nearly all of it tends to go out a single interface in the LAG, even when there is more than one network session. Maybe some implementation parallelizes more effectively; before Windows teaming existed, there were wild differences in aggregation behavior even across servers from the same vendor. That specific problem was always something I looked for when I found network performance issues.
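    The layout rule from point 1 (no two transaction logs on the same volume) is easy to audit once you have a file inventory. The sketch below takes hypothetical (database, file type, volume) tuples; on a real instance you might build them from sys.master_files (type_desc = 'LOG', physical_name), deriving the volume from the path. It checks only that one rule, not the broader per-database versus per-file volume question.

```python
from collections import defaultdict


def shared_log_volumes(files):
    """Return {volume: [databases]} for every volume holding more than one log file.

    files: iterable of (database, file_type, volume) tuples, file_type being e.g. "ROWS" or "LOG".
    """
    logs_by_volume = defaultdict(list)
    for database, file_type, volume in files:
        if file_type.upper() == "LOG":
            logs_by_volume[volume].append(database)
    return {vol: dbs for vol, dbs in logs_by_volume.items() if len(dbs) > 1}


if __name__ == "__main__":
    inventory = [  # hypothetical layout
        ("Sales", "ROWS", "D:"), ("Sales", "LOG", "L:"),
        ("HR", "ROWS", "D:"), ("HR", "LOG", "L:"),
        ("Audit", "ROWS", "E:"), ("Audit", "LOG", "M:"),
    ]
    print(shared_log_volumes(inventory))  # {'L:': ['Sales', 'HR']}
```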
  • Michael L John wrote:

    CreateIndexNonclustered wrote:

    VSS snapshots are SQL-consistent, and the feature no longer requires SQL Server Enterprise. They are supposed to be very fast. They may not be fast without adequate maintenance, but thorough maintenance is simply a non-optional cost of an instance like that. In my experience they have either functioned normally, quiescing for a few hundred milliseconds at most, or not functioned at all and quiesced for dozens of seconds or minutes. (Cough, Veeam; cough, SnapManager.)

    I cannot even begin to imagine having a recurring 12-hour maintenance window to run backups. I suppose if I did, I would prefer traditional backups, but I have never had that, even in my "8-5" shops.

    What kind of maintenance are you referring to?

    Have you ever thoroughly tested the data consistency (loss) from a snapshot of any type? I have.

    I didn't see this yesterday.

    Snapshot-based backup maintenance is almost entirely change management and patch management. If your backup provider is patched only to 2021, everything else involved needs to be kept within the bounds of all the products' life cycles at that same point in time. The environments I supported that had successful snapshot backups never perfectly achieved that, but it was never a problem. When they were configured correctly, problems arose only after egregious mismanagement of patching or when high-importance patches were omitted somewhere.
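    Keeping every component within the same slice of its life cycle amounts to intersecting support windows: the snapshot stack is only in a supported state on dates when every product version involved is supported together. A small sketch with hypothetical components and dates; in practice the windows would come from each vendor's compatibility matrix or lifecycle page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SupportWindow:
    component: str   # e.g. backup software, storage OS, Windows, SQL Server build
    start: date      # first day this version is supported
    end: date        # last day this version is supported


def common_window(windows: Iterable[SupportWindow]) -> Optional[Tuple[date, date]]:
    """Intersect the windows; return (start, end), or None if they never all overlap."""
    windows = list(windows)
    if not windows:
        return None
    start = max(w.start for w in windows)
    end = min(w.end for w in windows)
    return (start, end) if start <= end else None


if __name__ == "__main__":
    stack = [  # hypothetical components and dates, for illustration only
        SupportWindow("backup agent 2021.x", date(2021, 1, 1), date(2023, 6, 30)),
        SupportWindow("storage OS 9.x", date(2020, 6, 1), date(2024, 12, 31)),
        SupportWindow("Windows Server build", date(2021, 9, 1), date(2026, 10, 13)),
    ]
    print(common_window(stack))
```

    Here the supported combination runs only from 2021-09-01 to 2023-06-30; patching any one component outside that range breaks the set.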

    Your comment about finding corruption in a snapshot suggests that either you did not implement application-consistent snapshots, or something was configured incorrectly or not at all. I have also had problems with backup products that took generic snapshot-based backups and claimed to be SQL-consistent, but implemented it through VDI in a way that created minutes-long freeze times.

    Running a snapshot on something like SQL Server, Exchange, or SharePoint without the correct application writers will absolutely produce bad backups and sometimes long freeze times; I even had Veeam corrupt an ADFS WID database on the live server once when it failed to quiesce the server properly.

    Just because it is a snapshot backup doesn't mean you can skip testing restores, any more than you could skip testing regular SQL backup restores.

    Choosing the wrong backup product, one that doesn't meet requirements, or configuring it incorrectly doesn't mean the problem is anything other than Layer 8.

    Your backups currently run in 12 hours. What would you do if organizational requirements changed so that you needed more than two full backups in a 24-hour period?
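    That question can be framed as a throughput requirement. Taking the 80 TB figure from earlier in the thread and assuming the backup streams uncompressed bytes with no idle time between runs, this sketch computes the sustained rate needed for N full backups a day. The function name is illustrative.

```python
def required_gbps(size_tb: float, backups_per_day: int, hours_per_day: float = 24.0) -> float:
    """Sustained decimal Gbit/s needed to complete backups_per_day full backups of size_tb."""
    if backups_per_day < 1:
        raise ValueError("backups_per_day must be at least 1")
    seconds_each = hours_per_day * 3600 / backups_per_day
    return size_tb * 1e12 * 8 / seconds_each / 1e9


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} full backup(s) of 80 TB per day: {required_gbps(80, n):.1f} Gbps")
```

    If the 80 TB is what runs in those 12 hours, two a day already needs about 14.8 Gbps sustained, and a third would need about 22 Gbps, more than a 2 x 10 Gbps team and the ~17 Gbps peak mentioned earlier.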

    I go to extreme lengths to avoid snapshot-based backups, but when VLDBs are involved, they are commonly the best solution.

  • I had a 160 TB database at one point. A large percentage of it was blob data, and the associated table was INSERT-only. We also had another activity log table that was also INSERT-only. In this situation, you can use partitioning and read-only filegroups to improve backup and restore operations. You can take a one-off backup of the read-only filegroups and exclude them from the normal weekly/daily backups; just back up the read/write filegroups. You might also be able to use file/filegroup backups if you have filegroups that are mostly read-only. In that case you don't need to take a FULL backup of those filegroups very often, and the DIFF will remain small. On the restore side, things can be optimized with online piecemeal restores.
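    The partial and filegroup backups described above map onto the READ_WRITE_FILEGROUPS and FILEGROUP options of BACKUP DATABASE. This sketch only generates the T-SQL text; the database, filegroup, and path names are hypothetical, and it does not cover marking filegroups read-only, log backups, or the piecemeal restore sequence.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """N'...' string literal with embedded single quotes doubled."""
    return "N'" + text.replace("'", "''") + "'"


def partial_backup(db: str, path: str, differential: bool = False) -> str:
    """Back up only the read/write filegroups (full partial, or differential partial)."""
    options = ["DIFFERENTIAL", "CHECKSUM"] if differential else ["CHECKSUM"]
    return (f"BACKUP DATABASE {quote_ident(db)} READ_WRITE_FILEGROUPS "
            f"TO DISK = {quote_literal(path)} WITH {', '.join(options)};")


def filegroup_backup(db: str, filegroup: str, path: str) -> str:
    """One-off backup of a single filegroup, e.g. after it has been marked read-only."""
    return (f"BACKUP DATABASE {quote_ident(db)} FILEGROUP = {quote_literal(filegroup)} "
            f"TO DISK = {quote_literal(path)} WITH CHECKSUM;")


if __name__ == "__main__":
    # Hypothetical names: one archive filegroup per year of INSERT-only blob data.
    print(filegroup_backup("DocStore", "FG_2019", r"\\backupshare\DocStore_FG_2019.bak"))
    print(partial_backup("DocStore", r"\\backupshare\DocStore_rw_full.bak"))
    print(partial_backup("DocStore", r"\\backupshare\DocStore_rw_diff.bak", differential=True))
```

    The design choice is that the archive filegroups are backed up once, while the routine FULL and DIFF jobs touch only the read/write portion, which is what keeps the DIFF small.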

    The downside is that this adds complexity, and it could be painful to implement on an existing VLDB. You really need to understand how this works and have the process tested regularly (always important, but really critical in this case). If you get it wrong, you might not be able to fully recover your database. If you can get away with doing normal FULL/DIFF backups, I'd recommend that, but I thought it was worth a mention.

    I have a blog post on this if you are interested in reading more about it. I remember there were some good MCM videos on piecemeal restores, which are now available here.

    DBA Dash - Free, open source monitoring for SQL Server
