How Do You Patch 100 Database Servers?

  • Comments posted to this topic are about the item How Do You Patch 100 Database Servers?

  • My Infrastructure colleagues use this https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/automated-patching?view=azuresql for our SQL Servers, which makes use of Azure Arc.

    This is the modern take on WSUS which (IMHO) does the job better, as the scheduling and maintenance windows are easier to define and maintain than they ever were in WSUS, which needed multiple Group Policy settings configured.

    The advantage of automated patching is that you can create multiple deployment ring groups: Test (test servers), Phase 1 (UAT servers), Phase 2 (secondary production servers) and Phase 3 (primary and standalone production servers). This gives us complete control over which servers we target and when.
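
A minimal sketch of the ring idea in Python, not the Azure tooling itself: the ring names follow the post, while the server names, the `Server` class and the `patch_waves` helper are hypothetical and only illustrate how a server list could be turned into ordered rollout waves.

```python
from dataclasses import dataclass

# Ring names follow the post; the rollout order is Test first, primaries last.
RING_ORDER = ["Test", "Phase 1", "Phase 2", "Phase 3"]


@dataclass(frozen=True)
class Server:
    name: str  # hypothetical server name
    ring: str  # one of RING_ORDER


def patch_waves(servers):
    """Group servers into one wave per ring, in rollout order."""
    unknown = {s.ring for s in servers} - set(RING_ORDER)
    if unknown:
        raise ValueError(f"unknown ring(s): {sorted(unknown)}")
    waves = []
    for ring in RING_ORDER:
        names = sorted(s.name for s in servers if s.ring == ring)
        if names:  # skip rings that have no servers assigned
            waves.append((ring, names))
    return waves


# Hypothetical inventory, deliberately listed out of order.
servers = [
    Server("sql-prod-01", "Phase 3"),
    Server("sql-test-01", "Test"),
    Server("sql-uat-01", "Phase 1"),
    Server("sql-prod-02", "Phase 2"),
]

for ring, names in patch_waves(servers):
    print(ring, names)
```

Each wave would only start once the previous ring has been patched and checked, which is what gives control over which servers are targeted and when.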

    For client updates we use Windows Update for Business (WUfB). It has its flaws, but it is much better and more reliable than WSUS and Windows Update ever were for a hybrid workforce.

    • This reply was modified 1 month ago by  Adrian Scott. Reason: grammatical improvement
  • We use Qualys for patch management. Still, the passive and active servers of clusters are done as different jobs.
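
A rough sketch of splitting a cluster into two patch jobs by role. This is plain Python, not Qualys configuration; the node names and the `split_cluster_jobs` helper are hypothetical, and running the passive job first is an assumption, since the post only says the two roles are handled as separate jobs.

```python
def split_cluster_jobs(nodes):
    """Return (passive_job, active_job) as sorted lists of node names.

    nodes maps a node name to its current role, "active" or "passive".
    """
    bad = {name: role for name, role in nodes.items()
           if role not in ("active", "passive")}
    if bad:
        raise ValueError(f"unexpected role(s): {bad}")
    passive = sorted(n for n, role in nodes.items() if role == "passive")
    active = sorted(n for n, role in nodes.items() if role == "active")
    return passive, active


# Hypothetical two-node cluster.
cluster = {"sqlclu-node-a": "active", "sqlclu-node-b": "passive"}
passive_job, active_job = split_cluster_jobs(cluster)
print("job 1 (passive):", passive_job)
print("job 2 (active):", active_job)
```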

  • Most servers have autopatching enabled and are assigned to different groups that define on which weekday and at which hour they are allowed to restart.
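
A small sketch of how such restart groups could be modelled. The group names, windows and the `restart_allowed` helper are all hypothetical, and the sketch assumes a window never crosses midnight.

```python
from datetime import datetime, time

# Hypothetical groups: (weekday with Monday=0, window start, window end).
RESTART_WINDOWS = {
    "tuesday-night": (1, time(22, 0), time(23, 30)),
    "saturday-early": (5, time(2, 0), time(4, 0)),
}


def restart_allowed(group, now):
    """True if `now` falls inside the group's weekly restart window."""
    weekday, start, end = RESTART_WINDOWS[group]
    return now.weekday() == weekday and start <= now.time() < end


# 2 January 2024 was a Tuesday.
print(restart_allowed("tuesday-night", datetime(2024, 1, 2, 22, 30)))
print(restart_allowed("tuesday-night", datetime(2024, 1, 3, 22, 30)))
```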

    Some servers (such as our SQL Servers) are patched manually. One of us emails the customers (just a small team in my case) a few days in advance to say that the server will be unavailable on Friday at 17:00 (5:00 pm) for 0.5-2 hours, and then installs the SQL and Windows updates and reboots the server in that window.

    Although we don't work at weekends (and don't check the servers regularly then), we haven't had any problems or stability issues caused by bugs in the updates, so Friday evening is okay for us.


    I think Windows and SQL Server are not the main problem regarding patching. The BIOS, firmware and maybe specific drivers for servers and network devices (routers, switches, batteries ...) are much harder and riskier to patch regularly.


  • Certainly in the AWS cloud patching is on AWS's half of the shared responsibility model.  You can request a delay to a patch but it isn't an indefinite delay.

    I am pretty sure that AWS's observability and logging capabilities detect problems such as a patch causing a DB server to be unable to use more than 8 cores.

    The other half of the shared responsibility model is testing the customer application that uses the database, which only the customer can do.

    If the starting principle when developing software is "this must be testable at every level by mechanical means" then the risk associated with patching becomes much lower.

    If you try and bolt on mechanical testing after the app is written then the depth and breadth of tests will be much smaller and therefore the risk increases.

    It would be interesting to know what Microsoft's approach is to testing SQL Server.  Is the code base covered in unit and integration tests and to what extent?

  • This is the self-selected "We've figured this out" group. @adrian's is the closest to what I've set up, using rings and automation for patches. The cloud delays are important, since I don't love evergreen patching on someone else's schedule.

     
