Do any of you think that rolling back a version of your application is easy? Most people struggle, and when I talk about DevOps and improving your process, the number one question is about rollbacks. In fact, I recently had a few people who struggled to even listen to the early parts of the DevOps discussion because all they could think about were previous failed releases and the need to roll back. They kept asking, "What about rolling back?" Hopefully you can stick with me a bit longer.
I'd argue that rollbacks are the process that needs DevOps the most, since DevOps gives us smaller, more regular releases and practice at making changes. Whether forward or backward, we ought to be able to rev our software easily. I ran across a piece on Google's Cloud Platform blog called "Reliable releases and rollback - CRE life lessons". The title is an interesting one, but suspend some of your database skepticism until the end.
It's easy to consider rolling back in the early parts of the article and say "it's way easier to roll back your application", and it is. Applications just stomp down new (or old) versions on top of what's there, which is often an easy thing to do. As they say at Google, "rollbacks are normal", which certainly seems to fit with the application paradigm.
In fact, they recommend rolling back a good release. After all, it's much easier to practice this sort of thing when you have a working new version of software. When the release breaks your system, as mentioned in the piece, everyone's stress level rises and the fixes often aren't well built. Even when they work, which isn't anywhere near all the time, there are often problems later. The idea should be to roll back and ensure everyone knows how to undo a version change. They can document the reasons for rollback and get the previous state of the application running. I hadn't thought about this, but it makes sense. Practice in advance and be prepared. You can always re-deploy the working version.
What about databases? They have a solution, and I like it. They want the app developers to build two versions of the application: one pre-schema change and one post-schema change. You deploy the first version, then the schema change(s), then the second version. If there's an issue, you roll back to the first version and undo the database changes. This sounds hard, but once you get into the swing of building code that survives additive changes to the database, it becomes easy.
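To make the two-version idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is my own illustration, not something from the Google piece: the `customers` table, the `email` column, and the `v1_*`/`v2_*` function names are all hypothetical. The point is only that version one names its columns explicitly, so it keeps working after an additive change and after that change is undone.

```python
import sqlite3

def create_schema_v1(conn):
    # Original schema that application version 1 was written against.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def migrate_up(conn):
    # Additive change: a new nullable column that old code can ignore.
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

def migrate_down(conn):
    # Undo the additive change by rebuilding the table without the column.
    # Any values stored in email are discarded by this step.
    conn.executescript("""
        CREATE TABLE customers_old (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customers_old (id, name) SELECT id, name FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_old RENAME TO customers;
    """)

# Application version 1 (hypothetical): explicit column lists, so extra
# columns in the table do not break it.
def v1_add_customer(conn, name):
    return conn.execute("INSERT INTO customers (name) VALUES (?)", (name,)).lastrowid

def v1_list_customers(conn):
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()

# Application version 2 (hypothetical): depends on the new column.
def v2_add_customer(conn, name, email):
    return conn.execute(
        "INSERT INTO customers (name, email) VALUES (?, ?)", (name, email)
    ).lastrowid

def column_names(conn):
    return [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

def demo():
    conn = sqlite3.connect(":memory:")
    create_schema_v1(conn)
    v1_add_customer(conn, "Ann")                   # v1 on the old schema
    migrate_up(conn)                               # deploy the schema change
    v1_add_customer(conn, "Bob")                   # v1 still works on the new schema
    v2_add_customer(conn, "Cy", "cy@example.com")  # deploy v2
    after_up = v1_list_customers(conn)
    migrate_down(conn)                             # roll back: v1 code, old schema
    after_down = v1_list_customers(conn)
    cols = column_names(conn)
    conn.close()
    return after_up, after_down, cols
```

Note that the down migration throws away whatever version two wrote into the new column, so in a real system you would want to decide whether that data needs to be saved first.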
This doesn't address destructive changes to the database, like dropping objects or manipulating data. I would suggest that drops go in a completely separate release, with a full backup (or snapshot) taken and saved for a while. For data manipulation, save off the previous state of the data, just in case you need to reload things.
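Saving off the previous state can be as simple as copying the affected rows to a side table before the update runs. The sketch below shows one way that might look, again with SQLite from Python. The `products` table, the backup table name, and the ten percent price change are assumptions I made up for the example.

```python
import sqlite3

# Hypothetical name for the side table that holds the pre-release values.
BACKUP_TABLE = "products_price_backup"

def save_previous_state(conn):
    # Copy the key and only the column the release is about to change.
    conn.execute(f"CREATE TABLE {BACKUP_TABLE} AS SELECT id, price FROM products")

def apply_price_change(conn):
    # The data manipulation shipped with the release (illustrative only).
    conn.execute("UPDATE products SET price = ROUND(price * 1.10, 2)")

def restore_previous_state(conn):
    # Reload the saved values for every row that was captured in the backup.
    conn.execute(
        f"""
        UPDATE products
        SET price = (SELECT b.price FROM {BACKUP_TABLE} AS b WHERE b.id = products.id)
        WHERE id IN (SELECT id FROM {BACKUP_TABLE})
        """
    )

def read_prices(conn):
    return conn.execute("SELECT id, price FROM products ORDER BY id").fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO products (id, price) VALUES (?, ?)", [(1, 10.0), (2, 25.5)]
    )
    original = read_prices(conn)
    save_previous_state(conn)      # before the release
    apply_price_change(conn)       # the release itself
    changed = read_prices(conn)
    restore_previous_state(conn)   # the rollback
    restored = read_prices(conn)
    conn.close()
    return original, changed, restored
```

This only restores rows that existed when the copy was taken. Rows inserted or deleted after the release would need their own handling, which is one more reason to keep the backup around for a while.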
Becoming better not only at delivering changes to the customer, but also at removing them when issues are detected, is a valuable skill. Since we're likely to have a bad release at some point, practicing rollbacks might be a way to further reduce the risk of deploying database changes.