This editorial was originally published on May 6, 2015. It is being re-run as Steve is traveling.
I ran into the word idempotent in the Stairway to Integration Services. I had heard the word, but I hadn't really considered how important it can be for a DBA or developer until that time. It's a term used in computer science, as well as other sciences, but I think it's one that many of us don't consider when we're writing code, especially code used to deploy software to other systems.
Most of us have written scripts like this:
if exists (select object_id from sys.objects where name = 'uspGetSales') drop procedure uspGetSales; create procedure uspGetSales as ...
At the end of running this code, we have the system in a state. If we run this over and over, we'll regularly return to the same state, which is often exactly what we want to occur. That works great when deploying code, but what about this:
insert into states select 'CO', 'Colorado';
If I run that multiple times, what will happen? Either I'll get errors if one of these fields is a PK, or I'll get multiple inserts. If I'm running this as part of a software deployment, do I want either of those conditions? Do I want my end user to experience either one?
No, I don't. Certainly if I'm manually making changes to systems I can probably avoid issues, but that's not what I want to do. Maybe you do, but I don't. I'd like to be able to restart my deployment if something fails, without causing other issues. I'd like idempotent code, like this:
if not exists (select abbrev from states where abbrev = 'CO') insert into states select 'CO', 'Colorado';
That way if I happen to run this code twice, or ten times, I arrive in the same place.
I realize that building scripts and deployment processes that are idempotent is a pain. It's work, but it's also scriptable and repeatable work that is easy to automate over time with a few patterns. I also realize that a little extra work to prevent issues is often an investment that's worth making for both my customers, and my reputation.