This month’s invitation is from Mala Mahadevan (b) asking us how we manage the database code. Since this is my passion project, I have a few basic tips to share.
I have yet to see a perfect implementation but even a partial one benefits you. My experience is mostly with enterprise-scaled environments - large servers, minimal downtime, brownfield development and also mostly single-tenant.
But these concepts and building blocks should be generic enough.
Throughout this article, I’ll use the word schema. It’s an overloaded term but in this case, I’m referring to a general “shape” of the database (object definitions with their results sets, columns, etc.) and not the schema returned by sys.schemas
.
With that out of the way, everything can be summed up by this meme:
Source of Truth
You can’t handle the source of Truth
— Me
We should take inspiration from application development because it’s very similar. Database code is also a code.
It’s just a tiny bit harder because you have to persist the state, so you can’t just drop the DB and rebuild it from scratch for any change.
If you want automatic deployments, you have to store your scripts somewhere. The hint is in the name: source control should store source code.
And when I say source control I mean git. Even though no one understands git, you’ll soon enough learn the golden path scenario of about 5-6 commands. And if it doesn’t work, you’ll nuke the branch and start again until it does (it’s the only way to be sure).
You should strive to:
- Move the source of Truth from the production database to the source code (no half-measures)
- Create an empty environment just from your source-controlled scripts
- This will get you one step away from developing on a shared environment and one step closer to isolation
How should I version my scripts?
There are two main schools of thought: state-based and migration-based (or a hybrid). In the end eve
State-based approach
- You define the start and end state (e.g. script out the table as is and how it should look in the end)
- Usually, a schema compare tool generates the migration for you (e.g.
alter table… add column
) - Comparing different schemas (dev <-> test or dev <-> prod) will produce different migration scripts
- State-based approach is great for collaboration
- You can see the desired state
- Since source code is usually stored as one script file per object, you can see git conflicts when two people try to edit at the same time
- The migration scripts are only as good as the schema compare tool. It often struggles with renames, online deployments or anything beyond AdventureWorks-level complexity.
Migration-based approach
- Mostly the opposite of the state-based approach
- You write numbered migration scripts and apply them in order
- You store a version table on the target and only apply the missing migrations
- Better control over deployment script (manually created)
- Unless you run the migration scripts against an environment, you don’t see the desired state
- Conflicts are harder to detect as changes against the same object can be in different script files
- Possible out-of-order merges and deploys
Hybrid approach
- The best of both worlds
- state-based approach for source control
- migration-based approach for deployments
- More moving parts mean it’s more brittle
In the end, it’s all migration scripts and you can switch back and forth between the approaches to some extent.
And the rest
There is much more to cover. From the top of my head:
- git branch strategy
- monorepo vs manyrepo
- Store application code along with the database code vs separate storage
- Store all databases in a single repo vs one repo per DB
- Variations of the above
- You have to pick your poison either scaling or dependency hell
- Unit testing
- Hotfixes and schema drift
- Empty schema vs minimal dataset vs curated dataset vs generated data
- Online deployments
- Rollback vs rollforward
- Deployment feature flags
- Free tools, paid tools, homegrown tools
- Monitoring and alerting
As you can see this is a pretty complex topic which doesn’t have a silver bullet answer yet and the tools and approaches are still evolving.
I hope you found this crash course useful.
Thank you for reading.