How to create and refresh development and test databases automatically using SQL Clone and SQL Toolbelt
A PowerShell automation script to build a SQL Server database from source control, seed it with dummy data, document it, and then deploy copies to any number of test and development servers.
To deliver database changes quickly and reliably, there are several tasks that must be automated. It can be a daunting job to ensure that the whole team has the latest database build, when there is a proliferation of copies, and the database is big.
This article provides a PowerShell script that will automatically build and distribute copies of a database, for test and development work, where the source code for the database is stored in the version control system (VCS). The PowerShell automation script will build the database from the latest build script, in version control, insert seed data (optional) and document the database (optional). In the provision phase, it uses SQL Clone data virtualization to distribute lightweight database copies to the required servers. SQL Clone creates an image of the source database, and from the image it creates ‘clones’ of the database, or refreshes existing clones, on each of the SQL Server development and test instances that you specify. The automation script comes with a few safety checks to ensure, for example, that before dropping existing clones and creating new ones from a new image, any changes to local database clones are written to a directory in the VCS as a safeguard. The net result is that all the development and test databases should end up with metadata and data that is identical to that specified in the source, for that database version.
The solution uses Redgate’s SQL Clone, SQL Compare and a few other optional tools from the SQL Toolbelt.
The good and bad of shared development databases
I’ve been in IT long enough to remember an innocent time in database development, where we generally had just one development database on one server, and we all developed and tested on it.
In many respects, we managed quite well because, after all, a relational database is a multi-user, multi-process device, and is its own development environment. We stored the database object scripts in source control, and would check out objects, such as tables or views, to edit them, do our work and then perform a check-in to include the changes in the next development version. There were complications when one developer decided to alter an object with interdependencies on the work of another developer, so it was just as well we worked in the same room, within shouting distance. The advent of schemas put an end to the obvious collisions.
Each nightly build would create a new version of the database, purely from the scripts in source control. In theory, this meant that nothing unwanted, nothing that hadn’t been specifically included, could creep into the build. In practice, even database developers are human, and mistakes were common. Nightly integration tests were run to prove that each build could produce a functioning database.
Of course, having a tested, functioning database in development did not necessarily mean we had a database that was deliverable to production. Generally, we would scrabble around to ensure that the updated tested version of the database worked with any existing production system that relied on the database, practicing in staging until everything went right. It wasn’t pretty and could take a long time. We would miss things that we assumed would be in place, and we’d find conflicts that had to be resolved.
Automating database delivery
Nowadays, we like to build, integrate and deploy the database automatically, via a script. We practice this process as often as we can. There are no halcyon days when the database is too immature to build and deploy. Instead, we prefer to start to take the database through to staging as soon as we’ve written anything more than a couple of ‘hello world’ database objects. This means we get immediate feedback of a problem, and we can also say, at any point in time, whether we have a version that is potentially deliverable. We want to be able to deliver a database very quickly and reliably, at any stage of the database lifecycle, if the application developers are waiting for a database change, or if there is an important security fix to put in place.
The more rapid the delivery cycle, the more requirements there are for copies of the latest build. We have developers who need to work on their own isolated copy of the database, but we also need several copies of the database for development and testing, rather than sharing just one. We often need a test cell with servers and databases to support a wide variety of tests, performed in parallel, and in some tests we need to validate the build against other versions of the database.
The more rapidly we must deliver, the harder it is to keep the database documentation up-to-date, and the better we need to get at automating that process. If parts of the data that we use to develop with are sensitive in terms of privacy, company security or are financial, we need to mask, obfuscate or create from scratch the data that we develop and test on. All these factors mandate an automated build process.
A PowerShell Script for database provisioning
At some point, one must stop talking in generalities, cease waving one’s arms whilst glossing over real technical problems and produce a practical script that deals as sensibly as possible with these challenges and can reliably automate database provisioning to the development and test servers.
There are several dangers inherent in this, of course. Firstly, no two development teams have ever done database development in the same way, so scripts must be changed to conform to your development environment. Secondly, real scripts are too idiosyncratic. They reveal too much about the quirks of the way you go about the process of developing databases.
To try to overcome these problems, I’ve created a PowerShell script that, I hope, is easy to adapt but deals with all the issues in a general way. The general principles are:
- All databases that are specified in a list must, if the build succeeds, end up being at the version of the current successful build.
- Before we update a development database, we preserve changes just in case the developer has forgotten to save their work in source control.
- We audit as much of the provisioning process as possible.
- We need to allow for arbitrary pre- and post-build scripts, mainly for inserting data.
- It should be possible to inspect the entire database build script.
- The PowerShell script to automate the whole process should have no hard-coded variables. These should be held in a separate build data file.
What the solution does, at a glance
Figure 1 lays out, broadly, what the script does at each stage of the database provisioning process.
What you need to do to get it to run
You will need, at minimum, a couple of SQL Server instances, a copy of SQL Compare, and a copy of SQL Clone. You need to install SQL Clone and set up the SQL Clone Server and the client development and tests servers on which you will create the clones.
If you have the toolbelt, you’ll be able to use SQL Data Compare, SQL Data Generator and SQL Doc, as required. You will need to download the PowerShell automation solution from GitHub. This consists of two PowerShell files: a build data file (MyInstallationData.ps1) and a process script (RedgateProvisioning.ps1). The process script take the instructions for the build and provisioning, and all parameters, from the build data file. Nothing by way of configuration information is held in the process script.
You will need to change the values stored in the build data file to define your database, server environment, network paths, list of clones, and so on. I’ll describe the structure of the build data file later in the article.
How the solution works
The solution has two broad phases of operation, the build phase, and the provisioning phase.
The build phase creates the latest database version on the build server, documents it and stocks it with test data. To do this, it uses various SQL Toolbelt tools, namely SQL Compare, SQL Data Compare, SQL Doc and SQL Data Generator. Only SQL Compare is essential.
The provisioning phase provides each designated development or test instances with a copy of the build database, produced by the build phase. Traditionally, this might mean taking a backup of the build database and restoring it to each of the development and test servers, or copying across the MDF file to each server and attaching it to the instance. In either case, this means that you need to copy the same bytes of data many times over the network. This becomes increasingly expensive in terms of both time and disk space as the size of the build database grows, and the copies proliferate.
In this solution, the provisioning phase uses SQL Clone, which copies the bytes only once, and then uses disk storage virtualization technologies, built into Windows, to virtualize the data on each target instance.
SQL Clone creates one full copy of all the data and metadata that makes up the source database; the image. From this image, it can then create clones on each of the development SQL Server instances. Each clone will be only a few tens of MB in size, but has access to all those same bytes by reading from the image. The only data stored locally for each instance are data pages containing changes made directly to the local clone databases.
The Build phase
The script will:
1. Generate the build script
We check to see if build scripts already exist in the database’s Source
directory (at DatabasePath
). If they don’t, then the process script will create them from a designated development server (current.DevServerInstance
) and database (current.Database
), using SQL Compare, and will then store them at DatabasePath\Source
. This allows you to put an existing database into source control. This was useful for testing the script, but I left it in just in case one or two of you haven’t got your database in source control! We assume that this Source
directory will already be a source control directory, or will be turned into a GitHub repository. Whichever way it happens, you specify the directory in MyInstallationData.ps1.
Similarly, the script will check for the existence of a source data
directory, which will hold the INSERT
script for static data and any other data needed for testing. If it doesn’t exist, the script will, if required, generate a data INSERT
script (DataSyncFile.sql
) from the existing data in the current
database, using SQL Data Compare. Be careful with this, it is designed merely for the small amount of static data that is required for a database to run. Test data needs to be loaded from native BCP format. Insert statements would take too long.
In reality, there will be a number of different data sets for various forms of testing. As an alternative to using SQL Data Compare, or native BCP, we can specify that we need to generate the data using a SQL Data Generator project file (.sqlgen
) as a post-build step.
Once the object-level build scripts exist in the database’s Source
, it will use them to generate a single database build script. It uses SQL Compare to read the Source
directory of scripts, compare that with the model
database on the target build server instance, and generate a synchronization script that will synchronize the two. In effect, this produces a database build script (Database.sql
), with all the objects in the correct dependency order.
2. Build the latest version of the database, from version control
This stage builds the new database version, on the designated build server (build.NewBuildServerInstance
) using the build script from the previous stage.
If a build database already exists, representing the previous day’s build, it will destroy it and then build the new version, having first checked for any existing clones (created from the image of this database – see the Provisioning phase). If any clones exist, it compares this build database to the clones that were created from it, using SQL Compare, and write any changes made to the local clone as a “diff” script (which will have the clone database name appended with the date) to a Changes
directory, just in case they weren’t checked in, before we deploy new clones.
Having done this, it checks for any active user processes on the build database, and if it’s OK to do so, drops the existing build database. It then creates a new database with the same name and then executes the build script.
There is also a “build check” step, where it uses SQL Compare to compare the new build with the source code directory to make sure that it was successful (i.e. that the build database matches exactly the database described by the scripts in the Source
directory)
3. Fill the database with test data
If the script finds a SQL Data Generator project file, DataGenerator.sqlgen
, in the database’s Source
directory, it will use it to load the new build database with generated test data. Otherwise, it will use the DataSyncFile.sql
script containing the INSERT
statements.
4. Generate database documentation
The script will generate database documentation (HTML) for the new build database, or refresh the existing documentation, using SQL Doc, if we specify a path to the SQL Doc executable in the build data file, and save it to version control
The Provisioning phase
During this phase the script will use SQL Clone to create an image of the database and then, from this image, create or refresh clones on the list of clones (named Clones) in MyInstallationData.ps1 that are in one of the target servers that have been registered on the SQL Clone Server.
5. Create an image of the build database
The script gets the names of the build SQL Server instance and database, and creates an image of this database at the image location specified for it on SQL Clone Server by the image_id.
6. Create clones
The script creates as many clones as you specify from the image. It goes through the list of clones that you provide in the data file and, if the clone exists, removes it. We have already saved any alterations. It then creates it from the image.
The Installation data file (MyInstallationData.ps1)
You will need to change the build data file to define your database, server environment, network paths, and list of clones, all in a PowerShell PSON Data structure. This is in several sections and subsections
Tools
A list of the locations of the various tools:
- The path to where we have SQL Compare installed
- The path to where we have SQL Data Compare installed (leave $null if not wanted)
- The path to where we have SQL Data generator installed (leave $null if not wanted)
- The path to where we have SQL Doc installed (leave $null if not wanted)
Source
The various directories in which you want to store files and logs:
- The path to the Source directory for this database
- The location of the executable SQL data insertion script.
- The location of the SQL Data Generator file for any columns you need to obfuscate (leave $null if not wanted)
- Where you want to put the reports for a database.
- Where changes between clone and build are stored
Current
The details of the current development shared server if you don’t yet have a VCS-based source code directory (leave as $null
or delete if you don’t need or want to use this. It is only used if the system can’t find your source code directory
- The Server Instance of the development SQL Server to get the source from (optional)
- The name of the database
Build
The location of the database that you want to build:
- The Server of the SQL Server Instance you want to use for the build
- The name of the database you want to call it
- The SQL Data Generator project you want to use for the data (optional)
Image
The details of the image that you want to create:
- The name of the image we want to create
- The clone server URL
- The ID of the SQL Clone image location (usually 1)
Clones
A list of all the cloned databases, referenced by the NETNAME
of the clone servers and the name of the clone database
Alternative Strategies
Normally when I do a PowerShell script that uses a proprietary tool, I like to provide an additional alternative version that uses whatever is provided by Microsoft or has a free software license, but in this case, it would be rather impractical. It is possible to do a build from object scripts: I’ve illustrated how to do so in my article, How to Build and Deploy a Database from Object-Level Source in a VCS.
Cloning can present a much bigger problem. One alternative mechanism for provisioning is to copy the MDF file onto each machine and attach it on each server. One can, of course run a backup from the database you want to copy, and restore it to each clone. Both these techniques are slow, end up with a mass of data being moved around the network and use a lot of disk space. It can also leave more complications for security due to the file permissions that are necessary. The editor tells me that Simple-Talk is due to publish an article by Grant Fritchey that tells you how to do this.
For further information
The script itself is rather too long to be embedded in an article. Besides this, I keep adding features and trying to improve it. Everyone who tries it wants additional features. I’ve therefore decided to place it on Github in the hope that someone else will take it and improve it.
Phil-Factor/DeployViaToolBeltAndClone