Problem Space
If you're working as a DBA with development teams that follow the newer paradigms of Scrum, Agile, iterative development, or any of the other Extreme Programming flavors, then you're familiar with two topics: refactoring and test-driven development. Refactoring, in a nutshell, for those who don't know, is "the process of changing software in such a way that it does not alter the external behavior of the code yet improves its internal structure."(1) That's straight out of Martin Fowler's book Refactoring: Improving the Design of Existing Code. What it means, again in the short version, is making small changes to your code frequently, rather than large changes infrequently. It's principally arrived at by practicing test-driven development. The truncated explanation: write a test, then write your code, and every time you touch the code from then on, run the test to verify the changes did not affect its expected behavior. These are simple concepts, but incredibly difficult to achieve. Then we toss databases into the mix, and all hell breaks loose.
As I'm sure you're aware, making changes to databases, any changes, any time, endangers your data. Since we exist to protect data, we have to make sure any changes are safe. Hence, a test-driven development approach for databases becomes a very attractive idea. That is, until you explore what it entails. If you're interested, pursue Scott Ambler's works, Agile Database Techniques and Refactoring Databases, which will tell you everything you need to know about database refactoring. All due respect to Mr. Ambler (and lots of it), he does leave a few details, rather large details, out of his extensive writings on the topic. One of these is how to do unit testing in TSQL. After a lot of trial and error and searches on the web, I managed to track down TSQLUnit at SourceForge. This software largely follows the nUnit approach to unit testing, which enables you to begin the process of unit testing your stored procedures, functions, triggers & views. I'll leave methods of unit testing database structural changes to your imagination (mainly because I still haven't worked out all the details of how we're going to apply it).
With TSQLUnit in place, we've got a mechanism for running test suites, so we're good to go, right? Wrong. Now we have to write 'x' number of stored procedures, one for each of the existing stored procedures in a database. That's a heck of a lot of code. You could take the refactoring approach, writing a test as you come to each procedure. However, while we're willing, ready and able to take on the added work of testing our code, we're still, at root, lazy. So we brought in one other concept, and a tool to fit it: code generation, using CodeSmith.
Assumptions
For the purposes of this article, I'm going to assume several things. You've got a working knowledge of TSQL, TSQLUnit, C# and CodeSmith. All or none of these may be true, but all were used to arrive at the end point. Also, fair warning: while I have a "working knowledge" of these tools, I'm by no means an expert at them. I may have some flaws in my approach; if you spot any, please drop a note in the comments section. You're working these unit tests up against databases that are not in new development, because anything in new development should be getting its unit tests written as you write the stored procedures. You'll be working against read procedures for this article, but with some 'if' and 'switch' statements in your code, you could use this pattern to generate tests for create, update and delete procedures as well.
Creating a Template
First, and easiest, we need to get a list of stored procedures from a database. CodeSmith comes with a bunch of sample code that will get you on your way. Part of that is the Schema Explorer, which will help you walk through database objects and get information about them back out. It simply saves you from creating your own DMO/SMO processes (we're lazy, remember, and reinventing the wheel is a lot of unnecessary work). Using this, you can walk through a list of stored procedures and get their parameters. With that list, we can DECLARE matching variables within our test stored procedure, which will be used to feed the application stored procedure's parameters. The TSQLUnit procedures identify unit tests by the 'ut_' prefix at the beginning of the procedure name. Further, they require a test suite name followed by another underscore, as in 'ut_something_'. After that we usually just put in the procedure name, as you see in the code below.
<%@ Import Namespace="CodeSmith.BaseTemplates" %>
<%@ Import Namespace="SchemaExplorer" %>
<%@ Import Namespace="System.Collections" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<%@ Import Namespace="System.Data" %>
<%
string procName;
bool addComma = false;
foreach (CommandSchema storedProc in StoredProcedures)
{
    procName = "ut_" + TestSuite + "_" + storedProc.Name;
%>
IF EXISTS (SELECT * FROM dbo.sysobjects
           WHERE id = OBJECT_ID(N'[dbo].[<%= procName %>]')
           AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
    DROP PROCEDURE [dbo].[<%= procName %>]
GO

CREATE PROCEDURE [dbo].[<%= procName %>]
AS
DECLARE @Error int
SET NOCOUNT ON
<%foreach (ParameterSchema parameter in storedProc.NonReturnValueParameters)
{%>
DECLARE <%=parameter.Name%> <%=parameter.NativeType%><%
    if (parameter.Size > 0)
    {
        switch (parameter.NativeType)
        {
            case "nvarchar":
            case "nchar":
            case "ntext":
            case "varchar":
            case "char":
                %>(<%=parameter.Size%>)<%
                break;
            case "decimal":
            case "numeric":
                %>(<%=parameter.Precision%>,<%=parameter.Scale%>)<%
                break;
        } //end switch
    } //end if
%>
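The type-formatting logic in that template can be sketched in isolation. Here's a minimal Python rendition (a hypothetical helper, not part of the CodeSmith template) showing how a parameter's native type, size, precision, and scale turn into a DECLARE line; note that T-SQL declares decimal/numeric as (precision, scale), in that order.

```python
# Minimal sketch (not CodeSmith itself): render a T-SQL DECLARE line from
# parameter metadata, mirroring the template's switch over NativeType.
SIZED_TYPES = {"nvarchar", "nchar", "varchar", "char"}
PRECISION_TYPES = {"decimal", "numeric"}

def declare_line(name, native_type, size=0, precision=0, scale=0):
    """Build e.g. 'DECLARE @Name varchar(50)' for a stored proc parameter."""
    suffix = ""
    if size > 0 and native_type in SIZED_TYPES:
        suffix = f"({size})"
    elif native_type in PRECISION_TYPES:
        # T-SQL syntax is decimal(precision, scale)
        suffix = f"({precision},{scale})"
    return f"DECLARE {name} {native_type}{suffix}"

print(declare_line("@CustomerName", "varchar", size=50))   # DECLARE @CustomerName varchar(50)
print(declare_line("@UnitPrice", "decimal", precision=18, scale=2))  # DECLARE @UnitPrice decimal(18,2)
print(declare_line("@OrderDate", "datetime"))              # DECLARE @OrderDate datetime
```

The parameter names and types here are invented for illustration; in the template, the same decisions are driven by Schema Explorer's parameter metadata.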
The other thing we'll need to generate is a pair of temporary tables. One will hold the results of the query that we're testing; the other will hold the anticipated results. We found so many ways that the anticipated results could be generated that, instead of picking one and forcing everyone to use it, we decided to leave that up to the individuals creating the tests. Never forget that just because you're automating the generation of your code, you're not replacing the human brain. Just so you know, the various options include: keeping an unedited copy of the stored procedure and using it to compare data sets against the edited version; simply inserting anticipated data into the temporary table; running a select statement against the database to get a set of data out based on the parameters available; and creating tables for storing test data, among others. Since we would have had to repeat the code for creating the temp tables, we put it into a function.
--table for holding results from read procs
CREATE TABLE #Actual (<%=TableStructure(storedProc)%>)

--table for holding expected results for later comparison
CREATE TABLE #Expected (<%=TableStructure(storedProc)%>)

<script runat="template">
public string TableStructure(CommandSchema storedProc)
{
    bool addComma;
    string structure = "";
    foreach (CommandResultSchema result in storedProc.CommandResults)
    {
        addComma = false;
        foreach (CommandResultColumnSchema column in result.Columns)
        {
            if (addComma) { structure += "\t,"; }
            addComma = true;
            structure += column.Name;
            structure += (" " + column.NativeType);
            switch (column.NativeType)
            {
                case "nvarchar":
                case "nchar":
                case "varchar":
                case "char":
                    structure += ("(" + column.Size + ")");
                    break;
                case "decimal":
                    structure += ("(" + column.Precision + "," + column.Scale + ")");
                    break;
            }
            structure += "\r\n";
        }
    }
    return structure;
}
</script>
Now we've declared variables and created temporary tables. We're almost ready to run the procedure, but we need to set the variables to some known set of values that will result in the desired test data. Again, we're generating placeholders for manipulating this data, but it's up to the user to input the correct values.
<%foreach (ParameterSchema parameter in storedProc.NonReturnValueParameters)
{%>
SET <%=parameter.Name%> = <%
    switch (parameter.NativeType)
    {
        case "nvarchar":
        case "nchar":
        case "ntext":
            %>N''<%
            break;
        case "varchar":
        case "char":
        case "text":
        case "datetime":
        case "smalldatetime":
            %>''<%
            break;
    }%>
<%} //end set statements %>
With everything prepared, there's nothing left but to run the procedure and verify the output. We generate the procedure call so that it will put its results into a temporary table. Then we begin verifying the results. First, obviously, we check for errors. If there are any, the test has failed, and we report it back to the TSQLUnit procedures.
<%if (readProc) {%>INSERT INTO #Actual
<%}
addComma = false; //reset for this procedure's parameter list
%>EXEC @Error = <%= StoredProcedures[0].Database.Name %>.dbo.<%= storedProc.Name %>
<%foreach (ParameterSchema parameter in storedProc.NonReturnValueParameters)
{
    %><%if (addComma) {%>,<%}
    addComma = true;
    %><%=parameter.Name%> = <%=parameter.Name%>
<%} //end parameters%>

IF @Error <> 0 OR @Error IS NULL
    EXEC tsu_failure '<%= procName %> Failed'
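The generated EXEC statement is just a join over the parameter list, with a leading INSERT for read procedures. A minimal Python sketch of that assembly (illustrative names only; the real work happens in the template):

```python
# Sketch (not the CodeSmith template): assemble the generated EXEC call
# for a stored procedure, capturing a read proc's results into a temp table.
def exec_call(db, proc, params, capture_table=None):
    """Build e.g. 'EXEC @Error = db.dbo.proc @p = @p, ...' from a param list."""
    args = ", ".join(f"{p} = {p}" for p in params)
    call = f"EXEC @Error = {db}.dbo.{proc} {args}".rstrip()
    if capture_table:
        # read procedures get their result set captured for comparison
        return f"INSERT INTO {capture_table}\n{call}"
    return call

# 'Northwind' and 'CustOrderHist' are stand-in names for illustration
print(exec_call("Northwind", "CustOrderHist", ["@CustomerID"], "#Actual"))
# INSERT INTO #Actual
# EXEC @Error = Northwind.dbo.CustOrderHist @CustomerID = @CustomerID
```

The comma-joined argument list is the same job the template's addComma flag performs one parameter at a time.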
At the end, we have to run a comparison between the two data sets in order to verify that they match. For that, we have a special stored procedure that we call, passing in the names of the two temporary tables.
CREATE PROC [dbo].[assert_tables_equal]
    @tableOne varchar(50),  --EX: '#tmp'
    @tableTwo varchar(50)   --EX: '#tmp2'
AS
SET NOCOUNT ON
BEGIN
    DECLARE @sql varchar(8000)
    SET @sql = ' IF (SELECT COUNT(*) FROM ' + @tableOne + ') ='
    SET @sql = @sql + ' (SELECT COUNT(*) FROM ' + @tableTwo + ')'
    SET @sql = @sql + ' BEGIN'
    SET @sql = @sql + '     IF'
    SET @sql = @sql + '     (SELECT COUNT(*) FROM ' + @tableOne + ') <>'
    SET @sql = @sql + '     (SELECT COUNT(*) FROM'
    SET @sql = @sql + '         (SELECT * FROM ' + @tableOne + ' UNION SELECT * FROM ' + @tableTwo + ') tables)'
    SET @sql = @sql + '     BEGIN'
    SET @sql = @sql + '         EXEC tsu_failure ''The table contents are not equal.'''
    SET @sql = @sql + '     END'
    SET @sql = @sql + ' END'
    SET @sql = @sql + ' ELSE'
    SET @sql = @sql + ' BEGIN'
    SET @sql = @sql + '     EXEC tsu_failure ''The table counts are not equal.'''
    SET @sql = @sql + ' END'
    EXEC(@sql)
END
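The comparison is a two-step check: the row counts must match, and a distinct UNION of both tables must produce no extra rows. The same idea can be demonstrated against any engine; here's an illustrative Python/sqlite3 sketch (not the TSQLUnit procedure itself), with invented table names.

```python
import sqlite3

def tables_equal(conn, t1, t2):
    """Mirror the assert_tables_equal idea: equal row counts, and a distinct
    UNION of both tables adds no rows (i.e. they hold the same set of rows)."""
    cur = conn.cursor()
    c1 = cur.execute(f"SELECT COUNT(*) FROM {t1}").fetchone()[0]
    c2 = cur.execute(f"SELECT COUNT(*) FROM {t2}").fetchone()[0]
    if c1 != c2:
        return False  # 'The table counts are not equal.'
    union = cur.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {t1} UNION SELECT * FROM {t2})"
    ).fetchone()[0]
    return union == c1  # extra distinct rows mean the contents differ

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expected(id INTEGER, name TEXT);
    CREATE TABLE actual(id INTEGER, name TEXT);
    INSERT INTO expected VALUES (1, 'a'), (2, 'b');
    INSERT INTO actual   VALUES (1, 'a'), (2, 'b');
""")
print(tables_equal(conn, "expected", "actual"))  # True
conn.execute("UPDATE actual SET name = 'c' WHERE id = 2")
print(tables_equal(conn, "expected", "actual"))  # False
```

One caveat the sketch shares with the stored procedure: UNION collapses duplicate rows, so two tables that differ only in how many copies of a row they contain can still pass the check.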
Conclusions
In the end, we can run this against any number of procedures from any database and automagically create a script for running unit tests against those procedures. We've found it to be immensely helpful, when we performance tune stored procedures, to be able to show the business users and developers we support that not only does the procedure run faster after our tuning, but it's still returning the results they anticipate.
This barely scratches the surface of what's possible with generated SQL code. Most of our new development is now using generated TSQL for the CUD part of our CRUD procedures (we found that the read procs are usually a little too custom, in most cases, to generate). Much more is available at the CodeSmith discussion forum. This is also but the barest of introductions to Agile methods within the database. For more on this, I can't recommend anything more than Scott Ambler's books and web site.
A lot of this may have to be completely reworked soon. Microsoft has announced a new version of Visual Studio for database developers. It incorporates, among other things, a full suite of unit testing, in either managed code or TSQL. While it's something to look forward to, the price tag might scare people off. If so, the type of approach outlined here is still available.
Special thanks to Scott Abrants, my co-worker, who helped and guided me through the unit testing process and gave me a hand with this article.
Footnotes
(1) Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, Boston, p. xvi