Recently I was working on cleaning up some names in a table and transforming them to a consistent format. In doing so, I realized that the process was more complicated than I expected. There are many combinations of how a name is put together, as well as the rules for combining names together. As a fan of unit testing, I put together a short test to help ensure that the queries I use will actually produce the correct results.
Framing the Problem
I've got a table with a few columns, but only a few contain a name. In this case, I had a table like this:
CREATE TABLE [dbo].[Contacts] ( ContactID int NOT NULL PRIMARY KEY, Salutation varchar(20) NULL, FirstName VARCHAR(80) NULL, MiddleName VARCHAR(80) NULL, LastName VARCHAR(80) NULL, Suffix VARCHAR(20) NULL, -- .... (other columns) ) GO
One of the tasks that needs to occur is that the various parts of the name need to be combined for some queries. This means while I might have a name split up as
Salutation FirstName MiddleName LastName Suffix
---------- --------- ---------- -------- ------
Mr. Delaney S Jones
I'll need to return this as "Mr. Delaney S. Jones", or some variation. With potential empty or NULL values in certain places, I need to ensure my queries can handle the various data elements and still produce a valid result. A simple task, but one that can easily become an issue for many developers.
Quick Tests
One of the things I'll see developers do is something simple. First, they'll get the table in their development environment and add some data. The easiest thing to do is just add a couple rows. For the sake of this article, I'll ignore the other columns in the table.
INSERT dbo.Contacts ( ContactID , Salutation , FirstName , MiddleName , LastName , Suffix ) VALUES ( 1, 'Mr.', 'Tom', 'Edward', 'Brady', '' ), ( 2, '', 'Peyton', '', 'Manning', 'Sr' ), ( 3, 'Sir', 'Drew', 'C', 'Brees', '' ) GO
With a little data, let's look at a query:
select SELECT Salutation + ' ' + FirstName + ' ' + MiddleName + ' ' + LastName + ' ' + Suffix FROM dbo.Contacts;
That gives me these results:
Everything looks great. I'm done, right? Not really. I'm sure some of you realize that I need other formatting, and there are a few issues with the results. I've got extra spaces, no periods in places, no commas, etc. The rules for formatting names, even simple names, can be complex.
Creating the Test
It's at this point that I need a test. I certainly could start playing with code and trying to ensure that I have all the proper formats for each case, but I'm depending on my eyes to manually compare results with what my brain thinks should be there. A good idea once, but not across repeated result sets, especially as those grow.
Let's start building a test. I'll use tSQLt as the framework, since I like it and it's comfortable for me to write tests. I'll build a stored procedure that covers the test and a class to put it in. This is a normal stored procedure, and inside I'll use the AAA (Assemble, Act, Assert) pattern to build the test. Here's the outer wrapper:
EXEC tsqlt.NewTestClass @ClassName = N'NameTests' GO CREATE PROCEDURE [NameTests].[test Check Coalese of missing name parts] AS BEGIN -------------------------- -- Assemble -------------------------- -------------------------- -- Act -------------------------- -------------------------- -- Assert -------------------------- END;
This gives me a base, so let's start filling out the test. usually work my way through these sections, using some patterns that make writing tests easier. I'll look at each section below.
Assemble
The first part of this is to setup the tables I need with some data. I want a consistent starting environment, so I'll add this to the Assemble section. I'll first mock out my table with the tsqlt.FakeTable function, which gives me a copy of my table, without any constraints, triggers, or identity properties. Next, I'll add some test data.
EXEC tSQLt.FakeTable @TableName = N'Contacts', @SchemaName = N'dbo'; INSERT dbo.Contacts ( Salutation, firstname, middlename, lastname, suffix ) -- SQL Prompt formatting off VALUES ( 1, 'Mr.', 'Tom', 'Edward', 'Brady', '' ), ( 2, '', 'Peyton', '', 'Manning', 'Sr' ), ( 3, 'Sir', 'Drew', 'C', 'Brees', '' ) -- SQL Prompt formatting on
This test data is what I want to verify. I've chosen three cases here, but I can add more.
Next, I need to set up the results I expect. In this case, I'll use a pattern that often works well with the creation of an "Expected" table. I create the table that si the shape of the result set I expect, which is a single column in this case. I'll then add the values I should get.
CREATE TABLE #Expected ([fullname] [VARCHAR](100)); INSERT #Expected ( fullname ) VALUES ('Mr. Tom E. Brady'), ('Peyton Manning, Sr.'), ('Sir Drew C. Brees');
Note that these values need to be the ones I expect, based on business rules. In this case, I'm saying that a middle name should be an initial and there should be a period after the salutation and suffix. You might note these aren't great business rules.
Once I have this, I'm almost done. THe last step is to create a table to hold my results. This should look like the Expected table.
CREATE TABLE #Actual ([fullname] [VARCHAR](100));
I could create this as a SELECT ... INTO #Actual FROM #Expected WHERE 1 = 0 or any other method that gives me a table for results.
Act
The Act section of the test is where I run the item being tested. I could have a stored procedure or function here, but in this case, I'll use an inline query. I want the results of the query to go into the #Actual table, so I need an insert here.
INSERT #Actual ( fullname ) SELECT COALESCE(Salutation + ' ', '') + COALESCE(firstname + ' ', '') + COALESCE(SUBSTRING(middlename, 1, 1) + '. ', '') + lastname + COALESCE(', ' + suffix, '') FROM dbo.Contacts;
This is the query I'll be working on to ensure my tests pass. If I find issues, I'll adjust them here, and then use this query in my code. Note that this means I have a disconnect from my test to the actual query, which could be in a C# class or ASP.NET file. In the case of issues, I know where the query that works is; it's in my test.
Assert
The assert section is where I validate something is true. In this case, I'll assert the #Expected table is equal to the #Actual table, meaning my expected result set looks like my actual one. I do this with a tSQLt function.
EXEC tSQLt.AssertEqualsTable @Expected = N'#Expected', @Actual = N'#Actual', @Message = N'Incorrect query';
This call will actully compare the shape and size of the two tables. The columns, their types, and the data inside. If they're equal, meaning my query returns what I expect, this will work.
Running the Test
To run a tSQLt test, I use a simple T-SQL call.
EXEC tsqlt.Run @TestName = N'[NameTests].[test Check Coalese of missing name parts query for Contacts table]';
This runs a single test. I could also use the tSQLt.RunAll or tSQLt.RunTestClass as well. In this case, I get these results.
[NameTests].[test Check Coalese of missing name parts query for Contacts table] failed: (Failure) Incorrect query Unexpected/missing resultset rows! |_m_|fullname | +---+-------------------+ |< |Peyton Manning, Sr.| |= |Sir Drew C. Brees | |= |Mr. Tom E. Brady | |> |Peyton Manning, Sr | +----------------------+ |Test Execution Summary| +----------------------+
I read these as a test failure, and the row with Peyton Manning has a problem. The Expected value has "Sr." at the end, while the actual value doesn't have a period. The arrows to the left help me understand in which table the data has appeared.
At this point, I'd need to alter my query. I could use this instead:
SELECT COALESCE(Salutation + ' ', '') + COALESCE(firstname + ' ', '') + COALESCE(SUBSTRING(middlename, 1, 1) + '. ', '') + lastname + COALESCE(', ' + suffix + '.', '') FROM dbo.Contacts;
This could pass the test.
+----------------------+ |Test Execution Summary| +----------------------+ |No|Test Case Name |Dur(ms)|Result | +--+-------------------------------------------------------------------------------+-------+-------+ |1 |[NameTests].[test Check Coalese of missing name parts query for Contacts table]| 13|Success| ----------------------------------------------------------------------------- Test Case Summary: 1 test case(s) executed, 1 succeeded, 0 failed, 0 errored. -----------------------------------------------------------------------------
The problem is that that simple change passes the test, but might not make sense for all suffixes. There really should be better logic in the query, or more constraints for data storage. The same issue might appear in the Saluation, where some peopel may enter a period and some may not. Ideally, this might be stripped during data entry and then added in the query. Or vice versa.
Summary
This is a quick and easy way to test your queries, adding data as needed. For example, if I wanted to check what happens with this set of data:
- Salutation: Miss
- Firstname: Brandie
- Middle: D.
- Lastname: Chastain
- Suffix: Ret.
I'd just add another row in the VALUES clause for both the initial insert into the faked table, as well as the expected results.
These are many rules for transforms of data. Having a formalized testing setup ensures that you can alter your test to include new cases, as well as ensure that any future changes to your code, or your query in this case, do not cause regressions.
There aren't any perfect ways of testing SQL code, and all have some overhead and can be cumbersome, but adding testing isn't hard, and it can help you learn to write solid code. Especially if you share tests with other developers and learn what patterns they find helpful to test.