When looking to shorten the data warehouse development lifecycle, one should always try to employ proven design patterns to save time. This article illustrates a design pattern that I use to create multi-value dimensions from code set (lookup) tables. This design pattern uses an algorithm that takes a record set of lookup codes, i.e. a code set, pivots them into bit columns and then provides all possible bit permutations for them using dynamic SQL. The number of permutations = 2x (where x = the number of codes in the given code set).
For the purposes of this article, only temporary objects will be created, but this method can be adjusted to connect to an actual code value table and create / propagate an actual dimension table (see the attached SQL script for the complete code).
The first step is to create a temporary “flattened” code value table and insert some code values. Normally, there would be a parent code set table with a description of the code set, but we’ll just focus on the code value table. Here, we’ll use an “Item” code set with 10 code values. The 10 codes will yield 1024 bit permutations (210).
DECLARE @CodeValue AS TABLE (CodesetName VARCHAR(20), Code VARCHAR(3), CodeDescription VARCHAR(100)); INSERT @CodeValue (CodesetName, Code, CodeDescription) SELECT 'Item', 'TST', 'T-shirt' UNION ALL SELECT 'Item', 'SST', 'Sweatshirt' UNION ALL SELECT 'Item', 'BCP', 'Ball cap' UNION ALL SELECT 'Item', 'SCF', 'Scarf' UNION ALL SELECT 'Item', 'CFM', 'Coffee mug' UNION ALL SELECT 'Item', 'MTN', 'Mittens' UNION ALL SELECT 'Item', 'JCK', 'Jacket' UNION ALL SELECT 'Item', 'SKS', 'Socks' UNION ALL SELECT 'Item', 'HWM', 'Hand warmer' UNION ALL SELECT 'Item', 'PPS', 'Pen and pencil set';
Next, we’ll create some formatting variables that will be used within a dynamic SQL statement. We’ll leverage the COALESCE function to build comma-separated lists.
DECLARE @ColumnList NVARCHAR(MAX), @SelectList NVARCHAR(MAX), @JoinList NVARCHAR(MAX), @SQL NVARCHAR(MAX); SELECT @ColumnList = COALESCE(@ColumnList+',','') + '[' + Code + ']' FROM @CodeValue; SELECT @SelectList = COALESCE(@SelectList + ', ','') + '[' + Code + '] = x' + CASE RowID WHEN 1 THEN '' ELSE CONVERT(VARCHAR(1000), RowID) END + '.x', @JoinList = COALESCE(@JoinList + CHAR(10),'') + CASE RowID WHEN 1 THEN '' ELSE 'CROSS JOIN x AS x' + CONVERT(VARCHAR(1000), RowID) END FROM Codes c;
Then, we’ll format a tokenized SQL statement and replace the tokens with their corresponding formatted values.
SELECT @SQL = REPLACE(REPLACE(REPLACE( 'WITH x AS ( SELECT x FROM (VALUES(0),(1)) AS y(x) ), z AS ( SELECT @SelectList FROM x @JoinList ) SELECT ROW_NUMBER() OVER(ORDER BY @ColumnList) - 1 AS RowID, @ColumnList FROM z ORDER BY RowID;', '@SelectList', @SelectList), '@JoinList', @JoinList), '@ColumnList', @ColumnList);
Lastly, we execute the SQL statement to product the record set.
EXEC sp_executesql @SQL;
The first 10 rows of the results set will look like the following. Note that the record set is zero-indexed, thereby adhering to dimension design best-practices. The columns can be subsequently renamed to meet your requirements.
RowID TST SST BCP SCF CFM MTN JCK SKS HWM PPS -------------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 1 0 3 0 0 0 0 0 0 0 0 1 1 4 0 0 0 0 0 0 0 1 0 0 5 0 0 0 0 0 0 0 1 0 1 6 0 0 0 0 0 0 0 1 1 0 7 0 0 0 0 0 0 0 1 1 1 8 0 0 0 0 0 0 1 0 0 0 9 0 0 0 0 0 0 1 0 0 1 10 0 0 0 0 0 0 1 0 1 0
Putting all the code together looks as follows:
BEGIN TRY SET XACT_ABORT ON; SET NOCOUNT OFF; DECLARE @CodeValue AS TABLE ( CodesetName VARCHAR(20), Code VARCHAR(3), CodeDescription VARCHAR(100) ); INSERT @CodeValue ( CodesetName, Code, CodeDescription) SELECT 'Item', 'TST', 'T-shirt' UNION ALL SELECT 'Item', 'SST', 'Sweatshirt' UNION ALL SELECT 'Item', 'BCP', 'Ball cap' UNION ALL SELECT 'Item', 'SCF', 'Scarf' UNION ALL SELECT 'Item', 'CFM', 'Coffee mug' UNION ALL SELECT 'Item', 'MTN', 'Mittens' UNION ALL SELECT 'Item', 'JCK', 'Jacket' UNION ALL SELECT 'Item', 'SKS', 'Socks' UNION ALL SELECT 'Item', 'HWM', 'Hand warmer' UNION ALL SELECT 'Item', 'PPS', 'Pen and pencil set'; DECLARE @ColumnList NVARCHAR(MAX), @SelectList NVARCHAR(MAX), @JoinList NVARCHAR(MAX), @SQL NVARCHAR(MAX); SELECT @ColumnList = COALESCE(@ColumnList + ',', '') + '[' + Code + ']' FROM @CodeValue; WITH Codes AS (SELECT RowID = ROW_NUMBER() OVER (PARTITION BY CodesetName ORDER BY Code), Code FROM @CodeValue) SELECT @SelectList = COALESCE(@SelectList + ', ', '') + '[' + Code + '] = x' + CASE RowID WHEN 1 THEN '' ELSE CONVERT(VARCHAR(1000), RowID) END + '.x', @JoinList = COALESCE(@JoinList + CHAR(10), '') + CASE RowID WHEN 1 THEN '' ELSE 'CROSS JOIN x AS x' + CONVERT(VARCHAR(1000), RowID) END FROM Codes AS c; SELECT @SQL = REPLACE( REPLACE( REPLACE( 'WITH x AS ( SELECT x FROM (VALUES(0),(1)) AS y(x) ), z AS ( SELECT @SelectList FROM x @JoinList ) SELECT ROW_NUMBER() OVER(ORDER BY @ColumnList) - 1 AS RowID, @ColumnList FROM z ORDER BY RowID;', '@SelectList', @SelectList ), '@JoinList', @JoinList ), '@ColumnList', @ColumnList ); EXEC sp_executesql @SQL; END TRY BEGIN CATCH PRINT ERROR_MESSAGE(); END CATCH;
Happy warehousing!