This is the third tip designed to help ETL developers be more productive. In the first article we discussed how you can use a keyboard shortcut to call the most often used SELECT query, in the second – how to quickly look up for extended table properties.
Here, consider having a SQL code simplifying source-target data mapping so that:
- Data warehouse designers can use the resulting table structure for a data warehouse table design
- SSIS ETL developers can have a complete SQL code ready to use in SSIS source data flow component
- The entire data warehouse development team can benefit from maintaining consistency between data extracts
To start using this technique, first create a stored procedure, assign a keyboard shortcut to it, and then invoke the shortcut against every source table.
Creating Stored Procedure for Source-Target Mapping
You can use sp_SourceTargetMapping stored procedure to generate source-data mapping code that
- Converts source data types
- Minimizes space used
- Replaces NULLs with default values
1. Converting source data types
The sp_SourceTargetMapping converts source types to destination data types according to the best practices for data warehouse design:
- Replaces date/time stamps with integer keys and uses smart date keys to partition fact tables
- Sets 'Y'/'N' values for flag-type attributes to maintain consistency and readability across data of this type
For source data type: | Converts to: |
DATETIME | INT (in yyymmdd format) |
BIT | CHAR(1): 'Y'/'N' |
2. Minimizing space used
The sp_SourceTargetMapping calculates minimal textual column size according to maximum length of data used in the source column.
For source data type: | Assigns data size to: |
VARCHAR, NVARCHAR, CHAR, NCHAR, TEXT, NTEXT | Maximum length of source data actually used |
Although saving space for dimensional attributes is not critical for a data mart, keeping dimensional table column sizes reasonably short allows to:
- Accelerate ETL lookups
- Minimize OLAP cube attribute size. This can be valuable for loading big dimensions or executing MDX queries.
Note. Remember Analysis Services do not support variable data sizes, so your VARCHAR storage benefits will get lost when data is translated to OLAP cube attributes.
3. Replacing NULLs with Default Values
In a data mart, foreign keys should never have NULLs. Avoiding NULLs for dimensional attributes simplifies ETL lookups.
The sp_SourceTargetMapping assigns columns that contain NULLs the following default values:
When values of these column types are NULL: | Assigns value to: |
VARCHAR (MAX), NVARCHAR, CHAR, NCHAR, TEXT, NTEXT | '*' |
INT, SMALLINT | -1 |
DATETIME | -1 |
Warning. If the default value carries a business meaning, choose another value that is impossible for business.
Tip. Replace the default values with Unknown, Not Applicable, or Not Available with descriptive strings later within database views.
For sp_SourceTargetMapping stored procedure code, download sp_SourceTargetMapping.sql file.
Assigning a Keyboard Shortcut to the Stored Procedure
After you created the sp_SourceTargetMapping stored procedure, assign a keyword shortcut to it so that you can quickly access it.
Note: Remember to close SSMS and re-open it to make the shortcut setting effective.
Calling the Stored Procedure with a Keystroke
To execute the stored procedure using a keyboard shortcut:
- Drag a table name from SSMS Object Explorer to Query Editor
- Select the table
- Press Ctrl+4
Exploring Data Transformation Results
When you execute the code generated earlier against the "Person.Contact" table of AdventureWorks2008R2 database for example, notice the differences between source and target data types and sizes. The differences are highlighted in red color.
Source | Destination | |||
Column_name | Type | Length | Type | Length |
BusinessEntityID | int | 4 | int | 4 |
PersonType | nchar | 4 | nchar | 4 |
NameStyle | NameStyle | 1 | varchar | 1 |
Title | nvarchar | 16 | nvarchar | 8 |
FirstName | Name | 100 | nvarchar | 48 |
MiddleName | Name | 100 | nvarchar | 32 |
LastName | Name | 100 | nvarchar | 44 |
Suffix | nvarchar | 20 | nvarchar | 6 |
… | … | … | … | … |
ModifiedDate | datetime | 8 | int | 4 |
The automatically generated code that produced the above results looks like this:
SELECT CONVERT(int, ISNULL([BusinessEntityID],-1)) AS [BusinessEntityID], CONVERT(nchar (2), ISNULL([PersonType],'*')) AS [PersonType], CASE NameStyle WHEN 1 THEN 'Y' ELSE 'N' END AS [IsNameStyle], CONVERT(nvarchar (4), ISNULL([Title],'*')) AS [Title], CONVERT(nvarchar (24), ISNULL([FirstName],'*')) AS [FirstName], CONVERT(nvarchar (16), ISNULL([MiddleName],'*')) AS [MiddleName], CONVERT(nvarchar (22), ISNULL([LastName],'*')) AS [LastName], CONVERT(nvarchar (3), ISNULL([Suffix],'*')) AS [Suffix], CONVERT(int, ISNULL([EmailPromotion],-1)) AS [EmailPromotion], [AdditionalContactInfo], [Demographics], [rowguid], CASE WHEN ModifiedDate ISNULL THEN -1 ELSE CONVERT(VARCHAR, [ModifiedDate], 112) END AS [ModifiedDate] FROM [Person].[Person] |
Realizing Benefits and Next Steps
Now, look at the code above and ask yourself:
- How long would it take to write this code manually for every data column transformation?
- How much time would it take to calculate the best size for destination columns?
- How many revisions the script would go through before you're confident you didn't forget anything?
If your answer is: the amount of time required is significant, then you are ready to start extracting benefits from this solution. You can now quickly generate the data transformation code and be certain that it is consistent with the rules.
I encourage you to start using this technique right away in your daily practice. With time, add your logic and transformation rules to adjust it better to your business needs.
For more information ideas and the best practices for data warehouse design and implementation, read these great books:
- The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
- The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence
- The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning
- The Microsoft Data Warehouse Toolkit: With SQL Server2005 and the Microsoft Business Intelligence Toolset
- Expert Cube Development with Microsoft SQL Server 2008 Analysis Services