This article will expand on the knowledge gained from the Using Biml as an SSIS Design Patterns Engine and Biml Language Elements articles by discussing metadata, the main building blocks used in all Biml projects. Metadata is used to control the automation and code generation of all of the Biml projects created for Integration Services projects. This article will discuss the concepts to consider while working with metadata like storage, design, and change control.
Note: BIDS Helper Version 1.7.0 was released on Mar 17, 2015 and introduced some code changes that will give your existing Biml code warnings. This feature did not exist prior to this release. The most notable change was the organization within the Varigence namespace. This update removed the Hadron namespace and replaced it with the Biml namespace. Compiling code after this release will result in warnings that did not exist prior to the release, as shown below in Figure 1.
Figure 1 – BIDS Helper Version 1.7.0 Validation Message Box
The warning can be ignored at this point, but fixing the code is done by replacing the existing Hadron reference to the Biml namespace.
Change the following code:
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
To the new namespace reference:
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
Article Foundation
Biml metadata can exist in two distinct states: persisted metadata and in-memory metadata. This distinction defines the available lifespan of the metadata objects and will dictate the method of interaction. In this article, I will build upon our Biml knowledge by discussing metadata and the concepts needed to use this metadata within your Biml projects. We will also focus on benefits of these two metadata options and how these choices will affect our overall solutions.
To understand metadata programming, we will discuss the “scaffolding,” or required pieces, that need to be in place to create reusable frameworks. These frameworks can start off small and later add more robust features as your Biml knowledge increases.
The example code used in this project can be executed in Visual Studio 2008, 2010, 2012, and 2013 with the BIDS Helper 1.7.0.0 (and higher) add-in installed. If you do not have BIDS Helper installed, see the previous Stairway to Biml article: Biml Basics – Level 2.
Metadata
Metadata is a term that is often used to represent various concepts in information technology. Within this article, Metadata will be defined simply as “data that describes data.” In a more concrete example, metadata is the data that describes the objects and object properties that will be used within BimlScripts to enhance Integration Services objects.
Metadata in the context of most Biml projects will represent items like: schema names, table names, column names, column details (length, precision, scale, and nullability) and column mappings.
You can choose from various Biml extension methods to obtain object metadata including the following: ImportDB, GenerateTableNodes, GetDataTable and GetDatabaseSchema. These methods load the source and/or destination metadata into in-memory objects and are not preserved unless specific programming steps are taken to persist the metadata.
Note: ImportDB will be deprecated in the future and should not be used going forward.
Biml Generation Process
To work with metadata in a Biml project we will need to examine the Biml generation process and where in the generation process to inject metadata into your Biml scripts, see Figure 2. This process starts by selecting the Biml file or files within the Miscellaneous folder, right clicking the pop out menu and selecting Generate SSIS Packages. The process is described in more detail below.
Figure 2 – Biml Generation Process
Step 1 [Biml File(s)] – User right clicks one or multiple files and selects the Generate SSIS Packages menu option.
Step 2 [Biml and BimlScript] – The Biml Engine loads all of the selected files into memory to start processing. Processing starts with Tier 0 (if available) before moving to Tier 1, Tier 2, etc. This continues until all of the tiers are processed in order. If you need a refresher on Tiers, Biml and BimlScript, see the previous Stairway to Biml article: Level 5 - Biml Language Elements.
During this step all of the files Biml code (or for that matter XML) are combined into a single file. The one file will be passed to the next step. Metadata can only be added during the process portion within the Biml Engine.
Note: There is a good programming trick, thanks to Marco Schreuder, (http://blog.in2bi.eu/biml/viewing-or-saving-the-compiled-biml-file-s/) that allows us to see the contents and structure of the Biml file prior to the compilation process.
1. Add an additional Biml file to the project.
2. Select the new Biml file and add the following code.
Figure 3 – Complied Biml Code Example.
3. Select all of the Biml files used to generate the solution along with the newly created file above and right click the files and select Check Biml for Errors or Generate SSIS Packages.
4. Check the location for the newly created Biml file.
Note: Both options will work because Check Biml for Errors will complete all processing and stop prior to the compile process.
Step 3 [Biml Only] – BimlScript has been run and removed before starting the compile process. At this point, the Biml file will describe all of the objects that will be complied into items within the integrations services project. Object like databases, schemas, and tables that are used in support of creating packages will not have items that are created within the project. The compile process starts and the one Biml file is compiled into one or more Integration Services objects.
Note: Biml that describes Analysis Services objects can be created but the complier within BIDS Helper will ignore and discard those objects after the compile process completes.
Step 4 [SSIS Package(s)] – Integration Services Packages are created and added to the Packages folder within the Visual Studio project.
In-Memory Metadata
In-Memory metadata is a type of metadata that is not persisted in any way. This metadata is read during the Biml generation process and discarded after the entire process completes. This was the only type of metadata that was used in all of the previous Stairway to Biml series.
In-Memory metadata is the easiest Biml metadata to setup and use. It can produce some great projects with very little development effort. Past this benefit, in-memory metadata does have some development limitations.
The first limitation is that all changes to in-memory metadata must occur within code in a programming language (BimlScript). As an example, what if the source column name – ProductID, does not match the destination column name – ProdID? A programmatic mapping step would be required to define the additional Biml to create the appropriate mappings for the above column during processing. As more and more conditions are needed, in-memory metadata becomes harder to work with.
Another in-memory metadata limitation is version control and package reproducibility. When metadata is read directly from a source or target database, that metadata is only available in the database structure at the time the generation process is run. Reproducing package(s) that were generated with a specific database’s metadata will require access to the exact same version of the database when the generation process was run. This means to version control all of the Biml code will also require a database backup or all of the scripts required to create a database in the same state it was in when the Biml code was run.
The above limitations make persisted metadata a better option for more detailed projects. A persisted metadata solution can start with basic metadata tables and extend to a more robust solution as additional metadata requirements are needed.
Persisted Metadata
Persisted metadata is a type of metadata that is stored in tables or files and is maintained outside of the Biml Generation Process. It can be modified and configured prior to the Biml Generation Process. We will discuss two of the options to store metadata: Table Persisted Metadata and File Persisted Metadata.
Excel metadata will be discussed in the file section, but with the Master Data Services add-in, an Excel file can be used to store data in a database. Thus, it falls into both Table Persisted and File Persisted metadata.
We will not discuss in detail the BimlStudio Metadata Model because it is not supported in BIDS Helper.
Think about the column mapping example above and how persisting the data into a table would make that development task much easier. If the data was loaded into a table or tables with the source and destination metadata along with the mapping information, all that would be needed was a call to read the data to create the additional Biml object.
Table Persisted Metadata
Metadata persisted within a database table is one of the best options to store data integration (ETL) mapping metadata to support a Biml project. Persisted metadata is stored, extended and modified prior to the Biml generation process. Storing metadata enables SQL data modification via T-SQL instead of writing C# in BimlScript to update the metadata during the Biml generation process. Persisted metadata has a minor limitation; it requires objects (most likely database tables) to be created to store the project metadata.
The set of tables that store the Biml project metadata can expand as you increase your Biml library. Start with a basic set of tables for your first project and expand as you need more metadata elements.
There are various ways to load data into your metadata tables. One method to load metadata is from within T-SQL scripts that access system tables like sys.tables and sys.columns. Using T-SQL to load metadata is a common task for most data professionals. This method can also be modified to use PL/SQL to get Oracle metadata using system tables like SYS.ALL_TABLES and SYS.COLUMNS. Another method to add metadata to tables consists of using Biml utility methods and BimlScript to load metadata into tables.
The recommend method to load Biml metadata is outside of the Biml generation process as depicted in the Figure 4 below. Creating scripts that load source metadata outside of the Biml generation process enables source specific code to stay outside of the Biml files. Changing a data source would only require new SQL scripts to load the metadata tables. This updated metadata would not require any code change within the Biml Process Dependent area because the metadata table structure did not change.
Figure 4 – Components of the Biml Metadata Pattern
When metadata table structures are generic within the generation process, the Biml code can be created to run against any source-to-destination mapping without any BimlScript modification. Using the Table Persisted metadata method creates a good framework that becomes much more robust and easily adapts to different data sources without modifying any BimlScript code.
Note: SQL Server extended properties could also be considered as an option for persisting metadata, but due to various limitations (such as the fact that they are not implemented consistently across all of the database vendors and most production systems will not allow updates to underling production objects), I will not discuss this option.
File Persisted Metadata
Another persisted metadata storage option is using Biml Tables and FileFormats collections. These objects can be stored in separate Biml files or within the same Biml file.
This enables a more modular approach and is similar to the behavior that is used in Mist / BimlStudio. These files can be saved in the miscellaneous folder along with the other Biml files and referenced from any other Biml file within the project using the RootNode function. For more information on RootNode, see the previous Stairway to Biml article: Level 5 - Biml Language Elements.
Biml Table and Flat File Format files can have the metadata extended with the use of Biml annotations. Annotations enable the tagging of Biml elements with additional metadata that can be used from BimlScript. In Figure 5 below, the Biml Table has two annotations: one that tags the table with a TableGroup of Finance and one that tags the column with a destination column (DestColumn) of DeptID.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Tables> <Table Name="Department" SchemaName="AdventureWorks2012.HumanResources"> <Annotations> <Annotation AnnotationType="Tag" Tag="TableGroup">Finance</Annotation> </Annotations> <Columns> <Column Name="DepartmentID" DataType="Int32"> <Annotations> <Annotation AnnotationType="Tag" Tag="DestColumn">DeptID</Annotation> </Annotations> </Column> <Column Name="Name" DataType="String" Length="50" /> <Column Name="GroupName" DataType="String" Length="50" /> <Column Name="ModifiedDate" DataType="DateTime" /> </Columns> <Keys> <PrimaryKey Name="PK_Department_DepartmentID"> <Columns> <Column ColumnName="DepartmentID" /> </Columns> </PrimaryKey> </Keys> </Table> </Tables> </Biml>
Figure 5 – Biml Table File with Annotation
Note: Biml annotations are not related to Integration Services annotations, and in fact Biml does not support Integration Services annotations.
By adding annotations to the Biml Table above, code can reference the tags and use the additional metadata to test for conditions within BimlScript. The GetTag() function is used to access Biml annotations. In Figure 6 below, the code sample shows the TableGroup tag is checked while looping over all of the table objects in the project (actually not all project files but the selected Biml files used while processing) and a package is created for all tables with the TableGroup tag of Finance.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Packages> <# foreach (var table in RootNode.Tables.Where(t => t.GetTag("TableGroup") == "Finance")) { #> <Package Name="Finance Group - <#= table.SsisSafeName #>"> … </Package> <# } #> </Packages> </Biml>
Figure 6 – BimlScript that References Biml Table File.
Biml Flat File format files can be used like table files to describe the metadata elements of flat files and set properties like length and data type requirements without relying on defaults.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <FileFormats> <FlatFileFormat Name="MetadataFileFormat" ColumnNamesInFirstDataRow="true"> <Columns> <Column Name="Category" DataType="String" Length="128" Delimiter="Comma" /> <Column Name="DisplayName" DataType="String" Length="256" Delimiter="Comma" /> <Column Name="Maturity" DataType="String" Length="32" Delimiter="Comma" /> <Column Name="Frequency" DataType="String" Length="32" Delimiter="Comma" /> <Column Name="Status" DataType="String" Length="32" Delimiter="Comma" /> <Column Name="CompactName" DataType="String" Length="32" Delimiter="Comma" /> <Column Name="Url" DataType="String" Length="4000" Delimiter="Comma" /> <Column Name="HeaderRowsToSkip" Delimiter="Comma" /> <Column Name="DateGrain" DataType="String" Length="32" Delimiter="LF" /> </Columns> </FlatFileFormat> </FileFormats> </Biml>
Figure 7 – Biml Flat File Format File
Excel Persisted Metadata (File/Table)
Biml metadata can be stored in Excel and used with .NET within a BimlScipt to access the Excel file and read the metadata within it. The main benefits to this method are portability and the ability to store the file in the Integration Services project and easily save metadata with the project. This is the easiest method to integrate with source control because all of the required code and metadata can be stored in the same project.
Using Excel with the Master Data Services add-in, Excel is able to move data from an Excel workbook into the Master Data Services database and persist data just like the persisted tables described above. The benefit to this method is a user friendly interface that can be adapted to any metadata requirement and allow business users to help with the capture of metadata.
Conclusion
In this article we have taken a tour of the options that can be used to store and work with the metadata within your Biml projects. This might sound like an overwhelming place to start, but my first project using Biml was done with persisted metadata. The metadata tables I used were very basic and only stored a minimum set of attributes that I needed for the project. Start your first project in a similar way with metadata tables for basic attributes.
With all of the details outlined in the article, the next article will walk through a simple persisted metadata project that stores data into SQL Server tables that will be used within our integration services packages.