The article will provide an overview of Master Data Management and Master Data Services recently released with Microsoft SQL Server 2008 R2. In addition we will provide a sample SSIS Package for loading data into the MDS Repository Staging tables.
Microsoft recently released Master Data Services in SQL Server 2008 R2 CTP. Without getting into the strategic benefits , definitions etc. of MDM(Master Data Management) this tool will be very helpful for any application in automating the setup and maintenance of look up or cross reference tables , hierarchies, dimensions, or utilizing reference data.
What Is Master Data?
Master Data Management (MDM) comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization (which may include reference data). MDM has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.
Analyst have defined the requirements forsupporting MDM
- Identification
- Central System of Record
- Single Customer View
- Data Quality Compliance
Data Quality Compliance is further defined as:
- Profiling
- Generalized Cleansing
- Parsing & Standardization
- Matching
- Enrichment
- Monitoring
Identification:
Support unique identification of customer information across heterogeneous data sources through advanced matching and linking.
Central System of Record:
Deliver of a Central System of Record based on our data quality tools and integrating data sources, consolidating these sources into single customer view. Leverage a range of SQL Server data and application integration capabilities to integrate with a wide variety of data sources, including legacy data sources. Provide integration with different latency characteristics and styles (for example, real time and batch) and provide integration with downstream Microsoft business intelligence (BI) and analytical features.
Single Customer View:
Utilize Central System of Record to provide a single view or master reference file, and to provide data to existing sources resulting in accurate and consistent data across the enterprise.
Data Quality Compliance:
Support data quality compliance through monitoring and corrective-action techniques based on our data profiling and validation data quality tools.
Profiling: The First Line of Defense
Identify data quality issues that require immediate attention and avoid processing unacceptable source data sets.
Generalized Cleansing:
Cleansing and Correcting data values to meet business standards, relationship constraints or custom business rules.
Parsing and Standardization:
Address Correction, Validation & Standardization Flexible, efficient, and intelligent techniques to parse and restructure data into a common formats, and validate. For example, postal authority standards for address data or user-defined business values and patterns.
Matching:
Unique Identifiers, Attributes, Transactions & Deduplication Match and consolidate records. Uniquely identify an entity across different business functions, add, categorize and describe entity's relationship to other business entities. Identification, linking or merging related entries within or across datasets.
Enrichment:
Geographical coding , Name, Phone , Email etc . . . Identify and implement externally sourced data to increase or repair content of internal data for enhanced value (for example, consumer demographic attributes or geographic descriptors).
6. Monitoring:
Profile, Value, Pattern & Validation Utilize automated processes to monitor conformance of source data to data quality requirements and business rules defined by the organization.
MDS Structure - Overview
- Models Identifies model such as Customer, Product
- Entities Identifies association of attributes. Name . Address, Phone etc...
- Members are analogous to the records in a database table.
- Attributes are analogous to the columns within a table . Address Line 1, City State, Postal Code etc . . .
- Attribute Groups are explicitly defined collections of particular attributes.
- Hierarchies organize members into either Derived or Explicit hierarchical structures.
- Collections are customized subsets of members contained within hierarchies or other collections.
- Business Rules can be created and applied against model data to ensure that custom business logic is adhered to
- Subscription Views are views that can be created in order to provide an appropriately named view for external systems to subscribe t
- Versions provide system owners / administrators with the ability to Open, Lock or Commit a particular version of a model.
Setting up the Model and Creating a SSIS Load Package
Nick Barclay: BI-Lingual has provided an excellent post on getting started with this tool. In his post he provide a walkthrough on how to set up a sample Geography Model and the associated Entities, Attributes and Hierarchies. In addition he provides the TSQL to load them. I have expanded on his work and provided a basic SSIS Package for loading the Model. The model is created and maintaned thru the MDS web application installed with MDS:
MDS requires that you load data required for your Entities and their associated Member, Attributes and Hierarchies onto MDS Staging Tables.
Staging Tables
There are three staging tables.
- tblStgMember - Members staging table.
- tblStgMemberAttribute - Attribute assignment staging table.
- tblStgRelationship - Parent / Child relationship staging table..
The approach to loading the staging tables is as follows:
- Load leaf members into the City, StateProvince and CountryRegion entities
- Load related StateProvince attributes into the City entity
- Load related CountryRegion attributes into the StateProvince entity
- Load freeform Spanish and French country names into the CountryRegion entity.
From an SSIS perspective we have developed a straigh tforward package that will rely on the use of Multicast, Transfrom Derived Column Transfrom and Aggregrate.Transform.
- Use Multicast Transform to provide seperate streams of data for seperate Members, Attributes and Free From Attributes from the same source.
- Use Derived Column Transforms to set up the required metadata
- Use Aggregate Transform to generate distinct input for Members and Attributes
Model Entity Metadata Tagging
The key to loading the Members is to identify the the required Metadata. In Nicks blog he walks you through a series of queries to determine you Model ID , Entiry ID and associate table names from the internal MDS tables. Once your have the required Model metadata, we use a Derived Column transform with Expressions to set up the following metadata columns:
- ModelName
- EntityName
- MemberType_ID
- MemberName
- MemberCode
Obviously Nick has done all the heavy lifting in regards to setting up a model in MDS, I thought it would be helpful to have an SSIS Package to compliment his very informative post.
Last I intentionally kept this package basic, they are many opportunities for improvement and making the entire process data driven, which I will explore next.
Additional References
Kirk Haselden has provided a white paper on The What, Why, and How of Master Data Management and an additional reference for understanding MDM is David Loshin's book.
Additional SSIS MDM/SSIS Tools
Ira Warren Whiteside
Actuality Business Intelligence
Melissa Data SSIS Total Data Quality Toolkit
"karo yaa na karo, koshish jaisa kuch nahi hai"
"Do, or do not. There is no try."