This is part 3 of my 29-part series called Better Know A SSIS Transform. Hopefully you will find the series informative. I will tell you a little about each transform and follow it up with a basic demo you can do on your own.
The Conditional Split provides a way to evaluate incoming rows and separate them by an expression you design. Once separated, the rows are sent to different outputs so they can be cleansed, loaded separately, or checked for changes (a good substitute for the Slowly Changing Dimension transform). I will walk through a scenario for each of these uses and show how you would set up the Conditional Split. There are of course other reasons you may use the Conditional Split, but these are what I typically use it for.
Cleansing Data Example
The scenario is I have a package that loads Company A customers. The data that we receive is not always complete though. Often I will have a zip code for a customer but no city or state. Because this is a known issue, the IT department has purchased a zip code extract that lists all zip codes and their associated cities and states.
- Add a Flat File Source pointing to incoming customer data
- Ensure all zip codes are standardized with a Derived Column Transform
- Use a Conditional Split to separate rows that do not have a city and state
- Send rows without city and state to a Lookup Transform that matches on zip code and returns the missing city and state. If it doesn’t find a match, send the row to a table so it can be corrected by hand.
- Use a Union All to combine original good data with corrected data from the Lookup Transform.
- Send to Destination Table
Conditional Split Configuration
- The condition trims any blank spaces from the City and State columns and then checks whether they are empty. If they are empty, those rows are sent to a Bad Data output.
- All rows that don’t meet this condition are sent to the default output, Good Data.
- Another method could be to convert these blank spaces to null before the Conditional Split then just check for null in the Bad Data condition.
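To make the row routing concrete, here is a minimal Python sketch of the data flow described above. It is only a model of the logic, not SSIS expression syntax. The column names (Name, City, State, Zip), the sample rows, and the zip_reference dictionary are all made up for illustration, and I am assuming a row counts as bad only when both City and State are blank; change the `and` to `or` if a row missing just one of them should also be repaired.

```python
def is_blank(value):
    """True for None, empty strings, and strings containing only spaces."""
    return value is None or value.strip() == ""


def conditional_split(rows):
    """Route each row to Good Data (default output) or Bad Data."""
    good, bad = [], []
    for row in rows:
        # Assumption: "bad" means BOTH City and State are blank after trimming.
        if is_blank(row["City"]) and is_blank(row["State"]):
            bad.append(row)
        else:
            good.append(row)
    return good, bad


def lookup_zip(bad_rows, zip_reference):
    """Fill in City/State by zip code; unmatched rows go to a manual-fix list."""
    matched, unmatched = [], []
    for row in bad_rows:
        ref = zip_reference.get(row["Zip"])
        if ref is None:
            unmatched.append(row)
        else:
            matched.append({**row, "City": ref[0], "State": ref[1]})
    return matched, unmatched


# Hypothetical sample data standing in for the flat file and the zip extract.
rows = [
    {"Name": "Ann", "City": "Jacksonville", "State": "FL", "Zip": "32202"},
    {"Name": "Bob", "City": "  ", "State": "", "Zip": "32202"},
    {"Name": "Cy", "City": None, "State": None, "Zip": "00000"},
]
zip_reference = {"32202": ("Jacksonville", "FL")}

good, bad = conditional_split(rows)
matched, unmatched = lookup_zip(bad, zip_reference)
loaded = good + matched  # plays the role of the Union All
```

Treating None the same as a blank string in is_blank mirrors the alternative in the last bullet, where blanks are converted to null before the split.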
Loading Data Separately Example
The scenario is I have a package that loads customer mailing lists. Company B sends out promotions and wants to separate those mailing lists depending on a customer's education level. Those with some college, or with a high school education or less, are more likely to receive my promotion to attend a career college.
- Add an OLE DB Source to bring in data from my customer table
- Use a Conditional Split to separate customers by education level
- Connect outputs to Flat File Destinations to create mailing lists.
Conditional Split Configuration
- The Completed College output checks the EnglishEducation column for a string value of either Bachelors or Graduate Degree
- The Some College output checks the EnglishEducation column for a string value of Partial College
- All other rows are sent to the default output named High School Education or Less
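The three outputs above can be sketched in Python as an ordered set of checks. The Conditional Split evaluates its conditions in order and sends each row to the first output whose condition is true, with everything else falling through to the default output. This is a model of that behavior rather than SSIS expression syntax, and the sample customers are made up.

```python
def education_output(row):
    """Return the name of the output a row would be routed to."""
    education = row["EnglishEducation"]
    # Conditions are checked top to bottom; the first match wins.
    if education in ("Bachelors", "Graduate Degree"):
        return "Completed College"
    if education == "Partial College":
        return "Some College"
    # Default output: nothing above matched.
    return "High School Education or Less"


def split_by_education(rows):
    """Group rows into one mailing list per output."""
    outputs = {
        "Completed College": [],
        "Some College": [],
        "High School Education or Less": [],
    }
    for row in rows:
        outputs[education_output(row)].append(row)
    return outputs


# Hypothetical sample customers.
customers = [
    {"Name": "Ann", "EnglishEducation": "Bachelors"},
    {"Name": "Bob", "EnglishEducation": "Partial College"},
    {"Name": "Cy", "EnglishEducation": "High School"},
    {"Name": "Dee", "EnglishEducation": "Graduate Degree"},
]
mailing_lists = split_by_education(customers)
```

Each list in mailing_lists corresponds to one Flat File Destination in the package.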
Detecting Changing Data Example
This common scenario uses an alternative to the Slowly Changing Dimension transform. I have incoming records from Company C’s ecommerce system that need to be loaded to my data warehouse. Before these records get loaded I need to check whether they are new, updated or duplicate records.
- Add an OLE DB Source pointing to the ecommerce database
- Use a Lookup Transform on the destination table, joining on the table's primary key, and rename all output columns Target_(column_name). Tell the transform to ignore failure when no matches are found. A better method is to use either a Checksum or a Hash byte column for comparison, but this is a good starting method. The Checksum or Hash byte method creates a single value that represents the contents of each row, so instead of comparing each column of a row you can compare just one column to detect a change.
- Use a Conditional Split to determine which records are new, updates or duplicates.
- Send New records to final destination table
- Send Updates to a staging table
- Use an Execute SQL task in the Control Flow to process the updated rows into the destination table. (This set-based method is much faster than using the OLE DB Command, which runs a statement for every row.)
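The Checksum or Hash byte idea mentioned in the Lookup step can be illustrated with a short Python sketch. This is a stand-in for the concept, not the SQL Server functions themselves: it uses SHA-256 from Python's hashlib, and the column names and sample rows are hypothetical. The point is only that one computed value per row can replace a column-by-column comparison.

```python
import hashlib


def row_hash(row, columns):
    """Build one hash value from the listed columns of a row."""
    # Note: None is treated as an empty string here, so the two hash alike.
    parts = ["" if row[c] is None else str(row[c]) for c in columns]
    # A separator keeps ("ab", "c") from hashing the same as ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical columns and rows.
columns = ["FirstName", "LastName", "Email"]
source_row = {"FirstName": "Ann", "LastName": "Lee", "Email": "ann@new.example"}
target_row = {"FirstName": "Ann", "LastName": "Lee", "Email": "ann@old.example"}

# One comparison instead of one per column.
changed = row_hash(source_row, columns) != row_hash(target_row, columns)
```

In a package you would typically store the hash alongside the row in the destination table so the Lookup only has to return that one column.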
Conditional Split Configuration
- The New Record output is checking to see if the Target_(primary_key) is null. If it is null then we know it’s a new record.
- If the Target_(primary_key) is not null, the Update output compares each incoming column to its Target_ counterpart from the destination table. Any difference means the row needs to be updated. Again, the best method for doing this would be to use either Checksum or Hash byte to create a single value that represents a row, then compare just that one column instead of all columns.
- Anything that doesn’t meet these conditions is a duplicate that we do not want to load. Just don’t connect the Duplicate output to anything and these rows will not be loaded.
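Here is a minimal Python sketch of the three-way decision described above, using the column-by-column comparison. It models the logic only and is not SSIS expression syntax. The key and column names (CustomerKey, Email, City) and the sample rows are hypothetical, and each row is assumed to already carry its Target_ columns from the Lookup, with None standing in for null where no match was found.

```python
def classify(row, key, columns):
    """Return 'New Record', 'Update', or 'Duplicate' for a looked-up row."""
    # No match in the destination table: the Lookup left the target key null.
    if row["Target_" + key] is None:
        return "New Record"
    # Matched: any column that differs from its Target_ copy means an update.
    if any(row[c] != row["Target_" + c] for c in columns):
        return "Update"
    # Matched and identical: nothing to load.
    return "Duplicate"


# Hypothetical rows as they would look after the Lookup Transform.
rows = [
    {"CustomerKey": 1, "Email": "a@x.example", "City": "Tampa",
     "Target_CustomerKey": None, "Target_Email": None, "Target_City": None},
    {"CustomerKey": 2, "Email": "b@x.example", "City": "Miami",
     "Target_CustomerKey": 2, "Target_Email": "b@x.example", "Target_City": "Orlando"},
    {"CustomerKey": 3, "Email": "c@x.example", "City": "Ocala",
     "Target_CustomerKey": 3, "Target_Email": "c@x.example", "Target_City": "Ocala"},
]
results = [classify(r, "CustomerKey", ["Email", "City"]) for r in rows]
```

The order of the checks matters: the null test has to come first, otherwise every new record would also look like an update because its Target_ columns are all null.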