This is part 2 of my 29 part series called Better Know A SSIS Transform. Hopefully you will find the series informative. I will tell you a little about each transform and follow it up with a demo basic you can do on your own.
When developing SSIS packages you are bound find out that some transforms require your data to be sorted before it can accept new input rows (Ex. Merge & Merge Join). The Sort Transform may be one of the more frequently used transforms for that reason, but it really should be one of the most avoided.
Why Avoid the Sort Transform?
Hopefully after you read this section you will agree with me that the new title of this blog should be Better Know a SSIS Transform Better Not Know a SSIS Transform.
The Sort Transform is a fully blocking asynchronous transform, meaning no rows can pass this transform until all rows have been sorted. If you take a minute to think about it that makes sense. You cannot send rows further down the Data Flow until all the rows have been sorted, so the Sort Transform holds up the show.
The best way to visualize this is to think of a deck of cards. You have a Conditional Split that wants to split the face cards from the numbered cards. Before that can happen you have a Sort Transform that will sort all the cards by numbers and face value. None of the cards can reach the Conditional Split until all the cards have been sorted causing our package to stall at the Sort Transform until all the rows have been processed.
So we know now we want to avoid it when possible. The obvious way to avoid it is by using TSQL in your source, but what if we have a flat file source where you can’t write a TSQL ORDER BY. In this case we’re stuck using the Sort Transform.
How to Avoid
- When you are using a source that allows you to write TSQL then do it!
SELECT ProductID, TransactionDate, Quantity, ActualCost
FROM Production.TransactionHistory
ORDER BY ProductID - Now you need to tell SSIS that your data is sorted. Right-click on the source and click Show Advanced Editor
- Click the Input and Output Properties tab and select OLE DB Source Output
- Change the IsSorted property to True
- Expand the OLE DB Source Output then expand the Output Columns. Select the column(s) that are identified in the TSQL ORDER BY (in this case ProductID).
- Change the SortKeyPosition property to 1. The SortKeyPosition identifies the order in which columns are sorted. This must be set for each individual column in the ORDER BY. If we had multiple columns we were sorting then the the numbers would continue 1 for first column, 2 for second and 3 for third. If the columns are ordered in descending order then the SortKeyPosition numbers would continue –1, –2 and –3.
Strategies for Using
When you have no other option but to use the Sort Transform (Ex. a Flat File Source) then be sure to use it wisely. Think about the order of your Data Flow. If you already have an Aggregate Transform in the package that’s performing a Group By then make sure to use the Sort Transform is after the Aggregate Transform. This way instead of sorting 100,000 rows and then grouping them into 440 rows you can group 100,000 rows into 440 rows and only have to sort 440 rows. **The Aggregate Transform is also a asynchronous transform that should be avoided when possible**
Configuring the Sort Transform
These steps will walk you through a basic configuration of the Sort Transform.
Example Overview
- Add a Flat File Source pointing to any available flat file you have
- Add Data Viewer to see data Presorted
- Add a Sort Transform
- Add Data Viewer to see data after being sorted
- Use a Union All as a temporary destination
Step 1: Configure Flat File Source
- Use any flat file you already have as a source. Create the connection manager to it and use a Flat File Source in the Data Flow
Step 2: Add Data Viewer
- Bring a Sort Transform in the Data Flow and connect the Flat File Source to it.
- Right-click on the Data Flow Path between the Source and the Pivot Transform to open the Data Flow Path Editor.
- Select Data Viewers then click Add and OK to add a Grid Data Viewer. Click OK once more to return to the Data Flow
Step 3: Configure Sort Transform
- Open the Sort Transform Editor
- In the Available Input Columns table use the far left column and check the column(s) you need to sort by.
- Any columns you check will be added to the bottom where you can specify how they will be sorted.
- Input Column: This is the name of the input column that will be sorted
- Output Alias: This is the name the outputted column. So if you want the column to have a new name you can change it here.
- Sort Type: Specify either ascending or descending for how the column will be sorted
- Sort Order: Works like the SortKeyPosition does in the Advanced Editor. Specifies the order if multiple columns are sorted
- Comparison Flag: Allows special string functions like sort punctuation as symbols
- Notice another nice feature in the bottom left of the editor is a check box that will remove rows with duplicate sort values.
- Once the column(s) needed are checked and configured click OK
Step 4: Add Data Viewer and Destination
- Drag over a Union All (if you are just testing) or an actual destination and connect the Pivot Transform to it.
- Right-click on the Data Flow Path between the Source and the Pivot Transform to open the Data Flow Path Editor.
- Select Data Viewers then click Add and OK to add a Grid Data Viewer. Click OK once more to return to the Data Flow
- Run the package to see the data pre and post sorted from the Data Viewers