May 27, 2010 at 8:24 am
Many online articles and blog posts suggest sorting in the database and then setting the IsSorted property on the corresponding SSIS data source in order to satisfy certain SSIS components' requirement that the inputs must be sorted. I've found that in the case of the Merge Join component (and maybe other components as well), this doesn't always work as intended.
Here's an example you can build yourself...
- Locate (or create) a SQL Server database with the SQL_Latin1_General_CP1_CI_AS (case insensitive) collation
- Create a data flow with two OLEDB sources, pointing to these two queries. Note that query #2 is identical to query #1, with one extra row.
-- Query #1
SELECT 'ABb' AS Col1,
1 AS Query1RowNumber
UNION ALL
SELECT 'AbB',
2
ORDER BY Col1
-- Query #2
SELECT 'ABb' AS Col1,
1 AS Query2RowNumber
UNION ALL
SELECT 'AbB',
2
UNION ALL
SELECT 'Ab',
3
ORDER BY Col1
- Using the advanced editor, set the "IsSorted" property on each data source, and set the "SortOrder" to 1 for the Col1 column
- Add a merge join component, join type "full outer join". Make "Col1" the join key, and allow Query1RowNumber and Query2RowNumber to pass through.
- Add a destination of some kind, and put a data viewer on the output of the merge join
The expected result is this, right?
Query1RowNumber Query2RowNumber
--------------- ---------------
1 1
2 2
NULL 3
What you actually get is:
Query1RowNumber Query2RowNumber
--------------- ---------------
NULL 3
NULL 2
1 1
2 NULL
Row number 2 (Col1 = 'AbB') isn't matched in the full outer join, and it is returned as two "unique" rows!
My conclusion from this is that SSIS components which require sorted inputs actually require "SSIS-sorted" inputs, or they can return unexpected results!
One last thing to mention... This issue only manifests itself when sorting character data that sorts differently in SQL versus SSIS. If your key columns are non-alpha, or have no punctuation, mixed-case, accented, or special characters, you should be safe from this issue.
May 27, 2010 at 9:53 am
Is the datatype for Query1RowNumber & Query2RowNumber exactly the same. I'm not asking if the query is the same but the underlying datatype of that column. I know that SSIS tends to be very sensitive to such things such as '2' != '2 '..
CEWII
May 27, 2010 at 10:10 am
Yes, the datatypes are exactly the same. According to the Metadata viewer, both "Col1" columns are DT_STR, length 3, Code Page 1252. Both RowNumber columns are DT_I4.
March 24, 2011 at 3:43 pm
This can be solved by using collations:
http://microsoft-ssis.blogspot.com/2011/03/sorting-in-sql-vs-sorting-in-ssis.html
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply