Collation used by SSIS MERGE JOIN Task

  • Can the collation used by SSIS be changed or influenced during install or run time? We have found that our databases, that use a mandatory "LATIN1_GENERAL_BIN", have incorrect SSIS Merge Join output. Changing our database collation in testing didn't make a difference. What matters is the data. Which Windows collation is SSIS using?

    Example Data:

    FIRSTNAME

    FIRSTNAME

    FIRSTS-A-NAME

    FIRSTS_A_NAME

    FIRST_NAME

    FIRST_NAME

    FIRSTname

    FIRSTname

    FIRS_NAME

    Our only solution right now is to put in a Sort task before the Merge Join task as setting advanced properties isn't enough (as described by Eric Johnson here --> http://sqlblog.com/blogs/eric_johnson/archive/2010/02/03/the-trouble-with-ssis-sorting.aspx)

    Can anyone please advise?

    We are using 64-bit SQL Server 2008 R2 w/ SP1 in Windows Server 2008 R2 ENT w/ SP1.

    UPDATE from ETL team: Explicitly ordering the source with "COLLATE Latin1_General_CS_AS" seems to have the same effect as using a separate sort task. We don't feel that we can rely on our findings, however, unless we have documentation that this collation is what is behind SSIS.

    Can someone please advise?

  • Hi,

    Thanks for the info. I had the same issue where MERGE JOIN transformation drops close to a million records if not used with a SORT TRANSFORMATION. But if I used the SORT Transformation all the records go through, but the performance drops too much as i am dealing with close to 9 and half million records.

    So after using the COLLATE, almost all the 1 million records went through except 25 records. There was no way of tracking those 25 from the MERGE JOIN as i had to use the INNER JOIN (OUTER JOIN's give millions of Duplicates).

    Please let me know if there is something else which can further resolve this issue.

    Thanks in advance

    Gowtham

  • SELECT

    [DLCRTYPE], DLCASEID,

    -- CONVERT(VARCHAR(20),DLCASEID) COLLATE Latin1_General_CI_AS DLCASEID,

    [DLCASEPTY],

    FROM dbo.DMDETAILP dp

    INNER JOIN dbo.CASEMASP cm

    ON cm.CMCRTYPE = dp.DLCRTYPE

    AND cm.CMCASEID = dp.DLCASEID

    AND cm.CMCASEPTY = dp.DLCASEPTY

    ORDER BY CONVERT(VARCHAR(20),DLCASEID) COLLATE Latin1_General_CI_AS;

    --ORDER BY DLCASEID COLLATE SQL_Latin1_General_CP1_CI_AS;

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply