Merge Join Question

  • Hi, I am in process of building a first date cube and I am bit puzzled about connection two dimension tables I created.

    I have two OLE DB Sources:

    1. customer data from transactional database

    2. DimGeography which contains all US zip code and its city.

    I am trying to populate DimCustomer table by

    1. Extract customer data from transaction db.

    2. Look up customer zip code and assign Geography key

    3. Load customer data with geography key into DimCustomer table.

    I figured if I use Merge Join, I would be able to easily join two data set and move them into DimCustomer table, but I am keep getting 'The IsSorted property must be set to True on both sources of this transformation' error and I am lost.

    Can anyone give me some advice?

    Thank you!

  • SSIS requires that you put your data through a sort transformation before using a MERGE JOIN - regardless of whether the source data is already sorted.

    Have you investigated the LOOKUP transform, by the way? It sounds like it may be more appropriate for your needs.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you sir! Lookup worked.

  • Phil Parkin (5/28/2009)


    SSIS requires that you put your data through a sort transformation before using a MERGE JOIN - regardless of whether the source data is already sorted.

    Phil,

    this is not totally true. If you read the data from a database using a SQL statement inside a data source, SSIS does not know how data is sorted. But using the advanced editor of the Data Source, you can tell SSIS how your data will be sorted. In this case you can use your Data Source as a Merge Join input wihtout sorting it first (which,k by the way, is a very expensive operation and should be avoided if in any wa possible).

    Your second tipp is one of the (most often) cheaper alternarnatives, however...

    WM_JUSTMY2CENTS

    Guenter

  • Thank you Guenter - there's always something to learn with this product!

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • gtschech (6/3/2009)


    Phil Parkin (5/28/2009)


    SSIS requires that you put your data through a sort transformation before using a MERGE JOIN - regardless of whether the source data is already sorted.

    Phil,

    this is not totally true. If you read the data from a database using a SQL statement inside a data source, SSIS does not know how data is sorted. But using the advanced editor of the Data Source, you can tell SSIS how your data will be sorted. In this case you can use your Data Source as a Merge Join input wihtout sorting it first (which,k by the way, is a very expensive operation and should be avoided if in any wa possible).

    Your second tipp is one of the (most often) cheaper alternarnatives, however...

    WM_JUSTMY2CENTS

    Guenter

    Performance seems to improve drastically if you use a sorted data source and a join as opposed to a lookup. Especially if the lookup dataset is large.

  • matthijs.vogt (6/4/2009)


    gtschech (6/3/2009)


    Phil Parkin (5/28/2009)


    SSIS requires that you put your data through a sort transformation before using a MERGE JOIN - regardless of whether the source data is already sorted.

    Phil,

    this is not totally true. If you read the data from a database using a SQL statement inside a data source, SSIS does not know how data is sorted. But using the advanced editor of the Data Source, you can tell SSIS how your data will be sorted. In this case you can use your Data Source as a Merge Join input wihtout sorting it first (which,k by the way, is a very expensive operation and should be avoided if in any wa possible).

    Your second tipp is one of the (most often) cheaper alternarnatives, however...

    WM_JUSTMY2CENTS

    Guenter

    Performance seems to improve drastically if you use a sorted data source and a join as opposed to a lookup. Especially if the lookup dataset is large.

    Your're right; I forgot to mention the site restrictions for Lookups.

    Maybe this comes from my current project where I found that the programmers used the Sort/Join pattern everywhere even with secondary tables of 100 or less records, so I switching to the lookup pattern improved performance dramatically when the table that serves the lookup is small and fits into memory...

    Thanks for your added value!

    Guenter

  • To further clarify a point, the Merge Join may have to be directly connected to the data source when the data source 'IsSorted' property is set to True. If there is any other transform component in between then the Merge Join does not determine the data as sorted e.g. a Derived column transform in between data source and Merge Join component (I believe the column indicated to SSIS as sorted was used in the Derived Transform but haven't determined whether the same would be true if the 'sorted' column was not used in the DT column but just passed through).

    Paul R Williams.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply