Problem with Merge Join

  • I have a Merge Join component in a Data Flow task which updates / inserts rows into different table. Also the input to the merge join comes from a select query on the same table. This locks the table and package just hangs. Is there any method by which I can avoid this?

  • A MERGE JOIN cannot update anything, so I will assume you have two source components (probably OLEDB), two SORT components, a MERGE JOIN, a CONDITIONAL SPLIT, an OLEDB Update, and an OLEDB Destination component.

    The problem is you are using the MERGE JOIN to look up records to see if they are already in the destination table so you can make the choice of inserting a new record or updating the existing record. Did I get that correct? Details are useful.

    If this is all the case, you have a couple of choices. You could use a read uncommitted isolation level or a nolock query hint on the OLEDB source that is getting the data from the table that is being locked. This is the "wrong" way, but it will usually work. The reason it would work is the SORT components are synchronous, so you would have had to have read all of the data before it gets inserted into or updated. This is VERY IMPORTANT if you go this route as reading uncommitted data in your case would be very bad.

    The other option is to use the same data connection for this OLEDB source and the update and destination components. Then change the "RetainSameConnection" property on this connection manager to true. This will make the package only create one connection using this connection manager and re-use it. So, your share lock from the SELECT statement will be from the same spid as the exclusive locks for the insert and update - so they will not be conflicting. You may have to not use the "FastLoad" options on the OLEDB destination, but it will depend on details I don't have.

  • Thanks for the reply.

    Just curious to know if there is any other method to filter out records that need to be Updated / Inserted other than using Merge Join

  • Yes.

    There is a Slowly Changing Dimension transformation that is designed specifically to insert or update records based on a key. It is ok for tables less than 1000 records.

    You could use a lookup component similar to what you are doing with the merge join. In the case of a lookup component, you would use the error data flow for records that need to be inserted (it did not find them in the lookup). This method is faster than the SCD wizard, but slower than a merge join. It is usually ok for up to about 2500 records.

    The merge join method you are using is ok for about 10,000 records. If you go to http://www.sqlbi.com you will find a free TableDifference component that is basically the Merge Join and the Conditional Split in a single component (instructions on the site).

    Finally, staging all of the records and making the update and insert through T-SQL is pretty much the only viable method if the table has more than 10,000 records in it.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply