SSIS Balanced Data Distributor Data Flow Component (Table Locking)

  • I have a question regarding the Microsoft Balanced Data Distributor.

    http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

    Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.

    For instance:

    I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have

    1. One source component to "select * from SourceTableA".

    2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.

    3. Each stream will have a OLEDB destination component pointing to the same TargetTableB

    The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.

    Am I missing some method in how the BDD is supposed to work?

    Thanks,

    Letron

  • Letron Brantley (1/12/2016)


    I have a question regarding the Microsoft Balanced Data Distributor.

    http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

    Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.

    For instance:

    I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have

    1. One source component to "select * from SourceTableA".

    2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.

    3. Each stream will have a OLEDB destination component pointing to the same TargetTableB

    The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.

    Am I missing some method in how the BDD is supposed to work?

    Thanks,

    Letron

    I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Letron Brantley (1/12/2016)


    I have a question regarding the Microsoft Balanced Data Distributor.

    http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

    Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.

    For instance:

    I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have

    1. One source component to "select * from SourceTableA".

    2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.

    3. Each stream will have a OLEDB destination component pointing to the same TargetTableB

    The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.

    Am I missing some method in how the BDD is supposed to work?

    Thanks,

    Letron

    Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

  • Lynn Pettis (1/12/2016)


    Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

    Ahh Thanks Lynn!! I just hyperlinked it.

  • Phil Parkin (1/12/2016)


    I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.

    Ok thanks Phil,

    I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.

    Thanks,

    Letron

  • Letron Brantley (1/12/2016)


    Phil Parkin (1/12/2016)


    I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.

    Ok thanks Phil,

    I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.

    Thanks,

    Letron

    That's interesting, please post back with your findings. If that's the case, it directly contradicts what the notes in the link say.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Suggest changing the destination connection advanced property for MultipleActiveResultSets to True

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply