SSIS Balanced Data Distributor Data Flow Component (Table Locking)

Question

SSIS Balanced Data Distributor Data Flow Component (Table Locking)

Letron Brantley

Ten Centuries

Points: 1202
More actions
January 12, 2016 at 12:58 pm

#303633

I have a question regarding the Microsoft Balanced Data Distributor.
http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246965 More actions · Answer 1

Letron Brantley (1/12/2016)
I have a question regarding the Microsoft Balanced Data Distributor.
http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron

I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

Letron Brantley (1/12/2016)
I have a question regarding the Microsoft Balanced Data Distributor.
http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron

Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

Letron Brantley Ten Centuries Points: 1202 More actions · Answer 3

Lynn Pettis (1/12/2016)
Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

Ahh Thanks Lynn!! I just hyperlinked it.

Letron Brantley Ten Centuries Points: 1202 More actions · Answer 4

Phil Parkin (1/12/2016)
I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.

Ok thanks Phil,

I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.

Thanks,

Letron

Phil Parkin SSC Guru Points: 246965 More actions · Answer 5

Letron Brantley (1/12/2016)
Phil Parkin (1/12/2016)
I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.
Ok thanks Phil,
I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.
Thanks,
Letron

That's interesting, please post back with your findings. If that's the case, it directly contradicts what the notes in the link say.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

prvmine SSCertifiable Points: 7007 More actions · Answer 6

Suggest changing the destination connection advanced property for MultipleActiveResultSets to True