January 12, 2016 at 12:58 pm
I have a question regarding the Microsoft Balanced Data Distributor.
http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron
January 12, 2016 at 1:13 pm
Letron Brantley (1/12/2016)
I have a question regarding the Microsoft Balanced Data Distributor.http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron
I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 12, 2016 at 2:00 pm
Letron Brantley (1/12/2016)
I have a question regarding the Microsoft Balanced Data Distributor.http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream up into multiple segmented streams and load into the same table. The premise is that you should be able to use the BDD to make use of parallelism and potentially load data faster. I'm a little skeptical though.
For instance:
I want to pull 10 million rows from a source table so I create a package with a simple data flow. In the data flow I have
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams. So each stream should pipe 1million rows each.
3. Each stream will have a OLEDB destination component pointing to the same TargetTableB
The BDD is supposed to allow you to pipe multiple segments of data in parallel to speed the load. How then is locking handled on the target side. My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams should be blocked until the first finished and so on. To me the benefit is defeated.
Am I missing some method in how the BDD is supposed to work?
Thanks,
Letron
Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
January 12, 2016 at 2:17 pm
Lynn Pettis (1/12/2016)
Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Ahh Thanks Lynn!! I just hyperlinked it.
January 12, 2016 at 2:20 pm
Phil Parkin (1/12/2016)
I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.
Ok thanks Phil,
I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.
Thanks,
Letron
January 12, 2016 at 2:25 pm
Letron Brantley (1/12/2016)
Phil Parkin (1/12/2016)
I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.
Ok thanks Phil,
I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.
Thanks,
Letron
That's interesting, please post back with your findings. If that's the case, it directly contradicts what the notes in the link say.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 13, 2016 at 3:53 am
Suggest changing the destination connection advanced property for MultipleActiveResultSets to True
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply