October 25, 2018 at 10:24 am
Hi,
I am creating a data migration scenario for SQL Server 2017.
I need to load 160 GB from ~250 tables, from database 1 to database 2 on the same Server A.
I built ~6 SSIS packages; each package loads one or more schemas (the number of tables per package ranges from 17 to 60).
Using SSDT 2017, all packages are executed in Debug mode from SSDT on Server A.
Originally, every package was built and saved using the "Import Data" wizard from SSMS 2017.
Each package contains sequential control flow tasks: Preparation SQL Tasks (truncating the destination tables) and Data Flow Tasks (each Data Flow has 5 source -> destination transformations).
On each Data Flow Task I set AutoAdjustBufferSize = True, and for every table with a row count > 100,000 I set Maximum insert commit size = 100000.
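For reference, the tables over that threshold can be listed with a query along these lines (just a sketch, run against database 1):

-- List user tables whose row count exceeds 100,000.
SELECT s.name AS schema_name,
       t.name AS table_name,
       SUM(p.rows) AS row_count
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap or clustered index only, no NC indexes
GROUP BY s.name, t.name
HAVING SUM(p.rows) > 100000
ORDER BY row_count DESC;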
Test Scenario 1
SSIS1 execution time 2 min
SSIS2 execution time 3 min
SSIS3 execution time 20 min (some data flow sources are tables with 25M rows; the total number of rows inserted by this package is ~120M)
Test Scenario 2
Each package executed via an "Execute Package Task":
Execute Package Task 1 (SSIS1) -- execution time 2 minutes
Execute Package Task 2 (SSIS2) -- execution time 3 minutes
Execute Package Task 3 (SSIS3) -- execution time 13 hours!!!
In the SSDT debugger I can see that the data flow tasks in SSIS3 become extremely slow when the package is executed as a step via an Execute Package Task.
The server has 24 GB of RAM and 4 CPUs; database 1 and database 2 are located on a 1 TB SSD drive.
Since this should be a "one time" migration operation, I planned to automate it by running all packages as a set of "Execute Package Tasks".
What might cause the slow execution of the data transformations when a package is executed as part of an "Execute Package Task"?
October 25, 2018 at 10:55 am
ebooklub - Thursday, October 25, 2018 10:24 AM
Frankly, I fail to see why that task ever gets used for anything but totally trivial package executions. Much better to use an Agent Job and job steps to control package execution flow.
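A minimal sketch of that approach, assuming the packages are deployed as files on the server (job name, paths, and package names below are placeholders):

-- One Agent job with sequential steps, one step per package.
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Migration - load db2';

EXEC dbo.sp_add_jobstep
    @job_name  = N'Migration - load db2',
    @step_name = N'Run SSIS1',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\Packages\SSIS1.dtsx" /REPORTING E',
    @on_success_action = 3;   -- go to the next step

EXEC dbo.sp_add_jobstep
    @job_name  = N'Migration - load db2',
    @step_name = N'Run SSIS2',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\Packages\SSIS2.dtsx" /REPORTING E',
    @on_success_action = 1;   -- quit with success after the last step

EXEC dbo.sp_add_jobserver @job_name = N'Migration - load db2', @server_name = N'(local)';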
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 25, 2018 at 1:26 pm
It is a non-domain local server; I don't want to go through the security setup for importing the SSIS packages and saving passwords to run them as SQL Server Agent jobs.
The goal is to load the data over a weekend time frame and to be able to monitor the loading process at any point in time using the "user friendly GUI" of SSDT.
October 26, 2018 at 6:37 am
ebooklub - Thursday, October 25, 2018 1:26 PM
If security is a concern, then why have a non-domain server involved in moving data either from or to a domain server? You can always use a database to store passwords because SSIS lets you do that. But if all you are ever going to do is manually run the package within Visual Studio, and locally from this non-domain server, that's in the category of 100% dependent on a human being to accomplish. Bit of a dangerous dependency if you are the only one who would know how to run it.
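For reference, the SQL Server package-configuration option stores its values in a table shaped roughly like this (the designer can generate it for you; this sketch is from memory):

CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,  -- groups related settings
    ConfiguredValue     NVARCHAR(255) NULL,      -- the stored value
    PackagePath         NVARCHAR(255) NOT NULL,  -- property path inside the package
    ConfiguredValueType NVARCHAR(20)  NOT NULL   -- e.g. String, Int32
);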
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 26, 2018 at 8:25 am
Full scenario 🙂
Production:
Server X (hosting db A, 400 GB) and Server Y (hosting db B, 400 GB), located 2,000 miles from our office.
Project: I need to add to db A the 250 tables (160 GB) from db B, move it to an Azure managed instance, synchronize it with prod with minimum downtime, and make it the new prod 🙂
Currently I am copying compressed backups from Servers X and Y to the local non-domain server Z (takes 4-5 hrs).
I restore them on Z, add the new schema to db A, and run the import of the 250 tables; once the import completes, I will take a compressed backup, move it to blob storage in Azure, and restore it there.
Later synchronization of the data will be done with Red Gate SQL Data Compare.
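The backup legs of that plan look roughly like this in T-SQL (a sketch; database names, paths, and URLs are placeholders):

-- On Server Z, after the import:
BACKUP DATABASE [dbA]
TO DISK = N'D:\Backup\dbA.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- Or back up straight to blob storage
-- (assumes a credential for the storage account already exists):
BACKUP DATABASE [dbA]
TO URL = N'https://mystorageaccount.blob.core.windows.net/backups/dbA.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- On the managed instance:
RESTORE DATABASE [dbA]
FROM URL = N'https://mystorageaccount.blob.core.windows.net/backups/dbA.bak';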
Initially I created all the packages on my local PC and ran them from SSDT one by one with DEFAULT settings; total processing time for all packages was about 12 hrs.
To minimize processing time, I installed SSDT on the non-domain server Z and copied the project there, assuming it would run faster, especially after modifying the default settings (SSIS buffer size, rows per batch, and commit size)...
So far I have found the following:
1. A package calling the other packages as 6 sequential steps runs much slower than each package executed individually.
2. If a package has 10 Data Flow Tasks (each data flow has 5 tables), it runs much slower than executing each data flow individually.
The size of tempdb is 4 GB; the size of the log file (simple recovery) of the destination db on Server Z is 100 GB.
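While a load is running, log and tempdb pressure can be checked with a couple of quick queries (a sketch, run on Server Z):

-- Log size and percent used, per database:
DBCC SQLPERF(LOGSPACE);

-- Free space remaining in tempdb:
USE tempdb;
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM sys.dm_db_file_space_usage;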
Questions
Why is the total time of running all the individual steps (Data Flow Tasks) separately so much less than running them sequentially in one package?
During package execution I see CPU usage at 30% and 9 GB of the 24 GB of memory in use.
October 26, 2018 at 12:18 pm
With that much extra capacity hanging around, I'd be redesigning the packages to have multiple data flow tasks, with considerably more than 5 tables in each one, and making sure each one has the maximum buffer available and a considerable number of threads as well ... maybe somewhere in the 30-40 range?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 26, 2018 at 2:36 pm
I used the AutoAdjustBufferSize property based on this article: https://www.mssqltips.com/sqlservertip/4221/improving-data-flow-performance-with-ssis-autoadjustbuffersize-property/
My current EngineThreads setting is 10 (for 5 tables); I will try setting it to a higher value.
What I also noted:
A table with 200,000 rows and 10 XML-type columns never finishes loading with the default settings.
October 28, 2018 at 9:41 am
If this is a one-time process then why worry about 'automating' it as a parent/child package setup? Put everything in a single package...
As for performance - have you ensured that every OLE DB Destination has been configured with an appropriate batch and commit size, or did you leave those at the defaults? If left at the defaults, all rows must be loaded before any are committed, and that will cause large delays because of the amount of data being committed (as well as bloating the transaction log).
You also need to ensure that the destination has the resources to support the multiple processes. If you are running on a server that has 24 CPUs and the destination has fewer resources, then the destination is going to limit the processing even when you have multiple threads. Make sure you do not overcommit against the destination, as that could cause extreme performance issues there.
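To see the commit-size effect in plain T-SQL terms: one giant transaction keeps every row active in the log until the end, while chunked commits let a simple-recovery log truncate between batches. A sketch (table names and the ascending id column are illustrative):

-- One giant transaction: the log must hold every row until the final commit.
INSERT INTO dbo.big_target (id, payload)
SELECT id, payload FROM dbo.big_source;

-- Chunked commits: each INSERT auto-commits, so the log can truncate
-- between batches under the simple recovery model.
DECLARE @batch INT = 100000, @last BIGINT = 0, @rows INT = 1;
WHILE @rows > 0
BEGIN
    INSERT INTO dbo.big_target (id, payload)
    SELECT TOP (@batch) id, payload
    FROM dbo.big_source
    WHERE id > @last
    ORDER BY id;

    SET @rows = @@ROWCOUNT;
    IF @rows > 0
        SELECT @last = MAX(id) FROM dbo.big_target;
END;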
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
November 6, 2018 at 7:49 am
Intermediate test results
250 tables (160 GB), ~800,000,000 rows
AutoAdjustBufferSize = True set for all Data Flow Tasks
Batch size: custom, based on table size and data types
Commit size: left at the default
Example: a table with 400,000 rows and 15 columns of XML type
1. Batch size 10,000 rows: loading time 4 hrs
2. Batch size 500 rows: loading time 3 minutes
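The 10,000 vs 500 difference likely comes down to bytes per batch rather than rows. A quick way to estimate the XML row width (a sketch; table and column names are placeholders):

-- Estimate average XML bytes per row from a sample of 1,000 rows.
SELECT AVG(CAST(DATALENGTH(x1) + DATALENGTH(x2) AS BIGINT)) AS avg_xml_bytes_per_row
FROM (SELECT TOP (1000) xml_col_1 AS x1, xml_col_2 AS x2
      FROM dbo.xml_heavy_table) AS s;

At, say, 100 KB of XML per row, a 500-row batch is roughly 50 MB while a 10,000-row batch is closer to 1 GB, which would explain the gap.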
A main package calling 9 other packages sequentially: loading time 4 hrs
Calling all 9 packages in parallel: loading time 3.5 hrs
No setting changes (batch size, commit size, buffer size) on tables containing 40M+ rows sped up those particular Data Flow transformations.