July 23, 2018 at 7:52 am
I have created a package for our company that takes all the data from a table in our ERP production environment (~18 million rows, 168 fields) and copies it into our staging environment, so that when we apply all of our transformations on the data for our warehouse we are not competing with our ERP environment. The package currently takes 45-50 minutes to move the data, and I am trying to cut that time down as much as possible.
The server hosting the source database is running with:
CPU: Intel(R) Xeon(R) CPU (2.93GHz - 4 processors)
Windows Server 2012 R2 Datacenter
16GB of RAM
The destination:
CPU: AMD Opteron(TM) Processor 6234 (2.40GHz - 4 processors)
Windows Server 2016 Datacenter
16GB of RAM
I have already tried playing with the Rows per batch, the Maximum insert commit size (our destination DB is in simple recovery, so I don't think this should change much since we are not relying on the transaction log), and the AutoAdjustBufferSize property, and nothing has really changed. I am thinking that the issue lies with our destination server's capabilities, since most of the time is coming from the write transactions.
Any tips would be much appreciated, thanks in advance!
July 23, 2018 at 7:59 am
Just how much data is in 18 million rows of 168 columns? What speed is the network path between the servers? How much RAM and CPU is on the source server?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 23, 2018 at 8:10 am
Does your target table contain any indexes or keys (primary or foreign)?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 23, 2018 at 8:12 am
sgmunson - Monday, July 23, 2018 7:59 AM
Just how much data is in 18 million rows of 168 columns? What speed is the network path between the servers? How much RAM and CPU is on the source server?
The properties for the table:
Data space: 27GB
Index space: 6.8GB
And I apologize, the above server configuration was for the source server (using MS SQL Server Standard (64-bit)).
The destination server has the same RAM and number of processors but is running Microsoft SQL Server Developer (64-bit) on Windows Server 2016 Datacenter (10.0).
The processor for the destination is an AMD Opteron(TM) Processor 6234 (2.40GHz).
July 23, 2018 at 8:19 am
Phil Parkin - Monday, July 23, 2018 8:10 AM
Does your target table contain any indexes or keys (primary or foreign)?
The destination table does not have any keys; our staging environment uses a truncate/reload method (a rough sketch of that pattern is below).
The source table contains a non-clustered primary key on the ROWID.
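For readers following along, here is a minimal T-SQL sketch of that truncate/reload pattern. The object names (stg.ErpTable, SourceServer, ErpDb) are made up for illustration; in the actual package the insert is performed by the data flow (OLE DB Source feeding an OLE DB Destination), not a single INSERT statement.

-- Sketch only: object names are assumptions, not the OP's actual schema.
-- Step 1: empty the staging table. TRUNCATE is minimally logged and cheap.
TRUNCATE TABLE stg.ErpTable;

-- Step 2: full reload. The SSIS data flow does this row-by-buffer; the set-based
-- equivalent on the destination server, if a linked server existed, would be roughly:
INSERT INTO stg.ErpTable WITH (TABLOCK)   -- TABLOCK + heap + simple recovery => minimal logging
SELECT *
FROM [SourceServer].[ErpDb].[dbo].[ErpTable];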
July 23, 2018 at 8:28 am
So is your package as simple as just a
truncate target
followed by a simple source/target dataflow?
Or does it contain any other transformations?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 23, 2018 at 8:36 am
smantia2292 - Monday, July 23, 2018 8:12 AM
sgmunson - Monday, July 23, 2018 7:59 AM
Just how much data is in 18 million rows of 168 columns? What speed is the network path between the servers? How much RAM and CPU is on the source server?
The properties for the table:
Data space: 27GB
Index space: 6.8GB
And I apologize, the above server configuration was for the source server (using MS SQL Server Standard (64-bit)).
The destination server has the same RAM and number of processors but is running Microsoft SQL Server Developer (64-bit) on Windows Server 2016 Datacenter (10.0).
The processor for the destination is an AMD Opteron(TM) Processor 6234 (2.40GHz).
Useful info, but what about the network path? What is the slowest network speed at any point in between the two servers?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
July 23, 2018 at 8:40 am
Phil Parkin - Monday, July 23, 2018 8:28 AM
So is your package as simple as just a
truncate target
followed by a simple source/target dataflow?
Or does it contain any other transformations?
Yes, that is pretty much it!
July 23, 2018 at 10:15 am
Rows per batch should be set to the total number of rows on the table.
Set the table lock checkbox on the staging table
Set the maximum insert commit size to 2147483647
If you have any indexes on the staging table, it would be best to drop them before performing this import operation and then recreate them at the end.
For the Access mode use OpenRowset Using FastLoad.
Once you have this initial full load in place, you should then only import data that has changed on the table. This is possible if you store a datestamp column on the table that indicates when the row was last changed (a rough sketch of that pattern follows at the end of this post).
All that said it might not be possible to increase the speed a lot more if you are already at the limits of your hardware.
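A rough T-SQL sketch of that datestamp-based incremental load, purely to illustrate the idea. The object and column names (stg.ErpTable, dbo.ErpTable, ROWID, LastChangedDate) are assumptions; in the package the watermark would typically be read into an SSIS variable, the filtered SELECT would become the OLE DB Source query, and the DELETE would be an Execute SQL Task on the destination.

-- Sketch only: names are assumptions. ROWID is the source table's primary key
-- (per the earlier post); LastChangedDate is the hypothetical "last changed" datestamp.

-- 1. Find the high-water mark already present in staging.
DECLARE @LastLoaded datetime2 =
    (SELECT ISNULL(MAX(LastChangedDate), '19000101') FROM stg.ErpTable);

-- 2. Remove the current staging copies of any rows that have changed since then,
--    so the reload below does not create duplicates (runs on the destination).
DELETE s
FROM stg.ErpTable AS s
WHERE EXISTS (SELECT 1
              FROM dbo.ErpTable AS e      -- source table; reached via linked server or the data flow in practice
              WHERE e.ROWID = s.ROWID
                AND e.LastChangedDate > @LastLoaded);

-- 3. Pull only the new/changed rows. In SSIS this WHERE clause goes into the
--    OLE DB Source query, parameterised with the watermark.
INSERT INTO stg.ErpTable WITH (TABLOCK)
SELECT *
FROM dbo.ErpTable
WHERE LastChangedDate > @LastLoaded;

-- If the staging table ever gains nonclustered indexes, disabling each one before a
-- big load and rebuilding it afterwards is usually cheaper than maintaining it row by row:
--   ALTER INDEX IX_ErpTable_Something ON stg.ErpTable DISABLE;
--   ... load ...
--   ALTER INDEX IX_ErpTable_Something ON stg.ErpTable REBUILD;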
July 23, 2018 at 10:33 am
Jonathan AC Roberts - Monday, July 23, 2018 10:15 AM
Rows per batch should be set to the total number of rows on the table.
Regardless of the number of rows to be committed?
Can you justify this, please?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 24, 2018 at 4:18 am
Phil Parkin - Monday, July 23, 2018 10:33 AM
Jonathan AC Roberts - Monday, July 23, 2018 10:15 AM
Rows per batch should be set to the total number of rows on the table.
Regardless of the number of rows to be committed?
Can you justify this, please?
The entire table is being imported.
July 24, 2018 at 4:59 am
Jonathan AC Roberts - Tuesday, July 24, 2018 4:18 AM
Phil Parkin - Monday, July 23, 2018 10:33 AM
Jonathan AC Roberts - Monday, July 23, 2018 10:15 AM
Rows per batch should be set to the total number of rows on the table.
Regardless of the number of rows to be committed?
Can you justify this, please?
The entire table is being imported.
So do you think that all of those 18M rows will fit in 16GB of (shared) RAM? Unlikely, I would say. So they're going to spill to tempdb (I think that's what happens, anyway). Having a lower rows per batch (and corresponding max insert commit size) allows things to flow into the target table without the need for spills. But testing is required, as there are several other areas where bottlenecks can occur.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 24, 2018 at 5:22 am
Phil Parkin - Tuesday, July 24, 2018 4:59 AM
So do you think that all of those 18M rows will fit in 16GB of (shared) RAM? Unlikely, I would say. So they're going to spill to tempdb (I think that's what happens, anyway). Having a lower rows per batch (and corresponding max insert commit size) allows things to flow into the target table without the need for spills. But testing is required, as there are several other areas where bottlenecks can occur.
I think the rows are written using a data pump behind the scenes. If Rows per batch is set to the number of rows on the table and the load fails midway through, the entire transaction would be rolled back. That can actually be an advantage: if the load fails when the batch size is set to less than the number of rows on the table, then when the job is restarted it would import the same rows again and you would get duplicates, unless you do a truncate first, because SSIS checkpoint files will make it restart from the same control flow task. It would use more memory, but as you say, testing is required and maybe a lower value is needed.