October 1, 2018 at 10:48 pm
Hi Experts,
We are processing huge data files daily, loading from flat files into SQL Server using the BCP command.
For example: each day we plan to run 20 files in parallel from flat files into MSSQL (using the BCP command), with each file being 6 to 10 GB in size.
Could you please share your comments on the questions below.
1. Does this kind of setup cause any performance issues?
2. If the answer to question 1 is yes, what is the best approach?
3. Is there any temp space that needs to be increased in SQL Server during this process?
4. Any other input based on your experience, in terms of configuration to check or best practices?
5. How do we do an incremental concert in BCP? Is there any parameter we can pass to the command? For example: if the record already exists in the target then update it, else insert it.
Thanks,
RR
October 2, 2018 at 7:07 am
Just a word of caution... I've found that parallel loads are frequently not worth it. Your target doesn't have more read/write heads than it does with a serial load. Because of the extra jumping around it would do, your parallel loads could (as they frequently do) actually be slower than a serial load.
Before you go nuts with the parallel stuff, you need to do the tests with serial loads so that, at the very least, you have a benchmark to know if your efforts on the parallel loads actually have merit.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2018 at 7:22 am
Hi Jeff,
Thanks for your response.
Can you please share your comments on the points below?
3. Is there any temp space that needs to be increased in SQL Server during this process?
4. Any other input based on your experience, in terms of configuration to check or best practices?
5. How do we do an incremental concert in BCP? Is there any parameter we can pass to the command? For example: if the record already exists in the target then update it, else insert it.
October 2, 2018 at 8:25 am
SATYA R - Tuesday, October 2, 2018 7:22 AM
Hi Jeff,
Thanks for your response.
Can you please share your comments on the points below?
3. Is there any temp space that needs to be increased in SQL Server during this process?
4. Any other input based on your experience, in terms of configuration to check or best practices?
5. How do we do an incremental concert in BCP? Is there any parameter we can pass to the command? For example: if the record already exists in the target then update it, else insert it.
I'm pretty sure you meant incremental insert, not concert. However, if you need to compare some 6 to 10 GB of data for any given file, I start asking how large is the table you are inserting that much data into? What kind of row sizes are you dealing with? And then doing a number of these in parallel? You might well overload your system rather quickly unless it's truly massive. And exactly how big is your server? Your RAM requirements might be pretty darn high, and you might need a fairly massive set of disk drives to handle that kind of volume in parallel any faster than doing so serially. Many average SQL Servers could easily take so long with that much data that doing it in one day would be a challenge for just one file, never mind several... and thus I have to ask, just how many such files will you have on a daily basis? Why are they so large? And what kind of network bandwidth allowed that many large files to arrive in less than 24 hours to begin with? This kind of volume could easily overload most servers....
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 2, 2018 at 8:48 am
Thanks for your response.
I start asking how large is the table you are inserting that much data into?
We have 50 files on a daily basis; each file is a minimum of 1 GB and a maximum of 8 to 10 GB. The target system's tables are fairly big and will be taken care of by the customer team. If a table crosses its size limit they have a backup plan, but that is not in our control.
What kind of row sizes are you dealing with?
We are planning to use the maximum row size of 8,060 bytes. Please let me know your input on this.
And then doing a number of these in parallel?
Yes, we are planning to go with parallel mode because we have a very limited window for this load process. We don't have much time.
I have to ask, just how many such files will you have on a daily basis?
A total of 50 files, with a minimum size of 1 GB and a maximum of 8 to 10 GB. That is the reason we are planning to insert data on an incremental basis, to avoid loading huge volumes every day. Is there any command that can perform an incremental load in BCP?
Why are they so large?
Because this is transactional data on a daily basis. One day it might be high volume, other days not so big.
And what kind of network bandwidth allowed that many large files to arrive in less than 24 hours to begin with? This kind of volume could easily overload most servers....
Can you please advise: the first load would bring over the full data set at one time, and from the next day onwards we would go incremental. Do we have any method for that in BCP?
October 2, 2018 at 9:06 am
SATYA R - Tuesday, October 2, 2018 8:48 AM
Thanks for your response.
I start asking how large is the table you are inserting that much data into?
We have 50 files on a daily basis; each file is a minimum of 1 GB and a maximum of 8 to 10 GB. The target system's tables are fairly big and will be taken care of by the customer team. If a table crosses its size limit they have a backup plan, but that is not in our control.
What kind of row sizes are you dealing with?
We are planning to use the maximum row size of 8,060 bytes. Please let me know your input on this.
And then doing a number of these in parallel?
Yes, we are planning to go with parallel mode because we have a very limited window for this load process. We don't have much time.
I have to ask, just how many such files will you have on a daily basis?
A total of 50 files, with a minimum size of 1 GB and a maximum of 8 to 10 GB. That is the reason we are planning to insert data on an incremental basis, to avoid loading huge volumes every day. Is there any command that can perform an incremental load in BCP?
Why are they so large?
Because this is transactional data on a daily basis. One day it might be high volume, other days not so big.
And what kind of network bandwidth allowed that many large files to arrive in less than 24 hours to begin with? This kind of volume could easily overload most servers....
Can you please advise: the first load would bring over the full data set at one time, and from the next day onwards we would go incremental. Do we have any method for that in BCP?
In regard to incremental loads, that would need to be controlled by the source system. BCP only does imports (inserts) or extracts, and an extract is whatever data you tell it to select. So when you say incremental, do you mean that the source system is planning to send you a full data set every day and you want to handle only importing the deltas on your end? If so, BCP will not do that.
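Just to illustrate the extract side: the "whatever you tell it to select" part is literally a query, so if the source is also SQL Server, the delta file itself could be produced with something along these lines (the server name, column names, and date filter here are only placeholders, not anything from your system):

bcp "SELECT AccountID, Amount, LoadDate FROM SourceDB.dbo.ACCOUNT WHERE LoadDate >= '20181001'" queryout ACCOUNT_delta.txt -S <source_server> -T -c -t,

On the import side, though, bcp ... in will only ever append what is in the file; the update-or-insert decision has to happen somewhere else.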
October 2, 2018 at 9:25 am
OK, got it. What is the maximum file size normally processed through BCP?
How many parallel BCP processes can the server accept by default?
October 2, 2018 at 1:33 pm
SATYA R - Tuesday, October 2, 2018 7:22 AM
Hi Jeff,
Thanks for your response.
Can you please share your comments on the points below?
3. Is there any temp space that needs to be increased in SQL Server during this process?
4. Any other input based on your experience, in terms of configuration to check or best practices?
5. How do we do an incremental concert in BCP? Is there any parameter we can pass to the command? For example: if the record already exists in the target then update it, else insert it.
3. Yes. It's usually NOT a good idea to import directly to the final table. A "staging table" should be used and that, of course, will require extra disk space.
4. Yes. Investigate "Minimal Logging" and Trace Flag 610. Also, do a test because the normal "Best Practice" (and I hate those words because they frequently are not) is to import the entire file at once using the largest batch size possible. Of course, "It Depends" on what your system can handle and how large the actual files are.
5. BCP is not capable of doing incremental inserts/updates (also known by the pet name of "Upserts"). That's one of the many reasons why you should always use a staging table for imports.
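For example, once the file has landed in a staging table, the "if it exists then update, else insert" step can be done in a single statement against the real table. This is only a rough sketch with made-up table and column names (Account, Account_Staging, AccountID, and so on), not anything specific to your schema:

-- Upsert from staging into the final table; all names here are placeholders.
MERGE dbo.Account AS tgt
USING dbo.Account_Staging AS src
    ON tgt.AccountID = src.AccountID              -- business key
WHEN MATCHED THEN
    UPDATE SET tgt.Amount   = src.Amount,
               tgt.LoadDate = src.LoadDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AccountID, Amount, LoadDate)
    VALUES (src.AccountID, src.Amount, src.LoadDate);

A separate UPDATE followed by an INSERT of the non-matching rows works just as well if you'd rather not use MERGE.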
To bring up another question, have you considered what you want done with rows that fail to import? Roll back the whole shebang, or sequester the bad rows for additional processing/repair?
To be honest and from the questions that you're asking, you've not studied either BCP or BULK INSERT to the extent that you need to for this project. I strongly suggest you do a deep dive on those, "Minimal Logging", and Trace Flag 610 (note that Trace Flag 610 is NOT necessary in SQL Server 2016 but the explanations of what it does will prove invaluable to your efforts).
Even though it's nearly a decade old, the following guide is still totally appropriate except that TF 610 is no longer required for 2016.
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd425070(v=sql.100)
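To put point 4 in concrete terms, a bare-bones load into a staging heap might look like the sketch below. The path and table name are placeholders, and whether the load is actually minimally logged still depends on the conditions in that guide (recovery model, an empty heap or TF 610/2016 behaviour, and the TABLOCK hint):

-- Load the whole file in one batch (no BATCHSIZE specified) into a staging heap.
BULK INSERT dbo.Account_Staging
FROM 'D:\Loads\ACCOUNT_20181002.txt'               -- placeholder path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK,                                       -- needed for minimal logging
    ERRORFILE       = 'D:\Loads\ACCOUNT_20181002_errors.txt',  -- sequester bad rows
    MAXERRORS       = 1000                         -- keep going past the odd bad row
);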
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2018 at 6:47 am
SATYA R - Tuesday, October 2, 2018 8:48 AM
Thanks for your response.
I start asking how large is the table you are inserting that much data into?
We have 50 files on a daily basis; each file is a minimum of 1 GB and a maximum of 8 to 10 GB. The target system's tables are fairly big and will be taken care of by the customer team. If a table crosses its size limit they have a backup plan, but that is not in our control.
What kind of row sizes are you dealing with?
We are planning to use the maximum row size of 8,060 bytes. Please let me know your input on this.
And then doing a number of these in parallel?
Yes, we are planning to go with parallel mode because we have a very limited window for this load process. We don't have much time.
I have to ask, just how many such files will you have on a daily basis?
A total of 50 files, with a minimum size of 1 GB and a maximum of 8 to 10 GB. That is the reason we are planning to insert data on an incremental basis, to avoid loading huge volumes every day. Is there any command that can perform an incremental load in BCP?
Why are they so large?
Because this is transactional data on a daily basis. One day it might be high volume, other days not so big.
And what kind of network bandwidth allowed that many large files to arrive in less than 24 hours to begin with? This kind of volume could easily overload most servers....
Can you please advise: the first load would bring over the full data set at one time, and from the next day onwards we would go incremental. Do we have any method for that in BCP?
As Jeff has suggested, it's clear you don't know enough about what BCP does to justify designing it into your solution. If you have a file that's as much as 8 to 10 GB in size, and then up to 50 of those per day, I also start wondering if those are complete sets of data. If they are, then maybe you just need the staging table to import them and an error row table for any rows with bad data, then you truncate the original table and just load it from staging every night. That kind of work is probably best done with SSIS. You may need to bring someone in to help you with that, as the volume of data will require a fair amount of testing and tweaking the package for performance. Perhaps the biggest thing that might stand in your way is the disk subsystem. Without a rather large number of drives to spread out the I/O, you might have great difficulty doing this at all. The overall quantity of I/O here might well surpass what one server can handle if it did nothing else. I'd be very surprised if that much data could arrive at the server in less than some number of hours each day. If it takes that long just to get the data in, imagine what having to read ALL that data in SQL Server, plus that much again for the existing data, is going to take....
I'd be very highly motivated to only want to deal with this much data once into the staging tables, and then once into the production tables. Any more than that and you might easily run out of hours in the day. I'd not want to have to make a data comparison, if for no other reason than the extra time needed to add yet another full set of data to have to read. You might well be asking a server to do more I/O in 24 hours than it's capable of. But until you start testing a method of loading the staging tables with this data, you'll have no idea where you are with performance. And if your server isn't pretty much one of the beefiest it's possible to get your hands on, any limitations there will show themselves in a heck of a hurry...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 3, 2018 at 8:48 am
To be honest, I wouldn't use BCP for imports. Bulk Insert is a little faster.
To be equally honest, I'm also not a big fan of SSIS for this type of thing. Most people end up writing a wad of T-SQL in their packages anyway. Why not take out the middle man entirely?
Also, could you post the CREATE TABLE statement for the target table? It would also be handy if you had the record layout for the files available.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2018 at 8:56 am
Jeff Moden - Wednesday, October 3, 2018 8:48 AM
To be honest, I wouldn't use BCP for imports. Bulk Insert is a little faster.
To be equally honest, I'm also not a big fan of SSIS for this type of thing. Most people end up writing a wad of T-SQL in their packages anyway. Why not take out the middle man entirely?
Also, could you post the CREATE TABLE statement for the target table? It would also be handy if you had the record layout for the files available.
Really, if this is transactional data, he needs to get this resolved on the source system side and only receive the deltas; otherwise his estimates of file size are simply going to keep increasing, and anything manageable now likely won't be in the future.
October 3, 2018 at 9:24 am
@Jeff: BULK INSERT is much faster if the file exists on the SQL Server, but in our case the file is in a remote location. After reading a lot of documents, I came to the conclusion that we will do the following setup.
1. I will split the data into files based on size, then loop over each file until they are all finished.
2. The first time we will load all the data; going forward we will take only incremental data from the source and store it into files, so the file sizes should be smaller than the full extracts. I know the incremental files may still get somewhat big, but let's see how it works from a performance standpoint.
@sgmunson: I know BCP will not handle incremental logic, but out of caution I am checking whether there is any other way to handle it in BCP.
The staging table concept will not work for us because the files are in a remote location; creating staging tables in the target would mean the loading process has to run multiple times, and due to the limited window we will not go with the staging concept.
Is SSIS a good fit for this kind of setup or not? Is there any special install/setup required for it?
October 3, 2018 at 9:29 am
SATYA R - Wednesday, October 3, 2018 9:24 AM
@Jeff: BULK INSERT is much faster if the file exists on the SQL Server, but in our case the file is in a remote location. After reading a lot of documents, I came to the conclusion that we will do the following setup.
1. I will split the data into files based on size, then loop over each file until they are all finished.
2. The first time we will load all the data; going forward we will take only incremental data from the source and store it into files, so the file sizes should be smaller than the full extracts.
@sgmunson: I know BCP will not handle incremental logic, but out of caution I am checking whether there is any other way to handle it in BCP.
The staging table concept will not work for us because the files are in a remote location; creating staging tables in the target would mean the loading process has to run multiple times, and due to the limited window we will not go with the staging concept.
Is SSIS a good fit for this kind of setup or not? Is there any special install/setup required for it?
Ok... lemme guess... you haven't yet done a test run of a single unsplit file yet, have you? 😉
As for the remote stuff, I don't know why people think that BULK INSERT will be slower than BCP. It WILL require additional privs at the source to execute but it won't be slower.
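For what it's worth, "remote" only changes the path and which account does the reading, not the statement itself; something along these lines (the share name is a placeholder). Under SQL authentication it's the SQL Server service account that needs read access to the share, and under Windows authentication it's the caller's account, which may need delegation set up, which is where the extra privs come in:

BULK INSERT dbo.Account_Staging
FROM '\\fileserver\daily_loads\ACCOUNT.txt'        -- placeholder UNC path
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);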
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2018 at 2:17 pm
Jeff Moden - Wednesday, October 3, 2018 9:29 AM
SATYA R - Wednesday, October 3, 2018 9:24 AM
@Jeff: BULK INSERT is much faster if the file exists on the SQL Server, but in our case the file is in a remote location. After reading a lot of documents, I came to the conclusion that we will do the following setup.
1. I will split the data into files based on size, then loop over each file until they are all finished.
2. The first time we will load all the data; going forward we will take only incremental data from the source and store it into files, so the file sizes should be smaller than the full extracts.
@sgmunson: I know BCP will not handle incremental logic, but out of caution I am checking whether there is any other way to handle it in BCP.
The staging table concept will not work for us because the files are in a remote location; creating staging tables in the target would mean the loading process has to run multiple times, and due to the limited window we will not go with the staging concept.
Is SSIS a good fit for this kind of setup or not? Is there any special install/setup required for it?
Ok... lemme guess... you haven't yet done a test run of a single unsplit file yet, have you? 😉
As for the remote stuff, I don't know why people think that BULK INSERT will be slower than BCP. It WILL require additional privs at the source to execute but it won't be slower.
And it appears they are going to try to use incremental logic. It ought to be interesting to see what exactly they plan to do with that at the source. Any bets they try to match up against the SQL Server to determine what the increments are?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 4, 2018 at 6:16 am
Hi Experts,
I have installed the BCP utility on a Linux server.
When I execute the command below, it throws an error. Any input on resolving this issue?
/opt/mssql-tools/bin]$ ./bcp test.dbo.ACCOUNT in "u01/bcp/file/ACCOUNT.txt" -T -t, -c -U admin\\admin1 -S 10.187.111.99,36686\\admin
SQLState = S1000, NativeError = 851968
Error = [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]SSPI Provider: No Kerberos credentials available (default cache: KEYRING:persistent:23179318)
SQLState = S1000, NativeError = 851968
Error = [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]Cannot generate SSPI context
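Would dropping the -T switch help here? From what I have read, -T requests a trusted (Kerberos) connection, which is what triggers the SSPI error when no ticket is available, and it conflicts with passing -U. If SQL authentication is enabled on that instance, something like the following (login and password are placeholders) should avoid Kerberos entirely:

/opt/mssql-tools/bin/bcp test.dbo.ACCOUNT in "u01/bcp/file/ACCOUNT.txt" -S 10.187.111.99,36686 -U <sql_login> -P '<password>' -c -t,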
Thanks,
RR