March 20, 2012 at 1:29 am
Dear colleagues.
I came across a very strange phenomenon in SQL Server 2008 R2, while performing BCP out.
In a few words: SQL Server 2008 R2, a clustered installation on Windows Server 2008, 64 GB of RAM per node, and a fast shared SAN disk array.
In one database there is a table with 5.5 million records containing a "text" field. I have to BCP this table "out" and then "in" to another table (which is normally faster than INSERT..SELECT..). The database is in the SIMPLE recovery model, there is no data compression, row data, indexes, and TEXTIMAGE are all on separate drives, and there is no full-text indexing. The source tables all have proper clustered indexes, which were rebuilt prior to the BCP "out".
I create a format file using "bcp dbSource.dbo.TabSource format nul -x -f W:\BCP_Files\TabSource_Format.xml -n -S ServerName -U userlogin -P userpassword" (yes, I am not using a trusted connection here; I tried one and it's irrelevant).
"W" is a disk local to SQL Server, so there is no "over the network" transfer.
Format file is created properly. Then I execute the following:
bcp dbSource.dbo.TabSource out W:\BCP_Files\TabSource.bcp -n -S ServerName -U userlogin -P userpassword -h "TABLOCK" -C RAW -E
Once again, the TABLOCK hint, the -C RAW code page option, and the -E keep-identity option are absolutely irrelevant here. I can omit them and the result will be the same.
Now, the problem in question.
When I run BCP "out" for the first time after the database was restored, it takes an incredibly long time - more than 90 minutes! It does not matter how I run it: via a job (using xp_cmdshell or a CmdExec step), from Management Studio on the remote machine (the SQL Server itself), or from Management Studio, PowerShell, or a command prompt on my own machine. The result is always the same.
However, if I run BCP "out" a second time on the same table, the result is ~80,000 rows per second to the local disk. That is a very good result; that's how BCP should behave.
The same is true for every big table in every database: the first BCP "out" is extremely slow, the second is very fast (normal, in fact).
Could the community please help?
March 20, 2012 at 1:53 am
I would guess that caching is the answer.
The first time you run BCP, all the rows have to be read into memory from disk.
The second time you run BCP, all the rows are already in memory.
You could try verifying this by clearing the cache (DBCC DROPCLEANBUFFERS) between the first and second run. You should then see that the second run is just as slow.
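Something like this between the two runs, as a rough sketch (dbSource is the database from your commands; only do it somewhere you can afford to empty the cache):

-- Flush dirty pages first so DROPCLEANBUFFERS can remove everything from the buffer pool
USE dbSource;
CHECKPOINT;
-- Remove all clean pages from the buffer pool for the whole instance
DBCC DROPCLEANBUFFERS;
-- Now re-run the bcp "out" and compare its duration with the warm-cache run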
March 20, 2012 at 2:47 am
Seems like you are right, Ian. After clearing the cache, the speed is low again.
However, this is weird. By default, BCP is supposed to be fast even if the source table is not loaded into cache, isn't it? It is positioned as "the fastest" way to transfer data...
My problem is that I have a very narrow timeframe between the moment the database is restored on the new server and the moment its data resides in the new tables...
What can be done here? How can I speed BCP "out" up? Any ideas, suggestions, theories?
Added later:
Not exactly. This time BCP ran for only 11 minutes instead of 90, at ~6,500 rows per second. So it seems that Ian's theory is only partially correct...
March 20, 2012 at 3:03 am
It doesn't matter what method you use to get the data out (and BCP is about as fast as it gets), they will all have to read the rows into memory at some point.
Are you BCPing out to the same physical disk that the database is on? If so, putting the BCP file on a different disk should give you some improvement.
Is there anything else that's hitting the database disk at the same time that could be rescheduled?
If there is a delay between the restore of the database and running BCP, you could try getting the data into cache before BCP runs (by doing a simple SELECT cols FROM table).
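For example, something along these lines should do it (TextColumn is just a placeholder for your text field, and whether DATALENGTH pulls every LOB page into cache is an assumption worth testing); aggregating keeps the 5.5 million rows from being streamed back to the client:

-- Hypothetical warm-up scan: read the table so its pages land in the buffer pool before bcp starts
SELECT COUNT_BIG(*) AS RowsTouched,
       SUM(CAST(DATALENGTH(TextColumn) AS bigint)) AS TotalTextBytes
FROM dbSource.dbo.TabSource;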
March 20, 2012 at 7:00 am
Eugene Bereznuk (3/20/2012)
I have to BCP this table "out" and then "in" to another table (which, normally, is faster than INSERT..SELECT..).
If the target table can be dropped before being populated, and depending on the indexes required, SELECT/INTO will work at least twice as fast as exporting and then importing the data.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2012 at 7:03 am
The target table is created before populating. Of course, it's a heap without any indexes at all; they are created only after the load is finished.
March 20, 2012 at 7:12 am
As Jeff suggested, SELECT ... INTO will outperform BCP in this case, and it looks like nothing should stop you from using it. You will not need to create your target table before running SELECT ... INTO; on the contrary, you will need to drop it if it already exists.
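Roughly like this, as a sketch (dbTarget.dbo.TabTarget is a made-up name for your destination; the source name is from your post):

-- Drop the destination first, since SELECT...INTO creates it
IF OBJECT_ID('dbTarget.dbo.TabTarget', 'U') IS NOT NULL
    DROP TABLE dbTarget.dbo.TabTarget;
-- Minimally logged under SIMPLE recovery; note the new table is a heap on the default filegroup
SELECT *
INTO dbTarget.dbo.TabTarget
FROM dbSource.dbo.TabSource;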
March 20, 2012 at 7:15 am
I cannot use SELECT..INTO, because the target table has to be created on specific filegroups, including one for TEXTIMAGE. I create the table beforehand.
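Roughly like this (filegroup and column names changed for illustration):

-- Pre-created target: ON places the row data on one filegroup, TEXTIMAGE_ON places the text/LOB data on another
CREATE TABLE dbo.TabTarget
(
    Id       int           NOT NULL,
    SomeData nvarchar(200) NULL,
    TextData text          NULL   -- the "text" column from the source table
)
ON [FG_Data]
TEXTIMAGE_ON [FG_Lob];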
March 20, 2012 at 7:42 am
Ah, that's a nightmare then; SQL Server still does not allow specifying a filegroup in SELECT..INTO. Maybe it gets fixed in SQL 2012?
Have you tried SSIS for comparison? It may be faster than BCP in your case, as it uses a better driver.
Also, try to execute CHECKPOINT before your first run.
March 20, 2012 at 9:06 am
Eugene (B),
Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Check the log space used during the INSERT to see whether you are minimally logged or not.
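A rough sketch of both steps (column and table names here are hypothetical; DBCC SQLPERF(LOGSPACE) is just one way to watch the log):

DBCC SQLPERF(LOGSPACE);   -- log size and percent used before the load

INSERT INTO dbTarget.dbo.TabTarget WITH (TABLOCK) (Id, SomeData, TextData)
SELECT Id, SomeData, TextData
FROM dbSource.dbo.TabSource;

DBCC SQLPERF(LOGSPACE);   -- should have grown only a little if the insert was minimally logged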
If you want to speed things up even more, you should just get rid of the TEXT field; it's horrible performance-wise and it has a few bugs (when you delete rows). You should be able to replace it with varchar(max). Just pay attention to the size of the table, since varchar(max) will keep its data in row if it's less than 8,000 bytes by default (this can be changed with a SQL command, though).
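If you go that route, the conversion and the in-row setting look roughly like this (table and column names are placeholders; sp_tableoption is the "SQL command" I mentioned):

-- Replace the deprecated text column with varchar(max)
ALTER TABLE dbo.TabTarget ALTER COLUMN TextData varchar(max) NULL;
-- Optionally push MAX values off the data pages, similar to how text stored them
EXEC sp_tableoption 'dbo.TabTarget', 'large value types out of row', 1;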
March 20, 2012 at 7:29 pm
Oliiii (3/20/2012)
Eugene (B), Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Have you a link for that information?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2012 at 11:48 pm
Thanks everyone for your help.
Of course I have been using WITH (TABLOCK) on both target and source tables.
As for converting text to something else - I need nvarchar(MAX) to allow special characters. I have already tried it, naturally. The conversion ALTER TABLE...ALTER COLUMN took approximately as long as INSERT..SELECT, so there is no gain.
Anyway, thanks again.
March 21, 2012 at 12:58 am
Jeff Moden (3/20/2012)
Oliiii (3/20/2012)
Eugene (B), Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Have you a link for that information?
Yep, everything is explained here: http://msdn.microsoft.com/en-us/library/dd425070%28v=sql.100%29.aspx (including when to use TF610).
There is a pretty neat table in the middle "Summarizing Minimal Logging Conditions". Too bad this kind of info is buried in a document and not widely available.
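For what it's worth, TF610 itself is just a trace flag; whether it helps depends on meeting the other minimal-logging conditions in that table. Turned on instance-wide it would be:

-- Enable trace flag 610 (minimal logging for inserts into indexed tables); test carefully per the whitepaper
DBCC TRACEON (610, -1);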