March 20, 2012 at 1:29 am
Dear colleagues.
I came across a very strange phenomenon in SQL Server 2008 R2, while performing BCP out.
In a few words: SQL Server 2008 R2, a clustered installation on Windows Server 2008, 64 GB of RAM per node, and a fast shared SAN disk array.
In one database there is a table with 5.5 million records containing a "text" field. I have to BCP this table "out" and then "in" to another table (which is normally faster than INSERT..SELECT..). The database is in the SIMPLE recovery model, there is no data compression, row data, indexes, and TEXTIMAGE are all on separate drives, and there is no full-text indexing. The source tables all have proper clustered indexes, which were rebuilt prior to the BCP "out".
I create a format file using "bcp dbSource.dbo.TabSource format nul -x -f W:\BCP_Files\TabSource_Format.xml -n -S ServerName -U userlogin -P userpassword" (yes, I am not using a trusted connection here; I tried one and it's irrelevant).
"W" is a disk local to SQL Server, so there is no "over the network" transfer.
Format file is created properly. Then I execute the following:
bcp dbSource.dbo.TabSource out W:\BCP_Files\TabSource.bcp -n -S ServerName -U userlogin -P userpassword -h "TABLOCK" -C RAW -E
Once again, the TABLOCK hint, the -C RAW code page option, and the -E keep-identity option are absolutely irrelevant here. I can omit them and the result will be the same.
Now, the problem in question.
When I run BCP "out" for the first time after the database was restored, it takes an incredibly long time - more than 90 minutes! It does not matter how I run it: via a job (using xp_cmdshell or a CmdExec step), from Management Studio on the remote machine (the SQL Server itself), or from Management Studio, PowerShell, or a command prompt on my own machine. The result is always the same.
However, if I run BCP "out" a second time on the same table, the result is ~80,000 rows per second to the local disk. That is a very good result; that's how BCP should behave.
The same is true for every big table in every database: the first BCP "out" is extremely slow, the second is very fast (normal, in fact).
Could the community please help?
March 20, 2012 at 1:53 am
I would guess that caching is the answer.
The first time you run BCP, all the rows have to be read into memory from disk.
The second time you run BCP, all the rows are already in memory.
You could try verifying this by clearing the cache (DBCC DROPCLEANBUFFERS) between the first and second run. You should then see that the second run is just as slow.
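Something like this between the two runs, as a rough sketch (dbSource is the database from your commands; only do it somewhere you can afford to empty the cache):

-- Flush dirty pages first so DROPCLEANBUFFERS can remove everything from the buffer pool
USE dbSource;
CHECKPOINT;
-- Remove all clean pages from the buffer pool for the whole instance
DBCC DROPCLEANBUFFERS;
-- Now re-run the bcp "out" and compare its duration with the warm-cache run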
March 20, 2012 at 2:47 am
Seems like you are right, Ian. After clearing the cache, the speed is low again.
However, this is weird. By default, BCP is supposed to be fast even if the source table is not loaded into cache, isn't it? It is positioned as "the fastest" way to transfer data...
My problem is that I have a very narrow timeframe between the moment the database is restored on the new server and the moment its data resides in the new tables...
What can be done here? How can I speed BCP "out" up? Any ideas, suggestions, theories?
Added later:
Not exactly. This time BCP ran for only 11 minutes instead of 90, at ~6,500 rows per second. So it seems that Ian's theory is only partially correct...
March 20, 2012 at 3:03 am
It doesn't matter what method you use to get the data out (and BCP is about as fast as it gets), they will all have to read the rows into memory at some point.
Are you BCPing out to the same physical disk that the database is on? If so, putting the BCP file on a different disk should give you some improvement.
Is there anything else that's hitting the database disk at the same time that could be rescheduled?
If there is a delay between the restore of the database and running BCP, you could try getting the data into cache before BCP runs (by doing a simple SELECT cols FROM table).
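For example, something along these lines should do it (TextColumn is just a placeholder for your text field, and whether DATALENGTH pulls every LOB page into cache is an assumption worth testing); aggregating keeps the 5.5 million rows from being streamed back to the client:

-- Hypothetical warm-up scan: read the table so its pages land in the buffer pool before bcp starts
SELECT COUNT_BIG(*) AS RowsTouched,
       SUM(CAST(DATALENGTH(TextColumn) AS bigint)) AS TotalTextBytes
FROM dbSource.dbo.TabSource;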
March 20, 2012 at 7:00 am
Eugene Bereznuk (3/20/2012)
I have to BCP this table "out" and then "in" to another table (which, normally, is faster than INSERT..SELECT..).
If the target table can be dropped before being populated, and depending on the indexes required, SELECT/INTO will work at least twice as fast as exporting and then importing the data.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2012 at 7:03 am
The target table is created before populating. Of course, it's a heap without any indexes at all; they are created only after the load is finished.
March 20, 2012 at 7:12 am
As Jeff suggested, SELECT ... INTO will outperform BCP in this case, and it looks like nothing should stop you from using it. You will not need to create your target table before running SELECT ... INTO; on the contrary, you will need to drop it if it already exists.
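Roughly like this, as a sketch (dbTarget.dbo.TabTarget is a made-up name for your destination; the source name is from your post):

-- Drop the destination first, since SELECT...INTO creates it
IF OBJECT_ID('dbTarget.dbo.TabTarget', 'U') IS NOT NULL
    DROP TABLE dbTarget.dbo.TabTarget;
-- Minimally logged under SIMPLE recovery; note the new table is a heap on the default filegroup
SELECT *
INTO dbTarget.dbo.TabTarget
FROM dbSource.dbo.TabSource;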
March 20, 2012 at 7:15 am
I cannot use SELECT..INTO, because the target table has to be created on specific filegroups, including one for TEXTIMAGE. I create the table beforehand.
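Roughly like this (filegroup and column names changed for illustration):

-- Pre-created target: ON places the row data on one filegroup, TEXTIMAGE_ON places the text/LOB data on another
CREATE TABLE dbo.TabTarget
(
    Id       int           NOT NULL,
    SomeData nvarchar(200) NULL,
    TextData text          NULL   -- the "text" column from the source table
)
ON [FG_Data]
TEXTIMAGE_ON [FG_Lob];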
March 20, 2012 at 7:42 am
Ah, that's a nightmare then; SQL Server still does not allow specifying a filegroup in SELECT..INTO. Maybe it gets fixed in SQL 2012?
Have you tried SSIS for comparison? It may be faster than BCP in your case, as it uses a better driver.
Also, try to execute CHECKPOINT before your first run.
March 20, 2012 at 9:06 am
Eugene (B),
Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Check the log space used during the INSERT to see whether you are minimally logged or not.
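A rough sketch of both steps (column and table names here are hypothetical; DBCC SQLPERF(LOGSPACE) is just one way to watch the log):

DBCC SQLPERF(LOGSPACE);   -- log size and percent used before the load

INSERT INTO dbTarget.dbo.TabTarget WITH (TABLOCK) (Id, SomeData, TextData)
SELECT Id, SomeData, TextData
FROM dbSource.dbo.TabSource;

DBCC SQLPERF(LOGSPACE);   -- should have grown only a little if the insert was minimally logged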
If you want to speed things up even more, you should just get rid of the TEXT field; it's horrible performance-wise and it has a few bugs (when you delete rows). You should be able to replace it with varchar(max). Just pay attention to the size of the table, since varchar(max) will keep its data in row if it's less than 8,000 bytes by default (this can be changed with a SQL command, though).
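If you go that route, the conversion and the in-row setting look roughly like this (table and column names are placeholders; sp_tableoption is the "SQL command" I mentioned):

-- Replace the deprecated text column with varchar(max)
ALTER TABLE dbo.TabTarget ALTER COLUMN TextData varchar(max) NULL;
-- Optionally push MAX values off the data pages, similar to how text stored them
EXEC sp_tableoption 'dbo.TabTarget', 'large value types out of row', 1;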
March 20, 2012 at 7:29 pm
Oliiii (3/20/2012)
Eugene (B), Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Have you a link for that information?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2012 at 11:48 pm
Thanks everyone for your help.
Of course I have been using WITH (TABLOCK) on both target and source tables.
As for converting text to something else - I need nvarchar(MAX) to allow special characters. I have already tried it, naturally. The conversion ALTER TABLE...ALTER COLUMN took approximately as long as INSERT..SELECT, so there is no gain.
Anyway, thanks again.
March 21, 2012 at 12:58 am
Jeff Moden (3/20/2012)
Oliiii (3/20/2012)
Eugene (B), Try with INSERT xxx WITH (TABLOCK) ... SELECT.
If your destination table is a heap then it should always be minimally logged (on SQL 2008 and later), and you should see the same performance as SELECT INTO, but this time the table is already created the way you want.
Have you a link for that information?
Yep, everything is explained here: http://msdn.microsoft.com/en-us/library/dd425070%28v=sql.100%29.aspx (including when to use TF610).
There is a pretty neat table in the middle "Summarizing Minimal Logging Conditions". Too bad this kind of info is buried in a document and not widely available.
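For what it's worth, TF610 itself is just a trace flag; whether it helps depends on meeting the other minimal-logging conditions in that table. Turned on instance-wide it would be:

-- Enable trace flag 610 (minimal logging for inserts into indexed tables); test carefully per the whitepaper
DBCC TRACEON (610, -1);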