November 14, 2005 at 5:05 am
I've read some articles stating that bulk insert is quicker than using transforms. Has anyone got some benchmark figures or seen any claims on how much faster it's supposed to be?
November 15, 2005 at 4:58 am
SQL Server Magazine did a comparison showdown a few years ago... the article can be found here:
http://www.windowsitpro.com/SQLServer/Articles/ArticleID/19760/pg/2/2.html
-Mike Gercevich
November 15, 2005 at 11:54 am
The bulk insert will undoubtedly be much faster than using transforms, but an important question is what kind of transforms? If you are modifying data in transforms, those operations will have to be done in a separate UPDATE after the bulk insert. If you need transforms to clean or correct data (illegal dates, formatting issues, etc.), a straight bulk copy of the raw data will probably fail. If the DTS package is running on a client PC, the transformation workload is on the client instead of the server, although you would need some heavy-duty transformation logic for this to outweigh the speed advantage of a minimally-logged bulk insert.
There are exceptions, but I would say use the bulk insert if it will get the job done. Even if an UPDATE or INSERT/SELECT is needed to modify the data, it will probably run faster in T-SQL than in ActiveX transforms. I can't say how much faster; there are too many variables.
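To illustrate what I mean (purely a sketch; the table, column, and file names are made up), the pattern is a bulk insert into a staging table followed by set-based T-SQL to do the transform work:

-- Load the raw CSV into a staging table first (names are hypothetical)
BULK INSERT dbo.StagingOrders
FROM 'C:\Import\orders.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK            -- helps qualify the load for minimal logging
)

-- Do the "transform" work in one set-based pass instead of row-by-row ActiveX
UPDATE dbo.StagingOrders
SET OrderDate = CONVERT(datetime, OrderDateText, 103)   -- e.g. fix dd/mm/yyyy text dates
WHERE ISDATE(OrderDateText) = 1

-- Then move the cleaned rows into the real table
INSERT INTO dbo.Orders (OrderID, OrderDate, Amount)
SELECT OrderID, OrderDate, Amount
FROM dbo.StagingOrders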
November 16, 2005 at 2:07 am
Thank you for the replies. The transform is a simple one; all work on the data is done using updates after the import. We are currently importing a table that takes 2h 40m and need evidence to support our theory that being provided with a CSV will speed this up. The CSV would be created by those responsible for the Oracle database that we are importing from, and they are disputing that an extract, move and load would be quicker than Oracle-to-SQL DTS. I was hoping that there were official figures to back up my theory, to add to the sample testing that I can do myself.
November 16, 2005 at 9:00 am
Bulk insert is the fastest way to load data into SQL Server, especially if it is minimally logged. The data source can only be a text file.
The fastest way to get the data out of Oracle is to dump it to a local disk. The fastest way to transfer the data is to copy the file (even faster if you use FTP) rather than using a direct connection, which has row-by-row overhead. And the fastest way to load the data into SQL Server is bulk insert. There are more steps involved, but anyone who claims this will not be faster than a 3-hour DTS transfer either doesn't know what they are talking about or is intentionally lying.
They may not have space on the Oracle server to dump the table to, or they may not want to create a share you can read from for security reasons. (An answer is to dump the file to a network share, or a share on the SQL Server). If they had a valid reason for not wanting to use a CSV file I could understand, but if they simply say they don't think it will be faster then they are wrong. It's easy enough to try it once to see what the difference is.
I'm not an Oracle expert, but I've heard from Oracle users that it can be difficult to export data. It sounds to me like they just want this to be your job (maintaining the DTS package), or to have it perceived as a SQL Server performance issue, rather than expending any effort to do the job correctly.
November 16, 2005 at 9:12 am
They already do an export OK for another table we use in another system. I think it's just that they only have Oracle experience, not MS SQL or DTS.
December 7, 2005 at 2:57 pm
I personally always use the transformation task since it's the only one that can go both ways. As long as you specify a decent batch size the speed difference between the transformation task and the bulk insert task is minimal from my experience.
December 7, 2005 at 5:06 pm
I don't want this to sound like a flame, but if you haven't seen a speed difference between a DTS data pump task and a bulk insert then either you haven't tried to copy a large enough table or you're not meeting the criteria for minimally-logged bulk copy. There are limitations on database recovery mode and destination table indexes and triggers that can prevent a bulk insert from running as fast as possible.
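As a rough illustration of those criteria (the database, table, and file names here are invented), a load set up for minimal logging looks something like this:

-- The database should be in SIMPLE or BULK_LOGGED recovery; FULL recovery forces full logging
ALTER DATABASE ImportDB SET RECOVERY BULK_LOGGED

-- TABLOCK is needed for minimal logging; the target should have no triggers firing,
-- and ideally be empty or have no indexes, or the load falls back to fully-logged inserts
BULK INSERT ImportDB.dbo.BigTable
FROM 'D:\Transfer\bigtable.csv'
WITH (
    TABLOCK,
    BATCHSIZE = 50000,          -- commit in chunks so a failure doesn't roll back everything
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)

-- Put the recovery model back and take a log backup afterwards if you changed it
ALTER DATABASE ImportDB SET RECOVERY FULL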
I have to transfer data back and forth to a Teradata system, and the DTS data pump task is so slow it is unusable for all but the smallest datasets. This is a combination of bad Teradata drivers (IMHO) and a slow VPN connection where the latency affects the row-by-row transfer handshaking. Teradata has their own data transfer utilities that collect rows into large blocks for network transfer, use multiple connections in parallel, and use bulk copy API commands to get the data into SQL Server. These utilities are a pain to set up and much more limited than DTS, but are many times faster. This may be a worst-case scenario (at least the worst one I've had to deal with), but the speed difference is huge. It's quite possible the Oracle-SQL connection works better than the Teradata-SQL connection, but I haven't used it myself.
I also have to work with remote SQL Servers, also over a VPN but one with less latency. I use DTS packages extensively to move data back and forth, but at some point in the 10MB-100MB range it becomes worthwhile to switch to a bcp out - zip - ftp - bcp in process. If the file is big enough to notice the difference between FTP and Windows Explorer file copy, you should notice a speed improvement with bulk insert.
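For what it's worth, the export and import ends of that process look roughly like this. The server, database, and path names are invented, and I'm wrapping the bcp calls in xp_cmdshell just to keep everything in T-SQL; the zip and ftp steps happen outside SQL Server:

-- On the source server: bcp the table out in native format (-n) over a trusted connection (-T)
EXEC master..xp_cmdshell 'bcp "SourceDB.dbo.BigTable" out "D:\Export\BigTable.dat" -n -T -S SOURCESERVER'

-- ...zip the file, ftp it to the destination, unzip it there...

-- On the destination server: bcp it back in with a batch size and a TABLOCK hint
EXEC master..xp_cmdshell 'bcp "DestDB.dbo.BigTable" in "D:\Import\BigTable.dat" -n -T -S DESTSERVER -b 50000 -h "TABLOCK"'

The bcp in step could just as easily be a BULK INSERT if the file ends up somewhere the destination server can see.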
There is also the issue of the performance impact on the Oracle server. The 3-hour data transfer is using resources on their system the whole time. A data export would finish very quickly, and the subsequent copy and import operations would be completely independent of the Oracle database.
December 8, 2005 at 2:35 am
Scott,
You may not want to sound like a flame, but you do sound a little miffed. One problem is that this issue is on a customer site: we have no way of creating an equivalent-size file on our development server and we do not have the same network, so I haven't proved the difference exactly in this case, although I have seen that it is much better. We can't set up a test until they agree what is to be tested, and currently they are reluctant to test this method. My main issue is that, between us and a third-party Oracle house, any official figures from recognised bodies would hold much more sway than anything we can say we've proved ourselves, and may help persuade them to do a test, hence proving to them that this is the way to go.
December 8, 2005 at 9:42 am
I didn't want my reply to Todd to sound testy. I believe he was expressing an honest opinion when he said he had only seen minimal speed differences. I have had other experiences and wanted to share them without sounding like I was trying to shout him down. It's amazing how much you can learn here by having open discussions with others and listening to why they disagree with you.
On the other hand, the people you're dealing with would get under my skin pretty quickly, at least from the way you've described the situation. They're a customer, so you have to be professional and keep it to yourself, but I think I'd be way past "miffed" by now. There are perfectly legitimate reasons not to use bulk insert, but those aren't the reasons they're giving you.
There could be other good reasons, but I don't know enough about Oracle to speculate. I find it hard to believe that dumping data to a csv file is some fiendishly difficult operation in Oracle that would bring the server to its knees. Since they don't seem to have legitimate excuses, let's speculate on some alternatives.
Their approach seems to be: "We don't believe you. We can't say why, but we can block your access to the resources to prove us wrong. So there, Nyaah!" This doesn't seem professional to me.
If someone disagrees with my technical opinion but can't or won't bother to explain why, I have to assume ignorance, stupidity, and/or malice on their part. Even if I am older, overweight, overpaid, and have a bad personality.
PS: Maybe I'm too old to feel the need to insert smileys to indicate humor or sarcasm. Feel free to mentally add them while reading.
May 21, 2013 at 11:11 am
Well, after reading all of this, I will try to follow the best techniques in the plan to replicate data from a DB2 database to a SQL Server 2012 database, and at the end of the project I will tell you which paths were chosen and why.
Best Regards, MauricioP.
May 21, 2013 at 11:45 am
Google Translate (5/21/2013)
Well, after reading it all, try to follow the best techniques on data replication scheme from a DB2 database to another SQL Server 2012, then I will tell you at the end of the project which were the paths chosen and why. Best Regards, MauricioP.
Lowell
July 15, 2015 at 10:32 am
mike gercevich (11/15/2005)
SQL Server Magazine did a comparison showdown a few years ago... the article can be found here:
http://www.windowsitpro.com/SQLServer/Articles/ArticleID/19760/pg/2/2.html
-Mike Gercevich
The article has moved here: http://sqlmag.com/t-sql/showdown-bcp-vs-dts