June 12, 2018 at 9:00 pm
Hi
I have a table in SQL server which has over 5 million records and around 12 columns and wish to load it into a table in another SQL server.
There are no transformations of data and the structure of the table in source and destination is identical.
There are no indexes in the source table or have any column whereby i can split up the 5 million records into "partitions" in which case i could have parallelized the data load.
I tried using SSIS project with source as SQLServer and destination as SQLServer, however it is taking more than 3-4 hours and was erroring out after 1 million records.
After that i tried source SQLServer and destination as flatfile. It didnt error out however it was taking a lot of time(>5 hours) and i dont know if this is a feasible option in the actual production environment.
The idea is once the flat file is loaded i would import it using SSIS into the destination SQLServer table.
Any other options which are available that can speed up the process of data loading?
June 12, 2018 at 10:21 pm
george_at_sql - Tuesday, June 12, 2018 9:00 PMHi
I have a table in SQL server which has over 5 million records and around 12 columns and wish to load it into a table in another SQL server.There are no transformations of data and the structure of the table in source and destination is identical.
There are no indexes in the source table or have any column whereby i can split up the 5 million records into "partitions" in which case i could have parallelized the data load.
I tried using SSIS project with source as SQLServer and destination as SQLServer, however it is taking more than 3-4 hours and was erroring out after 1 million records.
After that i tried source SQLServer and destination as flatfile. It didnt error out however it was taking a lot of time(>5 hours) and i dont know if this is a feasible option in the actual production environment.
The idea is once the flat file is loaded i would import it using SSIS into the destination SQLServer table.Any other options which are available that can speed up the process of data loading?
How about using backup and restore?
😎
June 12, 2018 at 10:28 pm
Eirikur Eiriksson - Tuesday, June 12, 2018 10:21 PMgeorge_at_sql - Tuesday, June 12, 2018 9:00 PMHi
I have a table in SQL server which has over 5 million records and around 12 columns and wish to load it into a table in another SQL server.There are no transformations of data and the structure of the table in source and destination is identical.
There are no indexes in the source table or have any column whereby i can split up the 5 million records into "partitions" in which case i could have parallelized the data load.
I tried using SSIS project with source as SQLServer and destination as SQLServer, however it is taking more than 3-4 hours and was erroring out after 1 million records.
After that i tried source SQLServer and destination as flatfile. It didnt error out however it was taking a lot of time(>5 hours) and i dont know if this is a feasible option in the actual production environment.
The idea is once the flat file is loaded i would import it using SSIS into the destination SQLServer table.Any other options which are available that can speed up the process of data loading?
How about using backup and restore?
😎
I will look into that option after asking the DBA.
Do you have any positive feedback on using linked server for large data transfer?
June 13, 2018 at 6:39 am
The first question I'll ask on this is how much network bandwidth exists between the two servers? If for whatever reason, these servers are on 10/100 Mb network cards, that alone might be enough to cause you a ton of grief. You need to have gigabit network cards or fiber network cards to reduce the network latency. If you have any 10/100 switches or routers in between, then that hardware would become a bottleneck, and it would be far better to have gigabit network switches or routers. Just 5 million rows seems like a rather small number of rows to have such a long transfer time, so unless you have a bunch of varchar(max) or nvarchar(max) columns, there's something else going on. So question number two is: how much data does the table contain? You can right-click on the table from SSMS and choose Properties, then click on Storage. You'll see the number of rows and the amount of data space. Question three: What are the server's specs in terms of RAM and CPU ? Post back and chances are the true problem gets revealed.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 13, 2018 at 6:46 am
Additional to what Steve just mentioned. You should also take into account that there's additional time when you run the SSIS from your machine as the data needs to travel to it before getting to the destination.
If network is slow, use bcp to load the data into a flat file in the source (delimited not fixed length please), copy the file to the destination and load the file from there. It might still be slow, but you won't be locking your tables for hours.
June 13, 2018 at 10:05 am
A single 5-million row INSERT can be a huge problem, it may fill up the transaction log even if the database is in simple recovery. A minimally-logged bcp insert is probably the best way to handle it as one file while minimizing this impact.
The process would run fastest if the flat file path is on a separate drive than the SQL data files on either server.
If the network is a serious bottleneck, you could try exporting to a file on the source server, copy the file to the destination, and import it from there. It is the same number of bytes transferred, but I have seen cases where Windows file transfer between servers uses the network more efficiently than SQL Server I/O.
You could split the source file into pieces in SSIS in spite of not having an index or key column. Create a set of flat file destinations, and use a source query of "SELECT *, FileNumber = (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % n) FROM table" (where n = number of files). Then use a Conditional Split component to distribute the rows by the FileNumber column. This requires one table scan at the source so it shouldn't make the export take longer, but you may have more luck loading multiple files rather than one big one. (You could also use a Conditional Split Script Transform, where you increment a counter and redirect the rows with relatively simple C# code. The number of files wouldn't have to be hardcoded in the script, it could simply use the number of outputs.)
June 13, 2018 at 10:29 am
Scott Coleman - Wednesday, June 13, 2018 10:05 AMA single 5-million row INSERT can be a huge problem, it may fill up the transaction log even if the database is in simple recovery. A minimally-logged bcp insert is probably the best way to handle it as one file while minimizing this impact.
FYI, SSIS does not generate a single insert when loading data to a table. Also, 5 million rows is not a huge insert that would take multiple hours. Even my laptop could handle that within a few minutes.
June 13, 2018 at 11:46 am
Luis Cazares - Wednesday, June 13, 2018 10:29 AMScott Coleman - Wednesday, June 13, 2018 10:05 AMA single 5-million row INSERT can be a huge problem, it may fill up the transaction log even if the database is in simple recovery. A minimally-logged bcp insert is probably the best way to handle it as one file while minimizing this impact.FYI, SSIS does not generate a single insert when loading data to a table. Also, 5 million rows is not a huge insert that would take multiple hours. Even my laptop could handle that within a few minutes.
I think it might, depending on the settings for 'Rows Per Batch' and 'Maximum Insert Commit Size'.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 13, 2018 at 12:00 pm
Luis Cazares - Wednesday, June 13, 2018 10:29 AMScott Coleman - Wednesday, June 13, 2018 10:05 AMA single 5-million row INSERT can be a huge problem, it may fill up the transaction log even if the database is in simple recovery. A minimally-logged bcp insert is probably the best way to handle it as one file while minimizing this impact.FYI, SSIS does not generate a single insert when loading data to a table. Also, 5 million rows is not a huge insert that would take multiple hours. Even my laptop could handle that within a few minutes.
Luis - it depends on the object used for the destination and the settings for that object. The default settings for the OLEDB Destination will create a single transaction...but you can control that using the fast load option (bulk insert) and setting appropriate batch/commit sizes.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 13, 2018 at 7:38 pm
sgmunson - Wednesday, June 13, 2018 6:39 AMThe first question I'll ask on this is how much network bandwidth exists between the two servers? If for whatever reason, these servers are on 10/100 Mb network cards, that alone might be enough to cause you a ton of grief. You need to have gigabit network cards or fiber network cards to reduce the network latency. If you have any 10/100 switches or routers in between, then that hardware would become a bottleneck, and it would be far better to have gigabit network switches or routers. Just 5 million rows seems like a rather small number of rows to have such a long transfer time, so unless you have a bunch of varchar(max) or nvarchar(max) columns, there's something else going on. So question number two is: how much data does the table contain? You can right-click on the table from SSMS and choose Properties, then click on Storage. You'll see the number of rows and the amount of data space. Question three: What are the server's specs in terms of RAM and CPU ? Post back and chances are the true problem gets revealed.
You are right, there are varchar(max) columns in that table.
I am trying to get a query to get me the space occupied by a table, since we have restrictive access to the source database i get an error while trying the right click option.
The network i think would be the regular 10/100 Mbit one since its not a high end server of any kind being a development database.
June 13, 2018 at 7:44 pm
Phil Parkin - Wednesday, June 13, 2018 11:46 AMLuis Cazares - Wednesday, June 13, 2018 10:29 AMScott Coleman - Wednesday, June 13, 2018 10:05 AMA single 5-million row INSERT can be a huge problem, it may fill up the transaction log even if the database is in simple recovery. A minimally-logged bcp insert is probably the best way to handle it as one file while minimizing this impact.FYI, SSIS does not generate a single insert when loading data to a table. Also, 5 million rows is not a huge insert that would take multiple hours. Even my laptop could handle that within a few minutes.
I think it might, depending on the settings for 'Rows Per Batch' and 'Maximum Insert Commit Size'.
I fiddled around with the rows per batch from 10000 to 50000.
The maximum insert commit size was kept at the default value of 2147483647
However it didnt help much, in decreasing the loading time
June 13, 2018 at 9:40 pm
I dunno... seems like a lot of people try to jump through unnecessary hoops to do such things. I've got a plain ol' linked server between two of my boxes and, just about an hour ago, I transferred more than 27 million rows of 6 columns in 5 minutes and 9 seconds using a simple select and a little knowledge about Minimal Logging. Sure, that won't set any speed records but it's damned awful easy.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 14, 2018 at 8:58 am
Jeff Moden - Wednesday, June 13, 2018 9:40 PMI dunno... seems like a lot of people try to jump through unnecessary hoops to do such things. I've got a plain ol' linked server between two of my boxes and, just about an hour ago, I transferred more than 27 million rows of 6 columns in 5 minutes and 9 seconds using a simple select and a little knowledge about Minimal Logging. Sure, that won't set any speed records but it's damned awful easy.
That relies on a good connection between the two boxes. Our guesses seem to be focused on the network speed being the problem. I'm sure that there are many ways to transfer data between 2 servers that shouldn't take that long.
I've had the experience of a similar situation where a table was reloaded daily and took 4 hours to complete while locking the table or having incomplete data. The only reason for slowness was that the connection to the source of the table was too slow.
June 14, 2018 at 11:23 am
george_at_sql - Wednesday, June 13, 2018 7:38 PMsgmunson - Wednesday, June 13, 2018 6:39 AMThe first question I'll ask on this is how much network bandwidth exists between the two servers? If for whatever reason, these servers are on 10/100 Mb network cards, that alone might be enough to cause you a ton of grief. You need to have gigabit network cards or fiber network cards to reduce the network latency. If you have any 10/100 switches or routers in between, then that hardware would become a bottleneck, and it would be far better to have gigabit network switches or routers. Just 5 million rows seems like a rather small number of rows to have such a long transfer time, so unless you have a bunch of varchar(max) or nvarchar(max) columns, there's something else going on. So question number two is: how much data does the table contain? You can right-click on the table from SSMS and choose Properties, then click on Storage. You'll see the number of rows and the amount of data space. Question three: What are the server's specs in terms of RAM and CPU ? Post back and chances are the true problem gets revealed.You are right, there are varchar(max) columns in that table.
I am trying to get a query to get me the space occupied by a table, since we have restrictive access to the source database i get an error while trying the right click option.
The network i think would be the regular 10/100 Mbit one since its not a high end server of any kind being a development database.
Well, with 5 million rows of varchar(max), that could be one heck of a lot of data. Might be a lot easier to just backup the database to a removable device and walk the device over to the other server and restore the database there, and then copy the table directly.
If I were you, I'd verify the actual speed of the network between the servers by talking to either the network folks or your DBA, or both. 10/100 is a genuine problem for this kind of data quantity.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 14, 2018 at 11:51 am
Luis Cazares - Thursday, June 14, 2018 8:58 AMJeff Moden - Wednesday, June 13, 2018 9:40 PMI dunno... seems like a lot of people try to jump through unnecessary hoops to do such things. I've got a plain ol' linked server between two of my boxes and, just about an hour ago, I transferred more than 27 million rows of 6 columns in 5 minutes and 9 seconds using a simple select and a little knowledge about Minimal Logging. Sure, that won't set any speed records but it's damned awful easy.That relies on a good connection between the two boxes. Our guesses seem to be focused on the network speed being the problem. I'm sure that there are many ways to transfer data between 2 servers that shouldn't take that long.
I've had the experience of a similar situation where a table was reloaded daily and took 4 hours to complete while locking the table or having incomplete data. The only reason for slowness was that the connection to the source of the table was too slow.
Correct but, the one thing that people forget is that if you're using the network, then you're using the network. I've seen people use BCP to save a file and then the copy the file to the other computer and use BCP to import that. They might even "get the religion" and do the BCP out and then use that same file to do a BCP in to save on the copy step but... you're still transferring data between machines across the same network that the linked server uses.
For linked servers, I've found it to be more dependent on the driver being used and how it's being used. People try to do a whole lot of things like joining to the remote server across the linked server and that's just not going to be as fast as a single table "go get what I need" query across the linked server.
I believe that performance challenged code is the usual problem for linked servers and not the linked servers themselves (provided that they correctly configured).
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply