May 28, 2019 at 4:57 pm
Hi,
I have an Oracle database installed on a Linux machine.
And I have SQL Server 2017 on Windows.
I would like to copy the data from Oracle to SQL Server.
I have an SSIS process that copies less than half of the data in almost 13 hours, so I am trying to import everything in bulk instead.
I understand that I need to install it (SSMA) on the Oracle machine where the database is (or maybe I am wrong, I am just confused right now).
Can someone please point me to the correct link or information on how to do this? I am really lost.
Thanks
Astrid
May 28, 2019 at 5:13 pm
125 GB of total data, I imagine across around 973 tables. Right now I have 80 tables and that takes about 13 hours.
May 28, 2019 at 5:58 pm
That might be a bit much to try to push through SSIS. One option is to export the data to files on the Oracle server (I don't really know Oracle well enough to know what their export options are, but I assume they have some easy way to dump data to a CSV or whatnot), copy that over to the SQL Server, and then import the files there with BCP.
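Just as a rough sketch of that route (untested against your setup, and the table and file names here are made up): I believe newer versions of SQL*Plus can spool a table to CSV with SET MARKUP CSV ON, and once the file is copied to the SQL Server box the load side looks roughly like this:

-- from a command prompt (bcp), something like:
--   bcp StageArea.dbo.SomeTable in C:\loads\SomeTable.csv -S YourSqlServer -T -c -t ","
-- or the T-SQL equivalent:
BULK INSERT dbo.SomeTable
FROM 'C:\loads\SomeTable.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);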
Is this a one-time thing, or something that you need to do often?
May 28, 2019 at 6:02 pm
I need to do it daily; that is why I was trying to use SSMA, but the Oracle database is on Linux and you do need to install it on the source server.
May 28, 2019 at 9:22 pm
I need to do it daily; that is why I was trying to use SSMA, but the Oracle database is on Linux and you do need to install it on the source server.
Just to confirm - you are looking to refresh the tables in SQL Server every day with more than 125GB of data? How long does it take to extract the data from Oracle?
What you really need to be looking at is how to extract just the data that has changed since the last extract, instead of pulling all of the data each time. For those tables that are just lookup-type tables - you could build those out as full refresh - but all other tables should be set up as incremental.
The initial extract can take several hours for each table - but that is just to build up the data in SQL Server. After that - each incremental extract shouldn't take but a few minutes.
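As a rough sketch of the incremental pattern - this assumes the Oracle table has a reliable last-modified column, which you would need to confirm, and all of the table/column names below are placeholders:

-- a small control table on the SQL Server side to remember where each extract left off
CREATE TABLE dbo.ExtractWatermark
(
    TableName     sysname      NOT NULL PRIMARY KEY,
    LastExtracted datetime2(3) NOT NULL
);

-- the SSIS source query against Oracle then filters on that value, e.g.
--   SELECT * FROM SOME_SCHEMA.SOME_TABLE WHERE LAST_MODIFIED > :last_extracted
-- with :last_extracted supplied from a package variable loaded from dbo.ExtractWatermark

-- after a successful load, move the watermark forward - ideally to the MAX(LAST_MODIFIED)
-- value actually extracted, so nothing slips between runs
DECLARE @NewWatermark datetime2(3) = SYSUTCDATETIME();
UPDATE dbo.ExtractWatermark
SET    LastExtracted = @NewWatermark
WHERE  TableName = 'SOME_TABLE';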
It also sounds like you have not set up the package to run multiple processes concurrently. That is - you are performing one table extract at a time when you could be performing up to n+1 (n = number of CPUs). For these types of projects I set up sequence containers, each holding a group of related tables with their own dependencies - then set up dependencies between the sequence containers (if needed - for example, process all lookup tables first, then member/customer data).
Exporting to a file - copying the file across the network - and then using BCP won't be any faster than extracting directly using SSIS and in fact can be quite a bit slower. SSIS will stream the data through the pipeline - loading the data into the destination as it is received from the source...whereas extracting to a file, copying the file - then loading must wait for all data to be extracted and copied before it can start loading.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 28, 2019 at 9:49 pm
Exporting to a file - copying the file across the network - and then using BCP won't be any faster than extracting directly using SSIS and in fact can be quite a bit slower. SSIS will stream the data through the pipeline - loading the data into the destination as it is received from the source...whereas extracting to a file, copying the file - then loading must wait for all data to be extracted and copied before it can start loading.
The difference with using files is that you don't have to tie up both servers the entire time it's running. Generate your extract on Oracle, and Oracle is done; move the files over, then load them. Of course, since this has to happen daily, neither solution is great for moving the entire database.
May 29, 2019 at 12:50 pm
Good morning.
Maybe I should have detailed my process better.
The company I work for uses the Data Vault methodology… that is a whole other issue…
So basically, in SSIS we take a table, add the hash keys, the load date, and a sequence number, and load it into a database called stagearea on a daily basis. Right now only one table gets loaded incrementally during the week.
The biggest table we need to load does not have an ID or date column that we can use to find the last record, and the unique constraint on the table involves about 8 columns.
I do run things in parallel, but I am on a virtual server with 128 GB of RAM and 4 CPUs, and on the same virtual server I have SSAS (I also have SSRS, but it is not being used right now). So I needed to limit the amount of things I could run in parallel, as the server kept choking.
Basically, I am not blessed with hardware, and I still need to bring the data over.
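To give an idea, the hash key step I mentioned looks roughly like this in T-SQL terms (we actually do it in the SSIS data flow, and the column names here are made up):

SELECT
    CONVERT(char(32), HASHBYTES('MD5',
        CONCAT_WS('||', src.KeyCol1, src.KeyCol2, src.KeyCol3)), 2) AS HashKey,   -- business key columns
    SYSDATETIME()                                                   AS LoadDate,
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))                      AS SequenceNo,
    src.*
FROM dbo.SomeSourceTable AS src;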
May 29, 2019 at 6:01 pm
The difference with using files is that you don't have to tie up both servers the entire time it's running. Generate your extract on Oracle, and Oracle is done; move the files over, then load them. Of course, since this has to happen daily, neither solution is great for moving the entire database.
If the purpose is to improve performance - then building a file, copying the file to the destination and then loading the file will take longer than streaming through an integration engine (SSIS, Informatica, Ensemble, etc.).
The fact that SSIS uses bulk insert (aka: bcp) for the fast table load option also means you are getting those same benefits.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 29, 2019 at 6:22 pm
I am not using the bulk option right now, as I am loading directly from Oracle.
I will try it too, but the big question is: how do you copy data faster? How much hardware am I lacking? I am trying to understand where I am going wrong.
It is not OK that we are loading 80 tables in 13 hours (well, it is more like 90).
May 29, 2019 at 7:16 pm
I am not using the bulk option right now, as I am loading directly from Oracle. I will try it too, but the big question is: how do you copy data faster? How much hardware am I lacking? I am trying to understand where I am going wrong. It is not OK that we are loading 80 tables in 13 hours (well, it is more like 90).
How is your SSIS package configured? Are you extracting/loading one table at a time - or do you have multiple control flows built that all process at the same time (no dependencies)?
I would start with the biggest table - build a control flow and data flow for that one table only. I would then optimize the query that extracts the data to ensure it is as fast as possible and only pulls the required data. If the extract is selecting from multiple tables - I would look at removing the extra tables and just extract the foreign keys, with the related table extracted separately.
Once that query has been optimized - I would then use the OLEDB Destination with the fast table load option. In that object I would then set a batch/commit size - normally I set around 200,000. This can be larger but make sure you review the transaction log on the destination - you want a batch/commit size that works within the current size of the transaction log.
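To sanity-check whatever batch/commit size you pick, you can watch log usage in the destination database while a test load runs - this just reads the standard DMV, nothing specific to your setup:

-- run in the destination (staging) database during a test load
SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent
FROM   sys.dm_db_log_space_usage;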
For the destination table - disable all non-clustered indexes. If possible - make sure the data is extracted in the clustered key order so it is inserted in that order. After the data is loaded you can then rebuild the non-clustered indexes. For this largest table - I would not create a clustered index, rather I would load into a HEAP and enable/rebuild the non-clustered indexes after the load.
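The disable/rebuild pattern is just this - table and index names are placeholders, and note that you only disable the non-clustered indexes (disabling a clustered index makes the table unreadable):

-- before the load
ALTER INDEX IX_BigTable_SomeColumn ON stage.BigTable DISABLE;   -- repeat for each non-clustered index
-- ...run the fast load into the heap...
-- after the load
ALTER INDEX ALL ON stage.BigTable REBUILD;   -- rebuilding also re-enables the disabled indexes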
Since this is a stage table - build a non-clustered unique constraint on the 8 columns that uniquely identify the row. You can then use those columns against the target/final table to identify existing rows (for update) and new rows (for insert) - and no longer existing rows (for delete).
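Something along these lines - with made-up column names, since I don't know your 8-column key:

-- on the stage table
ALTER TABLE stage.BigTable
    ADD CONSTRAINT UQ_BigTable_BusinessKey UNIQUE NONCLUSTERED (KeyCol1, KeyCol2, KeyCol3);   -- all 8 key columns in practice

-- new rows
INSERT INTO dbo.BigTable (KeyCol1, KeyCol2, KeyCol3, SomeValue)
SELECT s.KeyCol1, s.KeyCol2, s.KeyCol3, s.SomeValue
FROM   stage.BigTable AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.BigTable AS t
                   WHERE t.KeyCol1 = s.KeyCol1 AND t.KeyCol2 = s.KeyCol2 AND t.KeyCol3 = s.KeyCol3);

-- rows that no longer exist in the source
DELETE t
FROM   dbo.BigTable AS t
WHERE  NOT EXISTS (SELECT 1 FROM stage.BigTable AS s
                   WHERE s.KeyCol1 = t.KeyCol1 AND s.KeyCol2 = t.KeyCol2 AND s.KeyCol3 = t.KeyCol3);

-- updates would compare the non-key columns in the same way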
If this one table takes 13+ hours...then you can start to look at other items. Is it the network causing the performance problem - the destination IO/CPU/Memory?
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 29, 2019 at 7:37 pm
I load two tables at a time, and the reason I do so is that when we were loading more, we were choking the server.
We load table by table; we don't do joins.
All the tables are set to fast load mode.
Because of the CPU issue we set MaxConcurrentExecutables to 2.
The max commit size I didn’t touch at all.
Now, there is not really a query; it is a select-all kind of query, and we bring over a copy of the table with all the columns.
One table does not take 13 hours; all 80 tables we are loading together take 13 hours.
We do have non-clustered indexes on the tables, but all the tables get truncated before loading, so the loads start with empty tables.