October 27, 2015 at 2:48 am
I'm thinking about what might be the best solution for a synchronization problem.
The problem:
I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).
However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.
I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.
Any ideas?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 27, 2015 at 2:57 am
Koen Verbeeck (10/27/2015)
I'm thinking about what might be the best solution for a synchronization problem.
The problem:
I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).
However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.
I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.
Any ideas?
What is the origin of the backup which is restored to Server A? Could the originator send you a copy direct?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 27, 2015 at 3:16 am
It's all in the same company, but in different locations.
So again, the network is too slow to send a copy. (and it has to be automated daily)
It takes 17 hours to transfer the back-up over the network by the way :angry:
October 27, 2015 at 3:26 am
Do you need the full source DB or is it just a subset?
October 27, 2015 at 3:46 am
October 27, 2015 at 3:51 am
Did something similar a while back (in the age of the slow Internet) by adding a step to the restore process. An "untouchable" originating database (a SQL mirror of an AS/400) was backed up and restored to a "workable" location, which then had to be mirrored/replicated to the BI location. Instead of restoring over the "workable" copy, the backup was restored side-by-side with the previous one, a delta was created, and lastly a drop-rename cycle ran. Not very intuitive, but it saved lots of time and network bandwidth.
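A rough sketch of that side-by-side restore plus drop-rename cycle in T-SQL. All database, table, and file names here are made up for illustration; only the delta would be zipped and shipped across the slow link:

```sql
-- 1. Restore today's backup NEXT TO yesterday's copy, not over it.
RESTORE DATABASE MyDB_New
    FROM DISK = N'\\backupshare\MyDB.bak'
    WITH MOVE N'MyDB'     TO N'D:\Data\MyDB_New.mdf',
         MOVE N'MyDB_log' TO N'L:\Log\MyDB_New.ldf';

-- 2. Compute the per-table delta between the two local copies.
SELECT d.* INTO DeltaDB.dbo.SomeTable_Delta
FROM (
    SELECT * FROM MyDB_New.dbo.SomeTable
    EXCEPT
    SELECT * FROM MyDB_Current.dbo.SomeTable
) AS d;

-- 3. Drop-rename cycle: the new copy becomes the current one.
DROP DATABASE MyDB_Current;
ALTER DATABASE MyDB_New MODIFY NAME = MyDB_Current;
```

Because both copies sit on the same server, step 2 never touches the network; only the (hopefully small) delta tables do.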
October 27, 2015 at 3:55 am
anthony.green (10/27/2015)
Do you need the full source DB or is it just a subset?
I need a subset.
But the tables I need are still quite large (let's say in total 20GB).
There's a last-modified column, so I could set up incremental loading, but the guys from the source system don't know whether this date is reliable.
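For reference, the watermark pattern that column would enable looks something like this (table, column, and watermark-tracking names are illustrative; it is only safe if the source system reliably maintains the last-modified column, which is exactly the doubt here):

```sql
-- Fetch the high-water mark from the previous load.
DECLARE @LastLoad datetime2 =
    (SELECT MAX(WatermarkValue)
     FROM etl.LoadWatermark
     WHERE TableName = N'dbo.BigTable');

-- Extract only rows changed since the last run; this subset is
-- what would travel over the slow link instead of the full 20 GB.
SELECT *
FROM dbo.BigTable
WHERE LastModified > @LastLoad;
```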
October 27, 2015 at 4:00 am
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
October 27, 2015 at 4:04 am
Yeah, my thought was to have two databases: one to load the copies of the tables you need into.
E.g. DB1 has TableA_Today and TableA_Yesterday, DB2 has TableA. Run a checksum join between TableA_Today and TableA_Yesterday and output the non-matching checksum rows into DB2..TableA.
Rename Yesterday to a holding name, rename Today to Yesterday, rename the holding table to Today.
Rinse and repeat for all the tables you need to "incremental load", back up the DB, send it over, restore, and merge the data into your DW.
Lots of work; may or may not be feasible for you.
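A minimal sketch of that checksum join and table rotation, assuming a hypothetical two-column table (adjust the column list per table). Note that BINARY_CHECKSUM can produce collisions, so HASHBYTES is the safer choice when missing a change is worse than a slower scan:

```sql
-- Rows that are new or changed since yesterday go into DB2..TableA.
INSERT INTO DB2.dbo.TableA (KeyCol, SomeCol)
SELECT t.KeyCol, t.SomeCol
FROM (SELECT KeyCol, SomeCol, BINARY_CHECKSUM(*) AS cs
      FROM DB1.dbo.TableA_Today) AS t
LEFT JOIN (SELECT KeyCol, BINARY_CHECKSUM(*) AS cs
           FROM DB1.dbo.TableA_Yesterday) AS y
       ON y.KeyCol = t.KeyCol
WHERE y.KeyCol IS NULL    -- brand-new rows
   OR y.cs <> t.cs;       -- changed rows

-- Rotate the tables for the next run.
EXEC sp_rename 'dbo.TableA_Yesterday', 'TableA_Holding';
EXEC sp_rename 'dbo.TableA_Today',     'TableA_Yesterday';
EXEC sp_rename 'dbo.TableA_Holding',   'TableA_Today';
```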
October 27, 2015 at 4:30 am
Koen Verbeeck (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Had to use a few different techniques: part of the data had a last-changed column, but other data (referential data from other systems used in the AS/400) didn't have any such information, so checksum/rowcount etc. were used. Since both databases were on the same server, the delta calculation/extraction was pretty quick. The deltas would then be dumped into flat files, which were zipped and shipped across.
Isn't there an SSIS thingy to do this? :-P
October 27, 2015 at 4:32 am
Koen Verbeeck (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
It might not be as simple as an EXCEPT statement. Since you might need to keep track of new rows inserted as well as old rows that are updated or deleted, I am thinking it has to be a join, unless the tables have some kind of versioning columns like a start date and end date.
You could try tablediff.exe too.
October 27, 2015 at 4:41 am
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.
October 27, 2015 at 4:47 am
I'm thinking of a left join between the new table and the old table. There are no deletes, and with a left join I get all the inserts.
Using hashes, I can find out which rows have actually changed: keep the rows with different hashes and the rows where the hash from the old table is NULL (i.e. the inserts).
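That logic could look something like the following sketch. Table and column names are illustrative; note that CONCAT_WS requires SQL Server 2017+, so on older versions you would concatenate the columns with explicit CASTs and a separator instead:

```sql
;WITH new_t AS (
    SELECT KeyCol,
           HASHBYTES('SHA2_256', CONCAT_WS('|', Col1, Col2)) AS h
    FROM StagingDB.dbo.TableA_New
),
old_t AS (
    SELECT KeyCol,
           HASHBYTES('SHA2_256', CONCAT_WS('|', Col1, Col2)) AS h
    FROM StagingDB.dbo.TableA_Old
)
SELECT n.KeyCol
FROM new_t AS n
LEFT JOIN old_t AS o
       ON o.KeyCol = n.KeyCol
WHERE o.h IS NULL      -- inserts: no match in the old copy
   OR o.h <> n.h;      -- genuine changes
```

The separator in the concatenation matters: without it, ('ab','c') and ('a','bc') would hash identically.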
October 27, 2015 at 4:48 am
SimonH (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.
If I used SSIS to do the compare, I would need a merge join to combine the two table scans, which means the data would need to be sorted.
October 27, 2015 at 5:05 am