October 27, 2015 at 2:48 am
I'm thinking about what might be the best solution for a synchronization problem.
The problem:
I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).
However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.
I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.
Any ideas?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 27, 2015 at 2:57 am
Koen Verbeeck (10/27/2015)
I'm thinking about what might be the best solution for a synchronization problem.
The problem:
I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).
However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.
I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.
Any ideas?
What is the origin of the backup which is restored to Server A? Could the originator send you a copy direct?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 27, 2015 at 3:16 am
It's all in the same company, but in different locations.
So again, the network is too slow to send a copy. (and it has to be automated daily)
It takes 17 hours to transfer the back-up over the network by the way :angry:
October 27, 2015 at 3:26 am
Do you need the full source DB or is it just a subset?
October 27, 2015 at 3:46 am
October 27, 2015 at 3:51 am
Did something similar a while back (in the age of the slow Internet) by adding a step to the restore process. An "untouchable" originating database (a SQL mirror of an AS/400) was backed up and restored to a "workable" location, which then had to be mirrored/replicated to the BI location. Instead of restoring over the "workable" copy, the backup was restored side-by-side with the previous one, a delta was created, and lastly a drop-rename cycle ran. Not very intuitive, but it saved lots of time and network bandwidth.
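A rough sketch of that side-by-side restore plus drop-rename cycle in T-SQL. All database, table, and file names here are made up for illustration; only the delta would be zipped and shipped across the slow link:

```sql
-- 1. Restore today's backup NEXT TO yesterday's copy, not over it.
RESTORE DATABASE MyDB_New
    FROM DISK = N'\\backupshare\MyDB.bak'
    WITH MOVE N'MyDB'     TO N'D:\Data\MyDB_New.mdf',
         MOVE N'MyDB_log' TO N'L:\Log\MyDB_New.ldf';

-- 2. Compute the per-table delta between the two local copies.
SELECT d.* INTO DeltaDB.dbo.SomeTable_Delta
FROM (
    SELECT * FROM MyDB_New.dbo.SomeTable
    EXCEPT
    SELECT * FROM MyDB_Current.dbo.SomeTable
) AS d;

-- 3. Drop-rename cycle: the new copy becomes the current one.
DROP DATABASE MyDB_Current;
ALTER DATABASE MyDB_New MODIFY NAME = MyDB_Current;
```

Because both copies sit on the same server, step 2 never touches the network; only the (hopefully small) delta tables do.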
October 27, 2015 at 3:55 am
anthony.green (10/27/2015)
Do you need the full source DB or is it just a subset?
I need a subset.
But the tables I need are still quite large (let's say in total 20GB).
There's a last-modified column, so I could set up incremental loading, but the guys from the source system don't know whether this date is reliable.
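For reference, the watermark pattern that column would enable looks something like this (table, column, and watermark-tracking names are illustrative; it is only safe if the source system reliably maintains the last-modified column, which is exactly the doubt here):

```sql
-- Fetch the high-water mark from the previous load.
DECLARE @LastLoad datetime2 =
    (SELECT MAX(WatermarkValue)
     FROM etl.LoadWatermark
     WHERE TableName = N'dbo.BigTable');

-- Extract only rows changed since the last run; this subset is
-- what would travel over the slow link instead of the full 20 GB.
SELECT *
FROM dbo.BigTable
WHERE LastModified > @LastLoad;
```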
October 27, 2015 at 4:00 am
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
October 27, 2015 at 4:04 am
Yeah, my thought was to have two databases: one to load the copies of the tables you need into.
E.g. DB1 has TableA_Today and TableA_Yesterday, DB2 has TableA. Run a checksum join between TableA_Today and TableA_Yesterday and output the non-matching checksum rows into DB2..TableA.
Rename Yesterday to a holding name, rename Today to Yesterday, rename the holding table to Today.
Rinse and repeat for all the tables you need to "incremental load", back up the DB, send it over, restore, and merge the data into your DW.
Lots of work; may or may not be feasible for you.
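A minimal sketch of that checksum join and table rotation, assuming a hypothetical two-column table (adjust the column list per table). Note that BINARY_CHECKSUM can produce collisions, so HASHBYTES is the safer choice when missing a change is worse than a slower scan:

```sql
-- Rows that are new or changed since yesterday go into DB2..TableA.
INSERT INTO DB2.dbo.TableA (KeyCol, SomeCol)
SELECT t.KeyCol, t.SomeCol
FROM (SELECT KeyCol, SomeCol, BINARY_CHECKSUM(*) AS cs
      FROM DB1.dbo.TableA_Today) AS t
LEFT JOIN (SELECT KeyCol, BINARY_CHECKSUM(*) AS cs
           FROM DB1.dbo.TableA_Yesterday) AS y
       ON y.KeyCol = t.KeyCol
WHERE y.KeyCol IS NULL    -- brand-new rows
   OR y.cs <> t.cs;       -- changed rows

-- Rotate the tables for the next run.
EXEC sp_rename 'dbo.TableA_Yesterday', 'TableA_Holding';
EXEC sp_rename 'dbo.TableA_Today',     'TableA_Yesterday';
EXEC sp_rename 'dbo.TableA_Holding',   'TableA_Today';
```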
October 27, 2015 at 4:30 am
Koen Verbeeck (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Had to use a few different techniques: part of the data had a last-changed column, but other data (referential data from other systems used in the AS/400) didn't have any such information, so checksum/rowcount etc. were used. Since both databases were on the same server, the delta calculation/extraction was pretty quick. The deltas would then be dumped into flat files, which were zipped and shipped across.
Isn't there an SSIS thingy to do this? :-P
October 27, 2015 at 4:32 am
Koen Verbeeck (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
It might not be as simple as an EXCEPT statement. Since you might need to keep track of new rows inserted as well as old rows that are updated or deleted, I am thinking it has to be a join, unless the tables have some kind of versioning columns like a start date and end date.
You could try tablediff.exe too.
October 27, 2015 at 4:41 am
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.
October 27, 2015 at 4:47 am
I'm thinking of a left join between the new table and the old table. There are no deletes, and with a left join I get all the inserts.
Using hashes, I can find out which rows have actually changed: keep the rows with different hashes and the rows where the hash from the old table is NULL (i.e. the inserts).
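That logic could look something like the following sketch. Table and column names are illustrative; note that CONCAT_WS requires SQL Server 2017+, so on older versions you would concatenate the columns with explicit CASTs and a separator instead:

```sql
;WITH new_t AS (
    SELECT KeyCol,
           HASHBYTES('SHA2_256', CONCAT_WS('|', Col1, Col2)) AS h
    FROM StagingDB.dbo.TableA_New
),
old_t AS (
    SELECT KeyCol,
           HASHBYTES('SHA2_256', CONCAT_WS('|', Col1, Col2)) AS h
    FROM StagingDB.dbo.TableA_Old
)
SELECT n.KeyCol
FROM new_t AS n
LEFT JOIN old_t AS o
       ON o.KeyCol = n.KeyCol
WHERE o.h IS NULL      -- inserts: no match in the old copy
   OR o.h <> n.h;      -- genuine changes
```

The separator in the concatenation matters: without it, ('ab','c') and ('a','bc') would hash identically.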
October 27, 2015 at 4:48 am
SimonH (10/27/2015)
Jayanth and Eirikur: your delta proposals seem quite interesting.
How did you create the delta?
Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.
If I used SSIS to do the compare, I would need a merge join to combine the two table scans, which means the data would need to be sorted.
October 27, 2015 at 5:05 am