Synchronize databases over a slow network

  • I'm thinking about what might be the best solution for a synchronization problem.

    The problem:

    I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).

    However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.

    I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.

    Any ideas?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (10/27/2015)


    I'm thinking about what might be the best solution for a synchronization problem.

    The problem:

    I have one database called X. It's the source database for my data warehouse, but I can't touch this database (politics, no debate on this one).

    However, I can get a daily back-up which is restored on another server. Let's call this server A. My data warehouse server is in another network (literally in another country). Let's call this one server B.

    I need to get the data from server A to server B. The problem is that there's a very slow connection between those two servers. So I can't just copy the back-up. Another problem is that the database on server A is a daily restore of another database, so I can't use AlwaysOn for example.

    Any ideas?

    What is the origin of the backup which is restored to Server A? Could the originator send you a copy directly?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It's all in the same company, but in different locations.

    So again, the network is too slow to send a copy over (and it has to be automated daily).

    It takes 17 hours to transfer the back-up over the network by the way :angry:

  • Do you need the full source DB or is it just a subset?

  • Could you keep two copies on server A, one for today and one for yesterday? Then find the diff and only move the delta to the DW?

    Jayanth Kurup

  • Did something similar a while back (in the age of the slow Internet) by adding a step to the restore process. An "untouchable" originating database (a SQL mirror of an AS/400) was backed up and restored to a "workable" location, which then had to be mirrored/replicated to the BI location. Instead of restoring over the "workable" copy, the backup was restored side by side with the previous one, a delta was created, and lastly a drop-rename cycle ran. Not very intuitive, but it saved lots of time and network bandwidth.

    😎

  • anthony.green (10/27/2015)


    Do you need the full source DB or is it just a subset?

    I need a subset.

    But the tables I need are still quite large (let's say 20 GB in total).

    There's a last-modified column, so I could set up incremental loading, but the guys from the source system don't know if this date is reliable.

  • Jayanth and Eirikur: your delta proposals seem quite interesting.

    How did you create the delta?

    Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).
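    For reference, a bare EXCEPT delta over the two restores would look something like this minimal sketch (database, table and column names are all hypothetical):

```sql
-- Rows present in today's restore but not (identically) in yesterday's.
-- EXCEPT compares every selected column, so changed rows show up
-- alongside brand-new ones.
SELECT id, col1, col2
FROM Restore_Today.dbo.TableA
EXCEPT
SELECT id, col1, col2
FROM Restore_Yesterday.dbo.TableA;
```

    Note that EXCEPT alone doesn't distinguish an insert from an update, which matters when the delta has to be merged into the warehouse.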

  • Yeah, my thought was to have two databases: one to load the copies of the tables you need into, and one to hold the delta.

    E.g. DB1 has TableA_Today and TableA_Yesterday, and DB2 has TableA. Run a checksum join between TableA_Today and TableA_Yesterday and output the rows whose checksums don't match into DB2..TableA.

    Rename Yesterday to a holding name, rename Today to Yesterday, then rename the holding table to Today.

    Rinse and repeat for all the tables you need to "incremental load", back up the delta DB, send it over, restore it, and merge the data into your DW.

    A lot of work, and it may or may not be feasible for you.
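    The checksum join and rename cycle described above could be sketched roughly like this (all object names are hypothetical, and note that CHECKSUM can produce collisions, so HASHBYTES may be safer for critical tables):

```sql
USE DB1;

-- Delta via checksum join (hypothetical schema; id is the business key).
INSERT INTO DB2.dbo.TableA (id, col1, col2)
SELECT t.id, t.col1, t.col2
FROM dbo.TableA_Today AS t
LEFT JOIN dbo.TableA_Yesterday AS y
       ON y.id = t.id
WHERE y.id IS NULL                                           -- new rows
   OR CHECKSUM(t.col1, t.col2) <> CHECKSUM(y.col1, y.col2);  -- changed rows

-- Rotate the tables for the next daily run.
EXEC sp_rename 'dbo.TableA_Yesterday', 'TableA_Holding';
EXEC sp_rename 'dbo.TableA_Today',     'TableA_Yesterday';
EXEC sp_rename 'dbo.TableA_Holding',   'TableA_Today';
```

    DB2 then contains only the delta, so the backup that crosses the slow link stays small.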

  • Koen Verbeeck (10/27/2015)


    Jayanth and Eirikur: your delta proposals seem quite interesting.

    How did you create the delta?

    Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).

    Had to use a few different techniques; part of the data had a last-changed column, but other data (referential data from other systems used in the AS/400) didn't have any such information, so checksum/rowcount etc. were used instead. Since both databases were on the same server, the delta calculation/extraction was pretty quick. The deltas would then be dumped into flat files which were zipped and shipped across.

    😎

    Isn't there an SSIS thingy to do this? :-P

  • Koen Verbeeck (10/27/2015)


    Jayanth and Eirikur: your delta proposals seem quite interesting.

    How did you create the delta?

    Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).

    It might not be as simple as an EXCEPT statement. Since you might need to keep track of newly inserted rows as well as old rows that are updated or deleted, I am thinking it has to be a join, unless the tables have some kind of versioning column like a start date and end date.

    You could try tablediff.exe too.

    https://msdn.microsoft.com/en-us/library/ms162843.aspx
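    For the record, a tablediff run comparing the two restores side by side might look like this (server and database names are hypothetical; the -f switch writes out a T-SQL script that would bring the destination in line with the source):

```shell
tablediff.exe ^
  -sourceserver ServerA -sourcedatabase Restore_Today ^
  -sourceschema dbo -sourcetable TableA ^
  -destinationserver ServerA -destinationdatabase Restore_Yesterday ^
  -destinationschema dbo -destinationtable TableA ^
  -f C:\deltas\TableA_fix.sql
```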

  • Koen Verbeeck (10/27/2015)


    Jayanth and Eirikur: your delta proposals seem quite interesting.

    How did you create the delta?

    Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).

    Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.

  • I'm thinking of a left join between the recent table and the old table. There are no deletes, and with a left join I get all the inserts.

    Using hashes, I can find out which rows have actually changed: keep the rows with different hashes and the rows where the hash from the old table is NULL (--> the inserts).
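    That left-join-plus-hash approach could be sketched like this (hypothetical names; note that HASHBYTES input is limited to 8000 bytes on SQL Server versions before 2016):

```sql
-- Hash-compare delta with a left join (hypothetical schema; id is the key).
-- The source has no deletes, so inserts plus updates cover the full delta.
SELECT n.id, n.col1, n.col2
FROM Restore_Today.dbo.TableA AS n
LEFT JOIN Restore_Yesterday.dbo.TableA AS o
       ON o.id = n.id
WHERE o.id IS NULL   -- inserts: no matching row in the old table
   OR HASHBYTES('SHA1', CONCAT(n.col1, '|', n.col2))
   <> HASHBYTES('SHA1', CONCAT(o.col1, '|', o.col2));  -- updates
```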

  • SimonH (10/27/2015)


    Jayanth and Eirikur: your delta proposals seem quite interesting.

    How did you create the delta?

    Using the EXCEPT operator? This might be quite slow on the larger tables (about 10 million rows).

    Could you not use HASHBYTES in an SSIS script to compare the differences? It would still have to scan the 10 million rows, but I suspect it would be quicker than using EXCEPT.

    If I used SSIS to do the compare, I would need a Merge Join to join the two table scans, which would mean the data needs to be sorted.

  • Could you not use a cached Lookup on the IDs, as opposed to a Merge Join, and then use an SSIS Script component (which gives you access to C# hashing functions that aren't limited to varchar(8000)) to compare for updates?

    There is a good article on it here.

