Resolving Data Drift

  • Hi All,

    First let me apologize that this is a broad question, but it's legitimate none the less.

    We run three production data centers and manage "replication" by an internal service which replicates statements (e.g, a procedure runs with a set of inputs to update some data, that same call is run on the server it was called on, plus replicated over to the other servers) and tasks (a simple example would be grabbing a file from an FTP site, parsing it and uploading it to the database). Most of the time it works fine, but as you can imagine, there are lots of edge cases which can and do happen where something doesn't get persisted in one data center, or gets persisted out of order, leading to data drift.

    I'm not asking for a specific implementation of how to fix the drift, since you'd need intimate knowledge of each system that is drifted, but I'm wondering what some good approaches might be to fix this.

    For example, is there a better way to keep all three datacenters in sync? Assuming I can't make big architectural changes to our system, would it be best to write something in SSIS, C#? Are there 3rd party tools that can do this sort of thing?

    Executive Junior Cowboy Developer, Esq.[/url]

  • To a degree, it probably depends on the requirements;are the other data centers also gathering data, or are they read only? that kind of decides what methodology to consider. sql has a ton of built in options; i think anything third party is just build on the SQL server built ins.

    if read only was ok, log shipping would be one way to go; then you could ship the logs via FTP, and then restore them.

    after that, merge replication would be a solid choice, or something built specifically around change tracking or Change data capture(if you had enterprise)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks, Lowell.

    All three data centers are full read/write instances of SQL, each processing their own data, and serving it up when a load balancer directs web traffic to that data center.

    We do run enterprise, so anything is on the table. I'll Look into the different replication options. Hopefully it's not such a big change in our server topology that it's not an option.

    Executive Junior Cowboy Developer, Esq.[/url]

  • The one technology built into SQL Server would be Merge Replication, but it's a nightmare to maintain and it requires every table taking part to be modified (you have to add a GUID column). It's designed to do what you want, but Microsoft has given very little attention to replication over the versions. It still operates roughly the way it did back in 2005, which isn't that good.

    I'm not aware of another good option in the built-in products though. It's the multi-direction sync that's the issue. There might be third party tools, but none come to mind.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks you all for the input, even if it was mostly "there aren't any hole-in-one solutions". I'll talk with our DBAs about the viability of using merge replication.

    Executive Junior Cowboy Developer, Esq.[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply