October 12, 2016 at 12:50 pm
We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.
I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD
October 12, 2016 at 1:18 pm
Barkingdog (10/12/2016)
We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD
Can you add and maintain a DateModified column on the tables to be sync'd?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 12, 2016 at 2:25 pm
Is SQL Replication available to you?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 12, 2016 at 2:57 pm
Create a bridge table containing a column for the primary key and a column for a left/right bit indicator.
Insert into the bridge table the exception records from each side, with the appropriate l/r indicator.
Insert into left table
select * from right table
where key in (select key from bridge table where rightside = true );
All of the joins are fully indexed on your keys so they should be much quicker and the table is very skinny requiring fewer page scans.
Wes
(A solid design is always preferable to a creative workaround)
October 12, 2016 at 5:06 pm
I asked about that and they said "no" (as third parties are involved in the drama and that could impact their code.)
BD
October 12, 2016 at 5:07 pm
No replication either (as the source DB is already involved in a replication relationship.)
October 12, 2016 at 5:14 pm
Great idea!. I was thinking about something along that line but it didn't quite crystallize. Your push got made me think about it again and it became much clearer.
Two barks to you!
BD
October 12, 2016 at 5:16 pm
Using SQL2012, you have the TABLEDIFF option. I'd look into that. Each table would need a PK though. You tell it which is the source database/table and the destination database/table then it lets you know what commands need run to make the destination the same as the source. Bit of advice: make sure you use the -f option. That outputs the commands to a file instead of just running them. Then you can open the file to see what would be run and run it manually (or create a script to run the files).
-SQLBill
October 12, 2016 at 5:16 pm
I'm curious what they mean by impact. The bridge table would be fairly light-weight and incur minimal locking. It may even be safe to use With (NoLock) to eliminate the locking on the reads, depending on the activity on the table and when this will be run. You're biggest impact will be the deletes/inserts, but your merge statement will have a bigger impact.
If they are concerned about altering the schema, you could host the bridge table in tempdb.
Wes
(A solid design is always preferable to a creative workaround)
October 12, 2016 at 5:27 pm
Barkingdog (10/12/2016)
We have 2 tables with the same schema on different servers. Let's call them TblSrc and TblDest. We need to "synchronize" the data in both tables based on the primary key. If the record exists in TblSrc but not in TblDst (based on primary key) we INSERT the record from TblSrc into TblDst. If the record exists in both tables, delete the one in TblDst and replace it by the record from TblSrc.I coded some t-sql using MERGE which does the trick but is relatively slow. Here's why: TblSrc and TblDst actually have millions of records each and each table has about 100 fields. (Furthermore, I've described the situation for only 2 tables. The actual task involves over 100 tables needing handling similar to that described above.)
Question: given the task, the size of tables, and the number of tables, how would you approach this task from an performance perspective?
TIA,
BD
So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?
From your description, it would be easier to just restore the database as a different name than what you want, drop the old database, and then rename the freshly restored database to be the same as the one you just dropped. Users would see a total down time of about 65ms. If the databases are on the same SAN, you might be able to use software that comes with the san to do a SAN SNAPSHOT to update the database and would be nearly instantaneous especially when compared to a restore or trying to do "diffs" on a hundred tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 12, 2016 at 7:19 pm
Is it possible for a row to be in tblDst, but not in tblSrc? There was no mention of any action for that situation so I'm assuming you would still want to keep the row. That sounds like a warehouse of accumulating tables.
However, synchronizing the tables implies that the data is identical between the two. If that is the case, I agree with Jeff that there are much better infrastructure solutions that can handle this on the back end.
Wes
(A solid design is always preferable to a creative workaround)
October 12, 2016 at 8:13 pm
Rookie,
Another great thought. You're not rookie!
BD
October 12, 2016 at 8:14 pm
Jeff,
>> So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?
Correct. It's simply a matter of replacing the existing record.
VD
October 12, 2016 at 10:24 pm
Barkingdog (10/12/2016)
Jeff,>> So, you don't care what the non-primary key columns contain at all? In other words, you never have the need for an UPDATE?
Correct. It's simply a matter of replacing the existing record.
VD
I'd go with the SAN Snapshot, if you have that available, then. It'll be faster than synching even just one table.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply