February 6, 2003 at 6:12 am
Any pointers to a good checklist for replication performance issues.
We have a smallish database (I think it's up to about 15G) that we make a lot of changes to, which is replicated between two servers. It's set up for transaction replication, about 98% of the updates are on the publisher, and we may update a few million records per day, perhaps as many as 10M.
The servers have 2G, very fast disks, 4CPU, and GBE between them dedicated to inter-server traffic (full duplex), so we've done all we can in that vein, but frequently we'll do an update and about 30 hours later (literally) it has finished replication to the subscriber.
We worked through a number of timeouts and have it so that it generally completes and runs correctly, but it is PAINFULLY slow.
Because we want subscriber update capability, we can't use snapshot replication, and I see no indication merge is faster.
We are not replicating stored procedure calls, but rather letting the affects of stored procedures replicate -- I'm terrified that we have some stored procedures that might not execute 100% the same on each (long story, and I may be wrong); doing so might help a lot, and that might be our fallback. But it's not really a complete solution since about 50% of these updates are uploads of new data records.
So.... any tuning checklist or guides floating around that might show us some things to check for, or consider?
Or any community advice?
A few more details:
- Most tables are replicated. We need all the data that is being replicated, we've already separated temp-type tables so they are not replicated.
- The vast amount of volume comes from 2 or 3 tables with large numbers of rows that grow daily and also have updates regularly; the tables have a couple dozen columns, nothing unusual, mostly integers, a few strings and a couple of datetimes.
- A few tables have triggers but only those very rarely updated. The big volume are just straight updates.
- The big tables have a fairly large number of indexes (3-4) but they are well tuned to the queries we run, and are not a significant problem on the publisher during the record inserts (who knows what slows down the subscriber and/or communications.
- I'm perfectly game to put in another GBE and try to load share, but I frankly suspect its the CPU/driver saturated at this point not the actual wire, and it's not clear that having two and the overhead of load balancing will help or hurt. Is there a better W2K compatible inter-system interface to use than GBE? Running on VMS there was a shared memory channel for such things that had extremely low overhead.
- During these very long replication runs, neither system is saturated in any obvious place. 1 or 2 CPU's of each may be busy, and the network interfaces are slow (i.e. other traffic moves much more slowly than normal). My guess is that the replication process is fairly single streamed and can't make use of multiple CPU's effectively, and in fact is saturating 1 of 4 CPU's (plus a chunk of another doing network IO).
February 6, 2003 at 4:57 pm
Thats a lot of updates. Considering moving distribution db to separate server? Replicating stored proc is a good idea, will save a lot on the publisher - think about what pushing the change does the other way:
a. sql writes to db trans log
b. logreader scans log, writes to distribution db, which is another log write
c. distribution agent pushes data out, deletes, another log entry!
Might also look at your batch size. Maybe changing up or down would help. Is one or more CPU's maxing out? No disk queueing? SP3 added a switch that lets you break up transactions when replicating, have not tried it yet, you'd get part of the data there quicker maybe.
Andy
February 10, 2003 at 1:44 pm
I would use replication store procedures and mcall them to avoid overhead in the updates. Also could create severals publication with different articles, and schedule the replication to avoid concurrent publications delivering to subscribers.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply