Replication offers some interesting ways to solve problems. We get a fair
number of questions in the discussion area about when to use which type of
replication. Sometimes that is an easy decision, sometimes not. You really have
to know a good bit about how all the types of replication work AND have a good
understanding of the business problem. In the next few articles I'm going to
explore replication and talk about some of the issues that you might not think
about when deciding to use replication.
Let's start with snapshot replication. I would guess that this method is the
least used (we'll put a poll up soon to see!) but it does have its uses. I'll
put up a tutorial on how to do it in a future article; for now, let's talk about
how it works at a high level. Something like this:
- Create the publication (basically a one-time setup)
- The snapshot agent runs, creating at least three files: one to create the
tables involved, one to create the indexes for those tables, and one BCP
file per table
- The snapshots get applied to the subscriber on whatever schedule makes sense
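For those who like to see it in script form, here's a rough sketch of the setup using the replication stored procedures. The publication, table, server, and database names (PubsLookup, StateLookup, SUBSCRIBERSERVER, LookupCopy) are made up for illustration, and this assumes the server and database have already been enabled for publishing:

```sql
-- Create a snapshot publication (assumes sp_replicationdboption
-- has already been run to enable the database for publishing)
EXEC sp_addpublication
    @publication = N'PubsLookup',
    @repl_freq   = N'snapshot',
    @status      = N'active';

-- Add a snapshot agent/schedule for this publication
EXEC sp_addpublication_snapshot
    @publication = N'PubsLookup';

-- Add the table as an article
EXEC sp_addarticle
    @publication   = N'PubsLookup',
    @article       = N'StateLookup',
    @source_object = N'StateLookup';

-- Push a subscription out to the subscriber
EXEC sp_addsubscription
    @publication    = N'PubsLookup',
    @subscriber     = N'SUBSCRIBERSERVER',
    @destination_db = N'LookupCopy';
```

In practice you'd set the snapshot agent's schedule when you create it; the point here is just how few moving parts there are.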
What that means to us is that we will typically have very high latency.
Changes accrue on the publisher, and at some point we just make a complete copy
and send it to the subscriber to replace the existing one. Depending on the
table sizes, it can also require a significant amount of bandwidth. Both merge
and transactional replication require the same snapshot; the difference is they
only need it once. With snapshot you're doing it over and over again.
So what would be a good situation to use snapshot? How about a fairly static
lookup table? States. Zip codes maybe. Changes to those would be pretty rare,
and a few hours or even a day of latency probably wouldn't matter. Even
here, why not use transactional and only send the changes, which we know will
be few? Why keep sending over data that probably hasn't changed?
Probably not the best example. Hoping someone will offer some good ones!
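For comparison, if you decided the lookup table did justify transactional, the script side barely changes; the main difference is the @repl_freq value (a sketch, with a made-up publication name):

```sql
-- Transactional publication (names made up for illustration).
-- Unlike snapshot, this requires a primary key on each published
-- table and starts a log reader running against the database.
EXEC sp_addpublication
    @publication = N'PubsLookupTran',
    @repl_freq   = N'continuous',   -- changes flow as they happen
    @status      = N'active';
```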
Let's look at it from another angle. Why would I NOT want to use
transactional or merge? Aside from the apparent (and sometimes real) complexity
compared to snapshot, I can make some good arguments for not using them:
- No log reader required. The log reader takes about a meg of memory, and it
has to run near continuously (it reads the transaction log to determine
whether each transaction meets the criteria for replication; either way,
once it has finished with a transaction it marks it as processed, allowing
it to be removed from the transaction log).
- No restrictions on schema changes. With SQL2K you can at least add or
remove columns without a lot of work; changing an existing column is a pain.
In SQL7 any change is a pain.
- No requirement to have a primary key (or a GUID for merge).
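On the schema-change point: with snapshot you just ALTER TABLE and let the next snapshot carry the change. For tables in a transactional or merge publication, SQL2K gives you sp_repladdcolumn and sp_repldropcolumn. A sketch, with made-up table and column names:

```sql
-- Add a column to a table that's already being published (SQL2K)
EXEC sp_repladdcolumn
    @source_object = N'StateLookup',
    @column        = N'Region',
    @typetext      = N'varchar(30) NULL';

-- And to remove one later
EXEC sp_repldropcolumn
    @source_object = N'StateLookup',
    @column        = N'Region';
```

Anything beyond adding or dropping a column generally means touching the publication itself, which is where the pain comes in.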
None of those are insurmountable. The last one is actually something I ran
into recently. We needed to replicate a table used by a legacy app; the table
had no primary key defined and no combination of columns that I could find to
create one, so I added an identity column. That broke the legacy app. Lacking
both the time and the knowledge of how the app worked, I opted for snapshot as
a short-term solution. Don't underestimate the value of being able to make
schema changes easily! What would otherwise be a simple change becomes a
serious task.
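To make that identity-column problem concrete, the kind of thing that bit me looks like this (made-up table; the key detail is that the legacy app inserted without a column list):

```sql
-- No usable natural key, so add an identity to serve as the primary key
ALTER TABLE LegacyOrders ADD RowID int IDENTITY(1,1) NOT NULL;

-- The legacy app did inserts like this, with no column list:
INSERT INTO LegacyOrders VALUES ('ABC123', '2002-01-15', 49.95);
-- After the ALTER, the values no longer line up with the columns (and
-- you can't insert into an identity column explicitly without
-- IDENTITY_INSERT on anyway), so the insert fails.
```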
One point in favor of transactional over snapshot is how the snapshots are
processed. A standard snapshot locks the tables during the BCP step to make
sure you get a valid copy. Transactional in SQL2K offers the option of a
'concurrent' snapshot, which uses the log reader in conjunction with the
snapshot agent to reduce locking.
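In script terms, the concurrent snapshot is just a different @sync_method on the publication (a sketch, publication name made up):

```sql
-- Transactional publication using a concurrent snapshot (SQL2K).
-- 'concurrent' lets activity continue during the BCP step; the log
-- reader reconciles changes made while the snapshot was being taken.
EXEC sp_addpublication
    @publication = N'PubsLookupTran',
    @repl_freq   = N'continuous',
    @sync_method = N'concurrent',
    @status      = N'active';
```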
Hopefully that leaves you with more questions than answers! Anyone out there
using snapshot? Can you give me a good reason why it was the right choice?
Interested in a detailed walkthrough of how it works in a follow-up article?
Post a comment in the attached discussion forum!