Lately I've been trying to catch up on all the changes in replication in SQL
2005 and one in particular struck me as interesting - how transactional
replication handles replication of identity columns. If you've done this with
SQL 2000 you know that if you have a table on the publisher that has an identity
column that when the snapshot creates the table on the subscriber it will not
have the identity column. This makes pretty good sense for two reasons; one is
that the subscriber is generally considered read only, the other is that if you
had an identity column on the subscriber you'd have to deal with the possibility
of a key collision if someone had added a row on the subscriber and ended up
using a key that was later generated on the publisher.
Now to make it a little more complicated there is also the concept of having
subscribers that can be updated and have those changes pushed back to the
publisher. SQL 2000 would let you do this by assigning identity ranges to each
subscriber, so that Subscriber A might have the range 100k-200k, Subscriber B
200k-300k, and the publisher would have 1-100k (and obviously the ranges could
not overlap). As the ranges approached the 80% utilization mark there was a proc
(sp_adjustpublisheridentityrange) that would be called on the publisher to go
and get a new range. Works well enough in most cases.
So that's the way the world was, but in SQL 2005 there is a tiny change -
tables with identity columns keep the identity column on the subscribers even if
they are not updateable. We'll talk about why the change was made in a moment,
but I was curious to see how they implemented the change. If you've ever
looked at the stored procedures SQL generates on the subscribers you've probably
seen something like this:
ALTER procedure [dbo].[sp_MSins_dboEmployees]
@c1 int,@c2 datetime,@c3 varchar(50),@c4 varchar(50)
as
begin
insert into "dbo"."Employees"(
"EmployeeID"
,"DateAdded"
,"FirstName"
,"LastName"
)
values (
@c1
,@c2
,@c3
,@c4
)
end
Now that proc makes perfect sense if there is no identity column on the
EmployeeID column, SQL is explicitly providing the value. Add the identity
column back to the table on the subscriber and things would break, right? My off
the cuff thought when I started looking was that I expected to see something
like this:
ALTER procedure [dbo].[sp_MSins_dboEmployees]
@c1 int,@c2 datetime,@c3 varchar(50),@c4 varchar(50)
as
begin
set identity_insert employees on
insert into "dbo"."Employees"(
"EmployeeID"
,"DateAdded"
,"FirstName"
,"LastName"
)
values (
@c1
,@c2
,@c3
,@c4
)
set identity_insert employees off
end
That's how we would normally add a row to a table with an identity column
when we needed to set the value explicitly. But when I created a new publication
to see, I was surprised to see the proc looked just like it always did. Strange,
so how did they accomplish the insert? My next thought was maybe an instead of
trigger on the subscriber, but there were no triggers. Starting up Profiler and
doing a few inserts, I found this:
Nothing on sp_MSPub_adjust_identity in BOL, so I opened it up to see for
myself. Lots of TSQL, so down the rabbit hole for a few minutes looking at stuff
that while interesting, doesn't seem to be overriding or resetting the identity
value. Go back and test again, make sure I haven't looked at wrong table or
something, still works. I reset the seed on the subscriber, insert from the
publisher still works and the seed not changed on the subscriber. I back up and
look once more, and then I see it - "Not for Replication" on the identity
column. Makes perfect sense, that's the same technique that would be used for
updating subscribers. If you look in BOL at the definition you'll see:
- Foreign key constraints - The foreign key constraint is not enforced
when a replication agent performs an insert, update, or delete operation.
- Check constraints - The check constraint is not enforced when a
replication agent performs an insert, update, or delete operation.
- Identity columns - The identity column value is not incremented when a
replication agent performs an insert operation.
- Triggers - The trigger is not executed when a replication agent performs
an insert, update, or delete operation.
So, it's easy to get off track when you start with a bad assumption! The one
thing I think about as I document this is that I wish you could see the 'not for
replication option' being applied in Profiler, that would have saved me a few
minutes, bad assumption or not.
Now let's talk about why the change is important. Lot's of people use
transactional replication to push data to a reporting server. It's easy to set
up and administer, and putting reports on a different server both isolates the
reports from the OLTP side of things and allows you to index more aggressively
to gain speed in reporting. It's then natural to think that the subscriber
becomes a perfect fail over target if the publisher catches on fire. This is
mostly true, you'll have data to within a few seconds typically, but when you
point your applications at the new server and start doing inserts they would
fail in SQL 2000 - no identity columns. If you're using SQL 2000 and wanted to
use it as a failover you needed to have scripts ready to alter the tables to add
the identity columns. Not terribly hard, but one extra step. With this minor
change in SQL 2005 you don't have to worry about making that change any longer.
I hope you'll find the replication change interesting, and maybe my missteps
on seeing how it works will help you somehow too!