When most of build transactional publications we think of articles as tables,
but articles can actually be other objects including stored procedures, the
focus of this article. To get started I've created a database with a single
table called Employees and two stored procedures, one to add an employee and the
other to increase the pay of all employees. In the screenshot below you can set
the publication configuration.
You can see that I've elected to publish both stored procedures as articles.
If we highlight a stored proc and click on the article properties button, we can
see that we have three choices for how the stored procedure gets replicated. I'm
going to leave it set at the default 'Stored Procedure Definition Only' for now.
Once we add a subscriber and let the snapshot run, this is what we have on
the subscriber:
Now let's do a couple quick tests. The first is to run the Add Employee proc
and see how the change is transmitted to the subscriber:
exec [sp_MSins_dboEmployees] 21,'2007-08-19 11:18:16:077','John','Public',NULL
This shows that we're calling the replication generated procedure on the
subscriber. Our Add Employee proc will never get called by the publisher, but it
might be called by some process on the subscriber. Now let's see what happens if
we modify the proc on the publisher - I'll add a comment to it. The following
was executed on the subscriber automatically:
ALTER PROCEDURE [dbo].[usp_Employee_Add] @FirstName varchar(50), @LastName varchar(50) as set nocount on --TEST COMMENT insert into dbo.Employees ( FirstName, LastName) values ( @FirstName, @LastName)
Changes to stored procedures are propagated to subscribers automatically if
you make them part of the publication. Remember, the stored procedure won't get
called as part of standard replication. It makes sense to replicate the
definition if you plan to use the subscriber as a standby server (having all
your proc changes kept in sync is pretty important) or if you've configured
replication for updating subscribers (would need the same stored procedures that
would normally get executed on the publisher).
Now let's revisit our settings and change it to replicate stored procedure
execution:
I'll execute the same IncreasePay proc on the publisher:
And then we see the same call executed on the subscriber:
This can be a huge performance advantage. I called a single proc that updated
all the rows on the publisher and then instead of transmitting a replication
proc call on the subscriber for each row changed, we just executed the same proc
on the subscriber(s). Given that the proc definition and data hasn't been manually
adjusted on the subscriber you wind up with exactly the same results for a lot
less work - one row logged and processed in the distribution database and one
row instead of x rows sent across the network We still have to do the actual
work on the subscriber of course; if the proc modifies a million rows there is a
certain amount of overhead that we pay regardless of how the update was issued.
And even though we've elected to replication stored procedure execution in this
case we still have the ability to directly modify the table and have those
changes replicated as usual.
The last option is 'Execution in a serialized transaction of the SP' and has
the same advantages of replicating procedure execution but it does so within a
serialized transaction, that is, one with the isolation level set to
SERIALIZABLE. Most of us don't use that high a setting often because it
typically has a negative impact on concurrency. If you need absolute perfection,
this is the way to go, because it eliminates the chance that the stored
procedure can affect a different number of rows because of other transactions
that are occurring at the same time. But in practice do you really need this?
The easy answer is to say yes and the possibly risky answer is to say no. As
with most things that have to do with SQL performance the answer is 'it depends'
and in this case, it depends on you looking at how and when data access occurs,
and your tolerance for having data wind up slightly different. It's probably not
a fair line to draw but I'm usually comfortable skipping serialization if the
changes are just being executed on a reporting server. No, it's not good to have
incorrect data, but if something happens I definitely do still have the
original/correct data. Even if you do select the serialize option you will
probably wind up a net gain compared to processing the individual row changes
that would normally be done.
As a follow up to that point, nothing says that you have to allow replication
of all stored procedures, or that they all have to be serialized or not. It's
entirely worthwhile to just create one stored procedure called something like 'usp_DBA_AdminProc'
and add it to the publication when you create it and set it to replicate stored
procedure execution. It's just for you, will never get called by an application.
But the next time there are a large number of rows to be modified you can alter
the proc to have the needed definition and then run it on the publisher,
probably at night or other lull. You get all the benefits of replicating
procedure execution and almost none of the possible downside because it's being
executed off peak. You can modify the proc again and again, changing it however
is needed, knowing that the right definition will be on each subscriber at the
right time.
The scenario where replicating execution makes the most sense would be where
you have a slow network connection, followed by when you are frequently updating
or deleting large numbers of rows, and the last would be when you just need to
reduce the latency between execution on the publisher and having the changes
fully applied on the subscriber. There can also be some reduction of load on the
distributor if the procs are updating large numbers of rows, but probably much
less savings if you are just replicating OLTP type procs that only affect a
small number of rows with each call.