SQL Server replication is a mature and reliable out-of-the-box means of copying
data from one database to another. Up to a point it is also a high performance solution.
There are many factors that influence the performance of replication and perhaps
the most overlooked one is the configuration of the server acting as the distributor.
By default the server acting as a distributor will have a single database called
"distribution". As the number of publications and publishers increase we
are faced with the fact that this single distribution database is attempting to
process all transactions.
Fortunately we can set up additional distribution databases on our distributor and
assign a separate distribution database to each publisher. Short of installing a
brand new distributor this acts as quite an effective way of scaling out distribution.
Basic topology for replication
There are a number of articles on SSC that deal with replication topology but for
brevity SQL Server follows a magazine publishing paradigm.
- Publishers supply data to the distributor
- The distributor provides data to subscribers.
A publication contains articles which can be any combination of the following
- Tables
- Views
- Indexed views
- User defined functions
- Stored Procedures
You can have one or many publications from any database and the database at the
subscriber end can receive one or many subscriptions.
The diagram below illustrates this setup.
We can see that replication is quite flexible in what it allows us to do and, even
in this basic setup will work well for quite heavy loads.
Multiple distribution databases
The symptoms of a struggling distributor are not always apparent from the normal
Perfmon counters. The most obvious sign of a problem is in the replication monitor
where the rate at which outstanding transactions are dealt with seems to slow to
a crawl.
In a struggling distribution database the number of records in the MSRepl_transactions
table is likely to be very high and once it has gone beyond a certain size distribution
will slow to a crawl.
Fortunately we can have multiple distribution databases on the server acting as
the distributor as the diagram below shows.
A publishing instance of SQL Server may only be assigned to a single distributor
and a single distribution database. Looking at the diagram above we could not have
Publish_01 using Distribution_A and Publish_02 using Distribution_B unless Publish_01
and Publish_02 were on separate SQL Server instances on the same machine.
Setting up a new distribution database
This is easy using either Management Studio or by scripts.
The following code snippet adds a new distribution database called "QA_Distribution".
USE master GO exec sp_adddistributiondb @database = N'QA_Distribution', @data_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data', @log_folder = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Data', @data_file_size = 5, @log_file_size = 5, @min_distretention = 0, @max_distretention = 24, @history_retention = 24, @security_mode = 1 GO
There are a few points to note about the parameters for the sp_adddistributiondb
stored procedure
Parameter | Comment |
---|---|
@data_folder @Log_folder | If you have reached the point where you need separate distribution databases then ideally you should aim to store the distribution database and log files on fast disks |
@data_file_size @log_file_size | The figures are in MB. Ideally you want your file sizes to be big enough so you are unlikely to need an autogrow. |
@max_distretention | Books online tells us that if the subscriber does not receive transactions from the distributor and there are actually transactions that are older than this figure then the subscription will be marked as inactive. Although this defaults to 72 hours you need to make a judgment call as to whether your distributor can actually hold 72 hours worth of transactions! If your company has support cover on a 24/7 basis and you can fix replication issues within 4 hours then drop this figure down to around 8 hours. Why around 8 hours? Old DBA rule when estimating. Come up with a figure, double it then add a bit! |
@history_retention | This is how long you wish to retain the replication history. Again, if you run a 24/7 service then the default of 48 hours may be unnecessarily long. |
Once we have created our new distribution database using the sp_adddistributiondb
stored procedure you need to run the following code snippet.
USE [QA_Distribution] GO IF NOT EXISTS (SELECT * FROM sysobjects WHERE name = 'UIProperties' AND type = 'U ') CREATE TABLE UIProperties(id INT) IF EXISTS (SELECT * FROM ::fn_listextendedproperty('SnapshotFolder', 'user','dbo', 'table', 'UIProperties', null, null)) EXEC sp_updateextendedproperty N'SnapshotFolder', N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 'user', dbo, 'table', 'UIProperties' ELSE EXEC sp_addextendedproperty N'SnapshotFolder', N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', 'user', dbo, 'table', 'UIProperties' GO
This snippet creates a user table called UIProperties and adds an extended property
called "SnapshotFolder" to the table containing the filepath for the snapshot
BCP files.
To do the same thing using management studio do the following
- Connect to your distributor
- Right-Click on replication and choose "Distributor Properties" as shown below.
- On the Distributor Properties dialogue choose "New" as circled below in red.
- Complete the resulting dialogue as shown below.
It is worth emphasising that management studio does not allow you to set the data
and log file sizes in this dialogue so, as mentioned in the parameters for sp_adddistributiondb you
should set there sizes to ensure that they are not likely to require an autogrow.
It should also be noted that you can add data files to the primary filegroup for
the distribution database if required.
Configuring replication to use our new distribution database
If you are commissioning a new database server to use as a publisher then this is
quite straight forward. If you are changing existing publications/subscriptions
to use the new distribution database then there are few more steps to undertake
but they are straight forward, if a little inconvenient.
If you have any existing publications
This is the inconvenient bit. You have to script off the creation of ALL publications
and subscriptions and then drop said publications and subscriptions prior to going
any further.
This is necessary because changing the distribution database for a publisher affects
the entire publisher.
Configuring from a script
The code below is sufficient to assign a distribution database to a publisher.
exec sp_adddistpublisher @publisher = N'DEVELOPMENT', @distribution_db = N'QA_Distribution', @security_mode = 1, -- Use trusted security to allow the distributor to talk to the publisher. @working_directory = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\ReplData', @trusted = N'false', -- deprecated, for backwards compatibility only. @thirdparty_flag = 0, -- 0 = SQL Server, 1 = other (probably ORACLE) @publisher_type = N'MSSQLSERVER'
Once this has been done you can run you publication and subscription creation scripts
taking whatever remedial action is necessary to ensure that the data on your publisher
and subscriber are in sync.
Configuring from management studio.
Frankly this is a pain in the neck and is a good example where management studio
can be smartened up a bit. Do the following as before
- Connect to your distributor
- Right-Click on replication and choose "Distributor Properties"
- In the left hand panel choose "Publishers" to see a dialogue box similar
to the one shown below
- Uncheck the publisher you wish to reassign a distribution database for
- Click OK to close the dialogue box. You will receive a warning message asking you
if you want to disable this publisher and that all publications and subscriptions
will be deleted. This is why you needed to script your publications and subscriptions
before doing this! Although the warning will say that publications and subscriptions
will be deleted I have found that management studio leaves behind orphaned subscriptions
so I stress the need to drop publications and subscriptions manually.
- Repeat the first 3 steps only this time your original publisher will not be shown
so click the Add button shown in red.
- Connect to your publisher and you will receive the "Publishers" dialogue
box again only this time you can choose the distribution database from a drop down
menu as shown below.
Once you click OK you will have configured a publisher to use a specific replication
database.
Again, you will now have to run your publication/subscription creation scripts and
take whatever remedial action is necessary to ensure that the data is in sync between
your publisher and subscriber.
Closing thoughts
Using separate distribution databases can have a dramatic beneficial effect on replication
performance. I would go as far as to recommend that a distribution DB per publisher
be adopted as standard practise.
By having multiple distribution databases we may actually get a benefit that is
perceived as being greater than reality simply because transactions from publisher
'x' no longer get jammed up in a distribution database shared with publisher
'y'.
Sizing of the distribution databases is important. Whatever else happens you want
to avoid the database auto-growing so size the database correctly in the first place.
If you make the distribution database too big then it can always be shrunk later.
Don't overlook the hardware for the distribution databases. The nature of what
these databases are asked to do is to read and write simple transactions at very
high speeds and the appropriate hardware should be chosen accordingly.