This is the final installment in the 12 day series for SQL tidbits during this holiday season.
Previous articles in this mini-series on quick tidbits:
- SQL Sat LV announcement
- Burning Time
- Reviewing Peers
- Broken Broker
- Peer Identity
- Lost in Space
- Command ‘n Conquer
- Ring in The New
- Queries Going Boom
- Retention of XE Session Data in a Table
- Purging syspolicy
Working with replication quite a bit with some clients you might run across some particularly puzzling problems. This story should shed some light on one particularly puzzling issue I have seen on more than one occasion.
In working with a multi-site replication and multi-package replication topology, the cpu was constantly running above 90% utilization and there seemed to be a general slowness even in Windows operations.
Digging into the server took some time to find what might have been causing the slowness and high CPU. Doing an overall server health check helped point in a general direction.
Some clues from the general health check were as follows.
- distribution database over 20GB. This may not have been a necessarily bad thing but the databases between all the publications weren’t that big.
- distribution cleanup job taking more than 5 minutes to complete. Had the job been cleaning up records, this might not have been an indicator. In this case, 0 records were cleaned up on each run.
The root cause seemed to be pointing to a replication mis-configuration. The mis-configuration could have been anywhere from the distribution agent to an individual publication. Generally, it seems that the real problem is more on a configuration of an individual publication more than any other setting.
When these conditions are met, it would be a good idea to check the publication properties for each publication. Dive into the distribution database and try to find if any single publication is the root cause and potentially is retaining more replication commands than any other publication. You can use sp_helppublication to check the publication settings for each publication. You can check MSrepl_commands in the distribution database to find a correlation of commands retained to publication.
Once having checked all of this information, it’s time to put a fix in place. It is also time to do a little research before actually applying this fix. Why? Well, because you will want to make sure this is an appropriate change for your environment. For instance, you may not want to try this for a peer-to-peer topology. In part because one of the settings can’t be changed in a peer-to-peer topology. I leave that challenge to you to discover in a testing environment.
The settings that can help are as follows.
EXEC SP_CHANGEPUBLICATION
@publication = 'somepub', -- put your publication name here
@property = 'allow_anonymous',
@VALUE = 'false'
GO
EXEC SP_CHANGEPUBLICATION
@publication = 'somepub', -- put your publication name here
@property = 'immediate_sync',
@VALUE = 'false'
GO
These settings can have a profound effect on the distribution retention, the cleanup process and your overall CPU consumption. Please test and research before implementing these changes.
Besides the potential benefits just described, there are other benefits to changing these commands. For instance, changing replication articles can become less burdensome by disabling these settings. The disabling of these settings can help reduce the snapshot load and allow a single article to be snapped to the subscribers instead of the entire publication.