August 4, 2010 at 11:41 am
Environment :
SQL Server 2005 32 Bit on Windows 2003 R2 server
64 GB - AWE Enabled
Question :
I have a few replications set up and all is good. Until....
I created a new Publication P1 and then tried to create two Subscribers S1 and S2.
I was able to create only one subscriber. doesnt matter which one I can create first, it allows me to create either S1 or S2.
As I create the second subscription and look at the sync status all it says is "Starting Agent" for a long time and then stops.
Am I hitting a limit here ?
Note: My Publisher, distributor and Subscribers are all different servers
I tried the heap changes on the publisher from the article "http://support.microsoft.com/kb/184802/". Does not work.
Help much appreciated
Thanks
Shinoj
August 5, 2010 at 2:40 pm
Any thought experts ?
August 5, 2010 at 2:50 pm
try adding some logging on your agent job and output it to a text file, see how far the agent is getting and whether its running into any problems. it sounds like it could be a locking issue but that pretty hard to say with limited information.
see this article on setting up logging if you are unsure how to do it http://support.microsoft.com/kb/312292
August 9, 2010 at 10:19 am
Thanks Guys... I guess I know whats happening now.
I think I'm hitting the limit on maximum number for continuous subscriptions on a distribution server.
I am planning to setup scheduled subscriptions
Shinoj
August 9, 2010 at 10:36 am
interesting. what limit are you hitting? number of publications?
August 9, 2010 at 2:25 pm
I would like to know that as well... We have 325 Plus tables being replicated to two subscribers. These articles are published by using 10 publications.
-Roy
August 9, 2010 at 2:56 pm
I have only about 10 publishers and 20 subscriptions in all but I kept hitting the limit
as I start the snapshot agent after creating new publication and subscription.
I delete an old subscription from some other publisher and then try the new one - that works
I'm wondering is it because of my 32bit service?
Scheduled subscriptions seems to be working... Here is an article I found...same issue
http://ask.sqlteam.com/questions/1378/replication-limit-on-number-of-push-subsciptions
Shinoj
August 9, 2010 at 3:08 pm
Thanks for sharing that. It will be a good reference when our replication set up blows apart... 🙂
-Roy
August 10, 2010 at 2:02 am
interesting, thanks Shinoj
August 10, 2010 at 12:09 pm
When I have seen this it was down to exceeding the number of worker threads configured in the various SQL Agent subsystems
To check this
on the distributor
select subsystem,max_worker_threads from msdb.dbo.syssubsystems
where subsystem in ('Snapshot','LogReader','Distribution')
or
exec sp_enum_sqlagent_subsystems
check the number of max_worker_threads against the number of agents running for each subsystem. Generally you also see a message in the logs indicating the stalled process is in fact waiting for a worker thread.
There are two work rounds that I am aware of. Increase the number of worker threads per subsystem in the msdb.dbo.syssubsystems table or perhaps more safely set up a proxy and run additional distribution or snapshot agents under the proxy account.
The latter approach has worked for us. We have 2 proxy accounts. 128 Distribution agents running under the SQL Agent account and a further 35 split across the two proxies
This setting used to be held in the registry see http://support.microsoft.com/kb/306457 and while this is an old KB there is reference to SQL 2005 and manually adjusting the syssubsystems table
also check out the msdb proc sp_verify_subsystems as this gives details of how SQL Server sets up the worker threads on installation. You need to run this proc manually if you have adjusted a server from 24 CPUs down to 1 to get round the SQL 2005 RTM bug that causes installation failure where the number of CPUs in the server is not a power of 2.
When you finish the install and get SP2 on you can then switch the number of CPUs back to 24, however because the system was built with 1 CPU all the max worker threads are configured for 1 CPU
Deleting all rows in the table then running sp_verify_subsystems resolves this for you.
August 10, 2010 at 2:37 pm
My subsystem table has these value, seems high and I hardly have abt 10 publishers and about 20 subscribers. I wonder how I hit a limit
Snapshot - 800
LogReader - 200
Distribution - 800
Do you mean running sp_verify_subsystem will change these values? Will it need a distributor restart after i run sp_verify_subsystem?
Shinoj
August 10, 2010 at 2:55 pm
The numbers returned by that query are very suspect. Try running select * from MSdistribution_agents on the distributor. I'm curious but are you using shared distribution agents? The issue you are describing is something that is hit pretty easily and the only solid way that I know of around this is to change those agents to shared.
To change to shared distribution agent you have to drop the subscription, go to publication properties, subscription options, and change to Independent Distribution Agent.
You can quickly check which publications are set to independent agents by querying syspublications on the published database. Look at the independent_agent column. In most cases you can use a shared agent and not have problems. We only use independent agents when we have a REALLY active table and we do this so that other tables in replication are not negatively affected.
Hope this helps.
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
August 10, 2010 at 3:26 pm
1)
select * from MSdistribution_agents
17 and 12 form two distribution agents a toal of 29
2) checked all the publications and all are using independent agent (column value is 1)
3) also checked the sysproxies table (SELECT * FROM msdb.dbo.sysproxies)
Looks like there is a proxy for all the agents running on the server and the proxy subsystem shows me the proxy assigned to the subsystem. Also all the jobs run as their corresponding Proxy Account
Noticed one thing though - all the jobs Run as the proxy account but the owner of the jobs is a sql account distributor_admin (but do not see the entry in sysproxylogin table for the sid and proxyid)
August 10, 2010 at 3:51 pm
distributor_admin is a sysadmin. I guess that's why the use is not in sysproxylogin
August 10, 2010 at 4:16 pm
Shinoj Ramanathan (8/10/2010)
My subsystem table has these value, seems high and I hardly have abt 10 publishers and about 20 subscribers. I wonder how I hit a limitSnapshot - 800
LogReader - 200
Distribution - 800
Do you mean running sp_verify_subsystem will change these values? Will it need a distributor restart after i run sp_verify_subsystem?
Shinoj
deleting the rows from the table then running sp_verify_subsystems will reset the number of worker threads depending on the number of cpus in the server
if you do sp_helptext sp_verify_subsystems you can see exacly what this proc does but in essence there is a check for each subsystem with the following code extract being an example of what the proc does per subsystem
IF NOT EXISTS(SELECT * FROM syssubsystems WHERE subsystem = N'Distribution')
INSERT syssubsystems
VALUES
(
6, N'Distribution', 14553, @InstRootPath + N'SQLREPSS.DLL', @ComRootPath + N'DISTRIB.EXE',N'ReplStart',N'ReplEvent',N'ReplStop',100 * @processor_count
)
If the server you are working with has 8 CPUs then the worker threads you have posted is correct as can be confirmed by the multipliers in the sp_verify_subsystems proc for each of the sub systems.
The number of worker thread is not specifically related to number of publishers and subscribers rather and provided they are not shared agents each subscription to a publication will have its own agent so if you have 10 publications and each publication has 20 subscribers then you will have 200 distribution agents. This can be seen on the distributer in the SQL Agent Jobs and do a count of these with a job category of Repl-Distribution.
select COUNT(*) from msdb.dbo.sysjobs sj join msdb.dbo.syscategories sc on sc.category_id = sj.category_id
where sc.name = 'REPL-Distribution'
If you try this in my experience all you need is a SQL Agent restart so that it picks up the new values.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply