November 14, 2013 at 9:03 am
We have also just expanded tempDB on the subscriber server to have 8 data files to match the number of processors.
November 14, 2013 at 9:22 am
I'm assuming network speed/bandwidth has been eliminated since it was a SQL upgrade.
So this may be down to differences in functionality between the 2000 distribution agent and 2008. Maybe execution of the repl procs on the subscriber is slow due to blocking or poor plan.
Did you run sp_updatestats post subscriber upgrade?
Are you able to trace the sp_MSins/upd/del procs to see their execution and test speed?
You could also look in the procedure cache, its possible the 2000 agent has a different plan to 2008.
November 15, 2013 at 1:40 am
Network hasn't changed so that was ruled out.
Stats were updated for the MSreplication_subscriptions table.
We will do some comparing of the repl procs today to see if that shows anything.
We also have the DBs in 2000 compatibility mode still and have changed a few to 2008 but that doesn't appear to have resolved
November 15, 2013 at 2:08 am
chris.roddis-ferrari (11/15/2013)
Stats were updated for the MSreplication_subscriptions table.
I was wondering if you'd updated all the stats on the 2008 subscriber since upgrading the server?
I would consider running sp_updatestats on the db then look considering updating the stats with fullscan on the biggest tables at the subscriber.
Its possible that the different agent versions are using different execution plans for the replication stored procedures. Since the latency appears to be at the running of commands at the subscriber, this is where I would be looking.
November 15, 2013 at 3:11 am
We had updated the stats on specific tables, but have now scheduled in an update with full scan for tomorrow morning
November 17, 2013 at 9:25 am
8hrs and the update stats is still going! It isn't perfect but performance does seem to be better. Should have a good idea tomorrow morning.
November 18, 2013 at 4:29 am
Update stats did not resolve the issue unfortunately.
We have also just expanded the TempDB from 6GB to 80GB based on a recommendation from Microsoft.
November 18, 2013 at 10:06 am
Do you see Async network IO waits on all of the publishers or only the upgraded ones?
Have they checked the network interfaces on the new servers (full duplex?)?
What do round trip network times look like using tracert? Are they the same?
Do you get the same bandwidth speed if you copy files from the new servers to the central server, as you get if you do the same from one of the old servers?
November 18, 2013 at 11:47 am
I think the async network io is a red herring. Network cards are full duplex and data transfer rates comparable.
Over the weekend we set up a second subscription to an identical db on the same box for 20 of the 60 sites. There has been no latency on this subscription at all. We are adding more sites over night to see how this works with more load.
November 19, 2013 at 2:17 am
chris.roddis-ferrari (11/18/2013)
I think the async network io is a red herring. Network cards are full duplex and data transfer rates comparable.
I agree with this, the wait stat is related to command completion on the subscriber.
chris.roddis-ferrari (11/18/2013)
Over the weekend we set up a second subscription to an identical db on the same box for 20 of the 60 sites. There has been no latency on this subscription at all. We are adding more sites over night to see how this works with more load.
Have you used SQL trace on the original subscriber for slow replication procedures yet?
November 22, 2013 at 9:32 am
Thanks for everyones help. Finally got to the bottom of this. It appears to be a clash between SQL2000 and 2008
SQL 2008
==========
update MSreplication_subscriptions set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1)), "time" = @P2 where UPPER(publisher) = UPPER(@P3) and publisher_db = @P4 and publication = @P5 and subscription_type = 0 and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)
Relevant Index Seek as below:-
|--Clustered Index Seek(OBJECT:([Three].[dbo].[MSreplication_subscriptions].[uc1MSReplication_subscriptions]), SEEK:([Three].[dbo].[MSreplication_subscriptions].[publication]=[@P5] AND [Three].[dbo].[MSreplication_subscriptions].[publisher_db]=[@P4]), WHERE:([Three].[dbo].[MSreplication_subscriptions].[subscription_type]=(0) AND upper([Three].[dbo].[MSreplication_subscriptions].[publisher])=upper([@P3])) ORDERED FORWARD)
SQL 2000
========
update MSreplication_subscriptions set transaction_timestamp = @P1, "time" = @P2 where publisher = @P3 and publisher_db = @P4 and publication = @P5 and subscription_type = 0
Relevant Index Seek as below:-
|--Clustered Index Seek(OBJECT:([Three].[dbo].[MSreplication_subscriptions].[uc1MSReplication_subscriptions]), SEEK:([Three].[dbo].[MSreplication_subscriptions].[publication]=[@P5] AND [Three].[dbo].[MSreplication_subscriptions].[publisher_db]=[@P4] AND [Three].[dbo].[MSreplication_subscriptions].[publisher]=[@P3] AND [Three].[dbo].[MSreplication_subscriptions].[subscription_type]=(0)) ORDERED FORWARD)
In SQL 2008 the UPDATE is doing a “Clustered Index Seek” operation on “uc1MSReplication_subscriptions” , the seek operation is just done on “publication” and “publisher_db” columns and later we are filtering the rows on other distinct columns like “publisher”. This means that first the query will fetch all the rows from “Msreplication_Subscriptions” table as the Publication and Publisher DB name is same across all the servers. Thus there is a high potential that with this approach one distribution agent will block the other agent while updating “Msreplication_Subscriptions” table.
Contrary to this , in SQL 2000 , the “Clustered Index Seek” operation on “uc1MSReplication_subscriptions” has no WHERE clause (only ANDs) and this should ideally fetch one single row for that particular distribution agent , which in turn means that we have very low (or no) probability of this agent blocking other agents while updating the watermarks on “Msreplication_Subscriptions” table.
To resolve we recreated the SQL 2008 Publishers but this time with a unique publication name which stopped the blocking
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply