Transactional Replication Performance Issues After Migration From 2000 to 2008

Question

Post reply

Transactional Replication Performance Issues After Migration From 2000 to 2008

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 1

We have also just expanded tempDB on the subscriber server to have 8 data files to match the number of processors.

MysteryJimbo SSC-Insane Points: 24203 More actions · Answer 2

I'm assuming network speed/bandwidth has been eliminated since it was a SQL upgrade.

So this may be down to differences in functionality between the 2000 distribution agent and 2008. Maybe execution of the repl procs on the subscriber is slow due to blocking or poor plan.

Did you run sp_updatestats post subscriber upgrade?

Are you able to trace the sp_MSins/upd/del procs to see their execution and test speed?

You could also look in the procedure cache, its possible the 2000 agent has a different plan to 2008.

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 3

Network hasn't changed so that was ruled out.

Stats were updated for the MSreplication_subscriptions table.

We will do some comparing of the repl procs today to see if that shows anything.

We also have the DBs in 2000 compatibility mode still and have changed a few to 2008 but that doesn't appear to have resolved

MysteryJimbo SSC-Insane Points: 24203 More actions · Answer 4

chris.roddis-ferrari (11/15/2013)
Stats were updated for the MSreplication_subscriptions table.

I was wondering if you'd updated all the stats on the 2008 subscriber since upgrading the server?

I would consider running sp_updatestats on the db then look considering updating the stats with fullscan on the biggest tables at the subscriber.

Its possible that the different agent versions are using different execution plans for the replication stored procedures. Since the latency appears to be at the running of commands at the subscriber, this is where I would be looking.

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 5

We had updated the stats on specific tables, but have now scheduled in an update with full scan for tomorrow morning

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 6

8hrs and the update stats is still going! It isn't perfect but performance does seem to be better. Should have a good idea tomorrow morning.

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 7

Update stats did not resolve the issue unfortunately.

We have also just expanded the TempDB from 6GB to 80GB based on a recommendation from Microsoft.

arnipetursson SSCertifiable Points: 6557 More actions · Answer 8

Do you see Async network IO waits on all of the publishers or only the upgraded ones?

Have they checked the network interfaces on the new servers (full duplex?)?

What do round trip network times look like using tracert? Are they the same?

Do you get the same bandwidth speed if you copy files from the new servers to the central server, as you get if you do the same from one of the old servers?

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 9

I think the async network io is a red herring. Network cards are full duplex and data transfer rates comparable.

Over the weekend we set up a second subscription to an identical db on the same box for 20 of the 60 sites. There has been no latency on this subscription at all. We are adding more sites over night to see how this works with more load.

MysteryJimbo SSC-Insane Points: 24203 More actions · Answer 10

chris.roddis-ferrari (11/18/2013)
I think the async network io is a red herring. Network cards are full duplex and data transfer rates comparable.

I agree with this, the wait stat is related to command completion on the subscriber.

chris.roddis-ferrari (11/18/2013)
Over the weekend we set up a second subscription to an identical db on the same box for 20 of the 60 sites. There has been no latency on this subscription at all. We are adding more sites over night to see how this works with more load.

Have you used SQL trace on the original subscriber for slow replication procedures yet?

chris.roddis-ferrari Mr or Mrs. 500 Points: 566 More actions · Answer 11

Thanks for everyones help. Finally got to the bottom of this. It appears to be a clash between SQL2000 and 2008

SQL 2008

==========

update MSreplication_subscriptions set transaction_timestamp = cast(@P1 as binary(15)) + cast(substring(transaction_timestamp, 16, 1) as binary(1)), "time" = @P2 where UPPER(publisher) = UPPER(@P3) and publisher_db = @P4 and publication = @P5 and subscription_type = 0 and (substring(transaction_timestamp, 16, 1) = 0 or datalength(transaction_timestamp) < 16)

Relevant Index Seek as below:-

|--Clustered Index Seek(OBJECT:([Three].[dbo].[MSreplication_subscriptions].[uc1MSReplication_subscriptions]), SEEK:([Three].[dbo].[MSreplication_subscriptions].[publication]=[@P5] AND [Three].[dbo].[MSreplication_subscriptions].[publisher_db]=[@P4]), WHERE:([Three].[dbo].[MSreplication_subscriptions].[subscription_type]=(0) AND upper([Three].[dbo].[MSreplication_subscriptions].[publisher])=upper([@P3])) ORDERED FORWARD)

SQL 2000

========

update MSreplication_subscriptions set transaction_timestamp = @P1, "time" = @P2 where publisher = @P3 and publisher_db = @P4 and publication = @P5 and subscription_type = 0

Relevant Index Seek as below:-

|--Clustered Index Seek(OBJECT:([Three].[dbo].[MSreplication_subscriptions].[uc1MSReplication_subscriptions]), SEEK:([Three].[dbo].[MSreplication_subscriptions].[publication]=[@P5] AND [Three].[dbo].[MSreplication_subscriptions].[publisher_db]=[@P4] AND [Three].[dbo].[MSreplication_subscriptions].[publisher]=[@P3] AND [Three].[dbo].[MSreplication_subscriptions].[subscription_type]=(0)) ORDERED FORWARD)

In SQL 2008 the UPDATE is doing a “Clustered Index Seek” operation on “uc1MSReplication_subscriptions” , the seek operation is just done on “publication” and “publisher_db” columns and later we are filtering the rows on other distinct columns like “publisher”. This means that first the query will fetch all the rows from “Msreplication_Subscriptions” table as the Publication and Publisher DB name is same across all the servers. Thus there is a high potential that with this approach one distribution agent will block the other agent while updating “Msreplication_Subscriptions” table.

Contrary to this , in SQL 2000 , the “Clustered Index Seek” operation on “uc1MSReplication_subscriptions” has no WHERE clause (only ANDs) and this should ideally fetch one single row for that particular distribution agent , which in turn means that we have very low (or no) probability of this agent blocking other agents while updating the watermarks on “Msreplication_Subscriptions” table.

To resolve we recreated the SQL 2008 Publishers but this time with a unique publication name which stopped the blocking