March 3, 2010 at 12:56 am
Hello,
I've setup transactional replication between two SQL servers.
The publication database is on a standalone server and the subscription runs in a live environment where the database has to be accessible during replication. Both are connected accross an internet connection. The subscription has some stored procedures that are used by a website.
During replication I notice that the subscription database is sometimes not reachable, timeouts appear and it looks like it's locked by the replication. This causes a lot of errors on the stored procedures that run in the database, needed for the website. Total replication time is about 2 hours because there's a relatively slow connection between the two servers (different countries).
My opinion was that the subscription should be completely accessible during replication, but it seems it's not. As I look at the subscription server I see no high load or high memoy usage that can cause the problems.
I can't hardly find any information on this subject on the internet.
Can anyone please help me how to solve this?
best regards,
Jens
March 3, 2010 at 6:21 am
One thing you can do is to set up SnapShot Isolation. (Row Level version) This means that the select queries (Stored procs) from your website will be able to read the last committed data. It will not have to wait for the Replication to release lock before your Stored procs can read the data. It is a simple set up.
I did write an article regarding this. Check here.[/url]
I did not understand one thing though. You stated it takes 2 Hrs for the replication to complete. Does the updates that happen in publisher comes in Batches of Hundreds of thousands of records?
-Roy
March 3, 2010 at 6:44 am
Hello Roy,
Yes indeed, the publisher will have around 100000 to 400000 updates and new records with each replication. In total the table has around 4 million records.
I thought this would be no problem because replication was made for this, but it looks if the table is locked during replication.
Thanks for your article, I will read it and hope it gives me some more information.
March 3, 2010 at 7:02 am
Replication is designed to do this pretty fast. But since you have latency, it takes more time to get this done. Any data manipulation will need a lock on the target (Subscriber).
The article will give you one way of mitigating the timeout issue. There might be other ways as well.
-Roy
March 3, 2010 at 7:49 am
I'm confused what you mean by the subscriber is not accessible? Replication runs continuously, so is the subscriber never accessible? Or do you have scheduled subscription pull/push?
I haven't seen SQL Server not work during these times. When it's not accessible, do you mean you cannot connect? Or some stored procedures have long runtimes? Can you narrow down more what's happening?
When transaction replication runs, it doesn't necessarily stop access to your data. It runs log records back against your database, just like another user that's connected to the server.
March 3, 2010 at 10:11 am
In our situation replication doesn't run continuously. It's scheduled for the night and it runs a few times a week. To my opinion that's better because also the data in the publisher is updated only a few times every week.
And yes you're right, SQL server is working and some data can be retrieved from the tables. But in most cases it takes a very long time for (some relatively simple) stored procedures to run. This will end up with a timeout error on the website.
I hope the problem is a bit more clear now.
March 3, 2010 at 10:48 am
So here's what I suggest. When things are slow, check for locking/blocking. I would be concerned that someone has not setup transactional replciation and instead setup some snapshot replication, or that they are dropping and adding back data.
Are there lots of changes on the publisher?
March 3, 2010 at 11:00 am
Hello Steve,
Could it be a good idea to use Snapshot Isolation (Row Level version) as Roy proposes?
Yes there are around 100000 to 400000 changes on the publisher each time before it's replicated. Replication is indeed setup as transactional replication, so no snapshot replication.
March 3, 2010 at 11:25 am
I would think snapshot isolation would help, but I'm not a replication expert. With that many changes coming through, I suspect you are going to have issues. It's like packing that many transactions from clients into a small window.
Have you considered using more regular replication?
March 3, 2010 at 11:32 am
Continuous replication would have been a better option. But still if you are going to have bulk updates you will still have the issues. Another thing you might want to look at is to see what kind of WAIT Types are happening while the Bulk Updates are being replicated.
This might be showing an underlying problem in your Server set up.
-Roy
March 3, 2010 at 9:04 pm
I would be checking for blocking. Replication is attempting to deliver changes to about 10% of the records - to me that sounds like you probably end up with replication having exclusive locks on most of the table(s) in question.
Since you have a slow connection, have you looked using a different profile for the distribution agent. One of the changes that you can make is to decrease the bcpbatchsize. From memory it is every 100,000 records. On a slow network could take quite a while. I would decrease this to something like 10,000.
March 4, 2010 at 7:26 am
I've tested today with replication of only 10.000 changed records. Even this small amount of records will give timeout errors on the Stored Procedures that are used by the website.
@roy, how can I see what kind of WAIT Types are happening while the bulk updates are being replicated? Somewhere in the logfiles?
March 4, 2010 at 7:49 am
There is a very good blog by Jimmy may with some SQL Queries that gives you Wait Type, Finds out if there are IO issues, Blocking etc. Check this blog.
In that page there is a link called "DMV All-Stars Dream Team"
You can run the SQL Query based on your SQL Server Version. What I would look for if I were you would be REAL TIME BLOCKING and the Virtual File latency. The latter one will show if there are any problems with your IO subsystem.
-Roy
January 10, 2012 at 11:16 pm
I don't think snapshot is the correct replication to use in this case as it locks the tables and other articles for publishing during replication process. Transcational and Merge do not held locks on the articles and your website/application can still use last committed snapshot data. This is my understanding. Can someone correct me if this is not the case?
January 10, 2012 at 11:24 pm
I don't think snapshot is the correct replication to use in this case as it locks the tables and other articles for publishing during replication process. Transactional and Merge do not hold locks on the articles and your website/application can still use last committed snapshot data. This is my understanding. Can someone correct me if this is not the case?
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply