April 17, 2014 at 3:33 pm
Hello team,
I setup transaction replication between 2 servers and facing locks issue where when replication is running no one is able to access any tables on Subscriber
Now I changed the schedule to run only once a day at night time but in couple of months I need to run the replication after every 15 minutes
Please let me know the best way to resolve this issue( at this time I have 94 articles) speed/performance wise all looks very good and 100 transaction will take only lest than 2 minutes to copy over things but if the sync is running(or enabled) then I'm facing this issue
Memory/CPU plus over all hardware is very powerful
Please help
Thanks
Dave
April 18, 2014 at 5:35 am
What is the specific kind of locks that you are encountering?
How are your articles designed? Are they whole tables or queries? If queries, can you paste the offending ones here with an execution plan so we can look at them?
April 21, 2014 at 9:35 am
Hello, thanks for the reply. please see below
Brandie Tarvin (4/18/2014)
What is the specific kind of locks that you are encountering?(294ms)PAGEIOLATCH_SH:testdb:1(*)
If I do a simple record count on a small table, it is taking long time
How are your articles designed? Are they whole tables or queries? If queries, can you paste the offending ones here with an execution plan so we can look at them?
Only the tables are present
April 21, 2014 at 10:46 am
I do not think your issue is Replication. I think it is either I/O or bad indexes (or both).
A few things to look at on your side:
How wide is your table (how many columns and what is the predominant datatype)?
How long is your table?
How many indexes does your table have? You might have too few. You might have too many. If the total # of indexes is half or more of the total # of columns, you have too many. If you have 0 indexes, you definitely have too few.
Is your paging file the proper size? Does your hard drive(s) / SAN(s) have the proper amount of free space for reading and writing on both sides of the Replication connection?
Network problems are a possibility, but I would concentrate on indexes and I/O issues first.
EDIT: Get your server admin involved for the I/O issues questions.
April 21, 2014 at 10:55 am
Hey, run this script on the Subscriber and see if you're getting lots of small waits or some (or lots) of long waits. Post the results, please.
select * from sys.dm_os_wait_stats
where wait_type = 'PAGEIOLATCH_SH';
April 21, 2014 at 9:11 pm
Take a look
wait_type waiting_tasks_countwait_time_msmax_wait_time_mssignal_wait_time_ms
PAGEIOLATCH_SH 9333614 512244039 103128 1638672
April 21, 2014 at 9:19 pm
Thanks Brandie,
I was thinking the same too but the problem is that after adding the Replication this issue is started
If I stop the replication everything completes in few seconds or minutes(depending on the job)
Some tables are huge also but still all was good before adding replication
I think there is somewhere problem with San too but before telling my Manager I need to make sure there is nothing on my side causing this issue
I can check the missing Indexes but again only counting the # of records .. system is taking so long
April 22, 2014 at 4:22 am
It doesn't matter that everything was good until Replication was added. When you added Replication onto the server, you added an extra load plus you added a whole bunch of storage needed for the tempdb and the regular db. Your I/O increased as well. Therefore, you have a bottleneck issue of some flavor.
I would run a Profile trace at the same time as you run Performance Monitor (the Windows version) while you have Transactional Replication turned on. Check all the related I/O counters in both and have the Profiler Trace include the (under TSQL) SQL:BatchCompleted, SQL:BatchStarting, (under Stored Procedures) RPC:Completed, (under Sessions) ExistingConnection, (under Security Audit) Audit Login, Audit Logout, and (under Transactions) TM: Begin Tran Completed, TM: Begin Tran Starting.
And if you still don't think that's the issue, I suggest you hire a local SQL Server consultant to come into your office for a couple of days to resolve your issues. There's no substitute for having someone on site who can look at your systems setup and then troubleshoot.
April 22, 2014 at 4:36 am
qew420 (4/21/2014)
Some tables are huge also but still all was good before adding replication...
I can check the missing Indexes but again only counting the # of records .. system is taking so long
If some of the tables are huge, then they probably aren't taking advantage of the indexes that exist. Replication does not necessarily use the same indexes as other queries do. And smaller tables could be blocked by the larger tables not moving well.
I think there is somewhere problem with San too but before telling my Manager I need to make sure there is nothing on my side causing this issue
I/O isn't just about hard drives and disk controllers. It is about reading and writing from the database. Hence the reason I suggest running PerfMon and Profiler together. Profiler now has the ability to hook the two traces together and tell you exactly where the bottleneck is. And if the indexes aren't being used, or there aren't indexes for the tables, then those alone would cause a bottleneck before it even gets to the SAN part. But the traces should tell you that.
There are a lot of moving parts in Replication. Where do you have your Distributor? Is it on the Publisher, the Subscriber, or is it a standalone server / instance?
April 24, 2014 at 7:08 am
Hi Brandie,
Thanks for your assistance
The main issue is fixed now as there were missing Indexes, now things are running much better
But just for my knowledge what will be the solution of I/O wait, Latches if SQL DB is having no issues
So let's say after running PerMon or DMV's I have a report now what will be the next step, there is any SQL side things I need to check (DB is optimized already)
OR what are the pointers I need to report to SAN admin?
Let me know if there is a book or step by step guide with a solution of dealing with the I/O Wait etc
Thanks
April 28, 2014 at 4:15 am
Troubleshooting I/O issues depends on the DBA. Everyone does the same set of checks, but different people do it in different orders. Me, I prefer to verify SQL Server first. That way when I say "it's a drive issues" or "it's a network issue", the other admin can't push back on me saying "no, it's SQL Server." If they know I checked my end before going to them, they're a lot more likely to assist me when I do go to them about potential problems.
So, my order of things is: First check SQL Server to see if any issues are there. This includes verifying there's space available on the SAN and the local hard drives. A full C: drive can cause just as many problems as a full SAN where the database files are.
If everything looks good, I check the server paging file (for kicks and giggles) to make sure it's large enough. Then I run PerfMon with a Profiler Trace, using I/O related counters. If I still can't find anything, then I send a ticket up the help desk chain asking for someone to check disk controllers and the physical connections between server and drives. I've had issues before where someone doing maintenance in the server room bumped a cord and it partially pulled out of the socket.
If all of these are correct, I get someone to run network sniffers as well as doing drive diagnostics. Usually I ask for a limited amount of sniffing during the time of worst problems. Sniffers take up a lot of bandwidth, so you do not want them running 24/7.
Remember also, the DBCC commands are your troubleshooting tools of first or second resort.
As an FYI: I have rarely had to go so far as to getting network admin involved for an I/O issue. Network issues usually present as some other problem and are fairly self-evident. I have seen disk controller failures, full local hard drive issues, etc. But usually, the issue lies in bad queries, missing indexes, or a poorly designed database / ETL process.
So always, always check SQL Server first. And never say "it's not SQL" until you can be at least 98% certain you've checked every possibility and googled every error / problem first. There are a lot of moving parts in a database and it only takes 1 tiny little thing to totally hose I/O. (been there, done that, have the souvenir t-shirt).
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply