December 27, 2013 at 4:04 am
hi.
I have a strange SQL 2012 mirror issue, on a rather large HP DL980 server (32 CPU's, 256 GB memory), on which I mirror about 40 DB's.
38 of them work just fine, but for some reason the mirror just cant keep up with 2 DB's.
The disks should be able to cope with the load, since the work on the server is well spread over several disks;
2 Fusion IO's (PROD DB and temp),
25 15K SAS disks raid-10 6TB array (ACC DB's)
25 10K SATA disks raid-5 18TB array (DWH and backups, budget choiche)
I know that he last is a slower setup, but it's just used for backup and DWH databases, and two of those DB's happen to be my problem DB's; 700 GB and 170 GB.
There is no excessive queuing on the disks; i can still copy a backup on the same disk from folder to folder and do >80MB/sec, hence; there is enough disk IO left to process the mirror logs faster then the current 100KB/sec'ish speeds.
CPU load over all CPU is low on the mirror, about 4% average, but i can see 2 CPU's running at 100% load.... hmm... 2 problem DB's, 2 CPU's at 100%.... suspicious..... π
I'm, nowhere near the limits of mirroring and the number of working threads is no issue either, i even tried to increase it to 1500, just to sure. (http://support.microsoft.com/kb/2001270)
So I have no clue why those 2 DB's are running so far behind in the log restores.
Does anyone have any ideas, or hints?
Theo (NL)
December 27, 2013 at 9:13 am
Theo Ekelmans (12/27/2013)
hi.I have a strange SQL 2012 mirror issue, on a rather large HP DL980 server (32 CPU's, 256 GB memory), on which I mirror about 40 DB's.
38 of them work just fine, but for some reason the mirror just cant keep up with 2 DB's.
The disks should be able to cope with the load, since the work on the server is well spread over several disks;
2 Fusion IO's (PROD DB and temp),
25 15K SAS disks raid-10 6TB array (ACC DB's)
25 10K SATA disks raid-5 18TB array (DWH and backups, budget choiche)
I know that he last is a slower setup, but it's just used for backup and DWH databases, and two of those DB's happen to be my problem DB's; 700 GB and 170 GB.
There is no excessive queuing on the disks; i can still copy a backup on the same disk from folder to folder and do >80MB/sec, hence; there is enough disk IO left to process the mirror logs faster then the current 100KB/sec'ish speeds.
CPU load over all CPU is low on the mirror, about 4% average, but i can see 2 CPU's running at 100% load.... hmm... 2 problem DB's, 2 CPU's at 100%.... suspicious..... π
I'm, nowhere near the limits of mirroring and the number of working threads is no issue either, i even tried to increase it to 1500, just to sure. (http://support.microsoft.com/kb/2001270)
So I have no clue why those 2 DB's are running so far behind in the log restores.
Does anyone have any ideas, or hints?
Theo (NL)
From what I understand you get ONE thread per database for the tlog read/send on the primary side and ONE thread per database for tlog hardening/replay on the secondary side. My experience and in talking with others is that you get a maximum of maybe 50MB/sec of activity with that one thread, and that is with FusionIO on both ends.
I assume you know it is those two databases that are slow? Given the discussion above I am not sure that you do. It isn't the SIZE of a database that determine's how much mirroring load will be needed - it is the volume of transactions and the amount of tlog generated for those transactions that matter. I can take a 100MB database and saturate mirroring with my laptop generating load.
I would do a profiler trace to see what activity is happening on each primary database so you can find the heavy hitters. I do note that ETL processes on a DWH can EASILY blow out tlog volumes, especially if you do an index defrag process after a load. I have had to change a number of clients over to tlog shipping from mirroring on their EDWs because of that.
I would also do file IO stall and wait stats analysis on the secondary, as well as run some DMVs to find out what activity is consuming those 2 CPUs. Shouldn't be hard to find the culprits.
Without detailed data you are simply hunting and pecking and guessing. Not a good way to troubleshoot ... π
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 30, 2013 at 9:23 am
Hi Kevin,
Good point, i was somewhat sparse with the amount of details... here goes:
As you quite rightly pointed out it's not the size of the DB that is important here, it is the size of the log that is important in this case. Well.... the are *huge*.... I have no control over what the BI guy does every night to the DWH, but it is not nice (log wise that is)
And Yes, i'm sure that these 2 DWH databases are the ones that have problems, since they are the only 2 DB's (out of 40+) that show "Unrestored log" in mirror monitor, as well as enourmouse log growth, and there are 2 CPU's running up the electricity bill....
But for good measure, here is how i came to my conclusion:
Profiler has been runnign for over an hour now, and showed nothing of any significant (> 50 ms) CPU usage, hence; it is a LOOOOONG running query, and i'm missing start and end of that SPID.
When i use Adam's sp_WhoIsActive with the switch for system SPID's on, i see
SpID: 67: DBMIRROR_DBM_MUTEX, Wait: 638.823 ms, CPU: 57M, read: 45M, write: 2M, phy read: 2.6M
DBCC inputbuffer(67) comes back empty... no luck there either.
Process Explorer: shows 2 thread id's under SQLSERVR.exe comsuming huge amounts of CPU clycles for ntoskrnl.exe!KeWaitForMultipleObjects, confirming the view in task manager of 2 CPU's maxing out. And i also think that these must be a mirror threads described in kb2001270, since *nothing* else is running on that server besides the maintenance jobs.
When i look at the distribution of this query:
SELECT a.scheduler_id ,current_tasks_count, runnable_tasks_count, work_queue_count, pending_disk_io_count, parent_node_id as Cpu_Numa, is_idle, b.session_id
FROM sys.dm_os_schedulers a
left JOIN sys.dm_os_tasks b on a.active_worker_address = b.worker_address
WHERE a.scheduler_id < 255
order by b.session_id
I see an average 13 threads on each of the 64 CPU, and an average of 6-9 scheduler_id's without an SPID assigned, so there are enough scheduler_id's to server any SPID that needs attention.
Memory: 256 GB, buffer cache hitratio is close to 100%, with an 23 hour PageLifeExpectancy.
DISK: Nn the drive that holds the database files is see an occasional spike in disk queues to 0.7, the database files do report IO-Stalls but that is to be expected for a DWH on a traditional raid array (even it is 25 disks wide)
But what is really buggin me is this: SQL mirror monitor claims 64 - 100 KB/sec for the "current restore rate", but windows TASKMANAGER disk monitor clamed 5 MB / sec Write IO for one of the DWH databases.
In summary: I think that applying this much mirrored log entries overwhelming SQL for some reason i cannot (yet) see.....
I do not see an obvious HW bottleneck except for the 2 maxed out CPU cores which (most likely) are servicing the LOG threads.
Could it be that i stumbled upon an inherent "maximum log/time vs underlying hardware" ratio that mirroring is able to cope with??
p.s.
I'm moving one of my problem DB's from mirror to logshipping, but why would that work better? Both mirror and logshipping are essentially in the business of duplicating log entries on one DB and restoring them on another?
December 30, 2013 at 2:24 pm
You might be running out of restore threads on the mirror and need to enable trace flag 1485. The following is an excerpt from a SAP on SQL whitepaper (page 87) which can be found here:
β’Database mirroring uses parallel threads to restore the transaction log records that were shipped to the mirror instance. By default it uses one restore thread for every 4 CPU threads. In our large SAP ERP encryption example, this still amounted to 16 threads encrypting on the principal, but only 2 threads were restoring on the mirror side.
If your system encounters an imbalance like the one we encountered, you should consider using trace flag 1485 on the mirror instance. This trace flag specifies one restore thread for every CPU thread. In our large SAP example, a 1:1 ratio between encrypting threads on the principal and restoring thread on the mirror could have been achieved with this trace flag.
December 30, 2013 at 2:46 pm
I haven't looked at the deep internals of transaction logging in some time and my brain is frazzled at the moment from a recent total loss of my main computer (thankfully able to restore with minimal loss) and a client that is tits up at the moment. But I believe that there is something other than a one-to-one relationship between the tlog "size" and the amount of data pages that can be affected by the given tlog chunk. So a few hundreds of K of tlog could be (possibly) generating 5MB/sec dirty database page writes.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 3, 2014 at 2:32 am
Thanks for the feedback guys π Much appreciated!
I have tried trace flag 1485 on the mirror, just to be sure, but as the Guru theorized; there is probably only one thread applying log entries. I have 32 cores to the work, and only 2 are running at 100%, so no thread starvation here; no threadpool waits either <scratches a suggestion of a *very small* list>. 't was worth a try π
I have moved one of the problem DWH's to a Fusion IO card that has only the tempdb on it (which does next to nothing on a dedicated mirror) to rule out any IO related issues.
And to my dismay; the size of the "unrestored log" of last night's ETL run was even bigger!
As a last step before i escalate to the Microsoft CST, i will go from the deprecated mirroring to HA mirroring.
Yes, yes.... I know there is a only a small chance that it will fix it since it is essentially still mirroring, but I do not see any other options that I can try.
I have even read every KB's above my build level to see if there is any fix that will address the mirror restore speed problem, but nothing is mentioned about log speed.
So HA it is, and after that.... ?
Logshipping does not seem a viable alternative either, since it does the same as mirrorring; except that it "mirrors" log entries in chunks of 15 minutes. My problem seems to lie in the process that applies the log entries to the database file.
And i found i cannot trust the mirror monitor software either, It claims for one DB: 222 GB unsent log + 221 GB unrestored log, while the actual log sizes where 167 GB and 35 GB....
Any more idea guys?
p.s. @TheSQLGuru;
i feel your pain, i had my laptop stolen recently.... and even though you make backups, it takes weeks to get your PC "comfy" again!
January 3, 2014 at 8:23 am
Thanks for the comfort, but I was actually able to restore from an image backup I keep using ShadowProtect (AWESOME software) with a refresh of my important files from a copy I keep with Syncovery (more awesome software).
Back to the mirroring issue - have you done a full IO stall and wait stats run for 3-5 minutes to see how things fall out during that? If you don't have scripts for that PM me and I will get you what I use.
You do seem pretty knowledgeable, but perhaps Microsoft or other "sniper" resources could be what you need to decipher the issue here. If you go with Microsoft remember that the first thing they are going to ask you will probably be "are you patched to the latest build". So you might want to try to do that prophylactically - who knows, you might stumble across something that fixes the problem. Test first obviously... π
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 4, 2014 at 3:35 am
Hi Kevin,
I did'nt add the waitstates because on the mirror there are only 2 (Script at end of message)
Top 10 Waits @ primary
WaitTypePercentage
CXPACKET40.70
DBMIRROR_WORKER_QUEUE8.58
LATCH_EX8.40
HADR_FILESTREAM_IOMGR_IOCOMPLETION8.07
DIRTY_PAGE_POLL8.07
SP_SERVER_DIAGNOSTICS_SLEEP8.07
SOS_SCHEDULER_YIELD1.92
BACKUPIO1.75
ASYNC_IO_COMPLETION1.40
ASYNC_NETWORK_IO1.30
All Waits @ mirror
WaitType Percentage
DBMIRROR_DBM_MUTEX 96.89%
DBMIRROR_SEND 2.86%
The mirror is a dedicated fallback, nothing else runs on it except our home grown sql management tooling.
I think i have IO stalls excluded as a root cause, because the restore rate is equally abysmal on the original harddisk, and on a Fusion IO that has allmost no IO besides this DWH DB.
So... <sighs deeply>, unless you or Kevin Kline, who has also graciously agreed to "give a look" come up with an other idea, i will throw myself at the mercy of a Microsoft CST engineer. We have 19 calls left on our gold contract....
But my experience with MS support is that there is a linear relationship between the complexity of the problem, and the number of useless hoops they make you jump through, like adding useless patches. (i have read *all* the available patch notes, and NO patch has any bearing on the mirror issue at hand)
I will try one more thing today, and that is the move from the deprecated mirroring to full blown HA / AG.
Theo π
The exclusion list i use in my script i use is based on some post of Paul Randal, "tell me where it hurts"
WITH Waits AS
(SELECT
wait_type,
wait_time_ms / 1000.0 AS WaitS,
(wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
signal_wait_time_ms / 1000.0 AS SignalS,
waiting_tasks_count AS WaitCount,
100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE',
'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES',
'SLEEP_BPOOL_FLUSH', 'SQLTRACE_LOCK')
)
SELECT
W1.wait_type AS WaitType,
CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
W1.WaitCount AS WaitCount,
CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage,
CAST ((W1.WaitS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgWait_S,
CAST ((W1.ResourceS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgRes_S,
CAST ((W1.SignalS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgSig_S
FROM Waits AS W1
INNER JOIN Waits AS W2 ON W2.RowNum <= W1.RowNum
GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage
HAVING SUM (W2.Percentage) - W1.Percentage < 98; -- percentage threshold
January 4, 2014 at 9:44 am
You are correct that Microsoft (and pretty much any technical support these days) is HIGHLY dependent on a number of variables, and for complex stuff it often takes quite a long, frustrating time to get to the resource you need to do any good.
I am out of shots for this one - best of luck with it! Please do a follow-up post when you get it resolved (or a root cause determined) so others can learn from your experience!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 4, 2014 at 3:25 pm
Cheers thus far Kevin,
I'm now running AlwaysOn, an i will keep you informed when i have found a solution,
Theo π
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply