February 10, 2017 at 6:58 am
We have a periodic problem on our sql server that I'm totally stumped on. This is on SQL Server 2012, clustered server, attached to a SAN. All latest updates and patches are installed. Every so often (maybe once a month totally random) for multiple hours the Disk IO will start spiking on and off. This has no correlation to load on the server. The IO will spike for roughly 90 seconds and then go back down for about the same period of time and repeat this over and over until it just stops. Failing over the cluster and restarts have no affect on the problem. I've narrowed it down to one database on one drive. This particular db has a lot of blob objects and is our largest db (2 TB) . When montioring perf mon the Avg. Disk sec/Transfer for this drive spikes up from the normal .010 to around 0.8. It will stay up around 0.8 for 90 seconds and then drop back to normal. We have checked all the SAN hardware even switching to totally new hardware to try to isolate but it doesn't fix it. Eventually the problem just goes away as mysteriously as it appears. I'm totally stumped any random thoughts appreciated, thanks.
February 10, 2017 at 7:15 am
Anything in the default trace or sys.dm_exec_requests when this is happening?
😎
February 10, 2017 at 7:27 am
No it is completely unrelated to any sql transactions, load is unrelated. It spikes for about 90 seconds then goes back down for about the same time rinse and repeat. I've already checked for any scheduled processes, db and application.
February 10, 2017 at 7:31 am
huntersimon - Friday, February 10, 2017 7:27 AMNo it is completely unrelated to any sql transactions, load is unrelated. It spikes for about 90 seconds then goes back down for about the same time rinse and repeat. I've already checked for any scheduled processes, db and application.
What about the default trace?
😎
February 10, 2017 at 7:35 am
We traced the db wide open for over an hour while the problem persisted and found nothing unusual.
February 10, 2017 at 7:35 am
Windows updates, anti virus scans, monthly reports? Just thinking outside of the box...
What type of SAN do you have? Anything going on like adaptive optimization / tiering?
February 10, 2017 at 7:48 am
adb2303 - Friday, February 10, 2017 7:35 AMWindows updates, anti virus scans, monthly reports? Just thinking outside of the box...What type of SAN do you have? Anything going on like adaptive optimization / tiering?
Its an HP p2000 g3 controller. We have restarted all the controllers and everything and eventually we installed all new SAN hardware and moved one db to its own drive. This has isolated the problem to that drive so we feel we have eliminated all the SAN possibilities. I'll look into adaptive optimization / tiering that is not something I've researched.
February 10, 2017 at 4:42 pm
Use sp_whoisactive in differential mode to see the cumulative amount of work done by spids that run for the duration. I might try 30 or 60 seconds. Also use it in interactive mode in another spid to see real-time work. Be sure to include system spids.
I would also do a profiler trace to see if I can capture things doing a lot of IO. The problem is that it could be bunch of little things that a large-IO filter wouldn't catch. You can find this by doing an aggregate profiler trace analysis. Perhaps something small is getting called iteratively for some reason.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 12, 2017 at 11:22 am
Is this a physical server or a VM? If a VM is there anything on the VM host layer going on during that timeframe (VM snapshots, vMotions, storage migrations, etc.)?
Joie Andrew
"Since 1982"
February 17, 2017 at 6:09 am
Its physical, problem has vanished again, still didn't figure out the root cause. Thanks for the ideas.
February 17, 2017 at 3:05 pm
huntersimon - Friday, February 17, 2017 6:09 AMIts physical, problem has vanished again, still didn't figure out the root cause. Thanks for the ideas.
Does EVERY piece of hardware in the server and along the ENTIRE IO stack have updated drivers and firmware? I have seen some VERY bad things due to both of those issues.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 17, 2017 at 10:41 pm
Haven't checked to see what the others have said but do you , by any chance, have automatic updates or automatic downloads turned on in Windows or your anti-virus software? If so, you need to turn that off.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 21, 2017 at 1:45 am
Could it be caused by DB file auto-grow?
How much free space do you have in the files?
Do you shrink DB files overnight?
_____________
Code for TallyGenerator
February 21, 2017 at 9:38 am
Sergiy - Tuesday, February 21, 2017 1:45 AMCould it be caused by DB file auto-grow?How much free space do you have in the files?Do you shrink DB files overnight?
DOH!! I HATE it when I miss the obvious possibilities (file growth)!! :hehe:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply