April 4, 2011 at 3:21 pm
First off, I've not had this issue before with this server...at least not until this past Friday afternoon.
It's SQL 2008 EE, version 10.0.4, 64-bit running on Win' 2003 EE x64, 8GB RAM, 16 processors, (it's not clustered), lots of space on both database/log files. I have 6.25 GB of RAM allocated for SQL Server and the rest for the OS.
All Agent jobs (data dumps, ETL processes, FULL Backups, and so forth are unaffected and all appear to be running as they normally would).
One thing to note: when I open the Activity Monitor I can view the Overview, Recent Expenseive Queries, and Resources waits but CANNOT view anything in the Processes panel. When I attempt to open that it get a timed out error: "Timeout expired. The timeout period elapsed prior to completion..., yada, yada, tada"
Also, I tend to look at the Resource Waits panel regularly (as we tend to have high Network IO) and since Friday I've noticed that MEMORY is at the top of the list! Currently it reads:
Wait Category: Memory
Wait Time: 16366
Recent Wait Time (ms/sec): 1039335
Average Waiter Count: 16.4
Cumulative Wait Time (sec): 105655
Again this is unusual.
CPU usage tends to be between 7% and 50%
If more information is needed please let me know what else I can provide or if there's something off the top of your head that sticks out that i can try:.
Thanks
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 3:32 pm
Get someone to check the physical server and see if one of your Memory Chips bit the dust. While they're doing that...
Get yourself a perfmon open and check for page faults, see if you're going to the swapfile. My guess is you are and that's a large portion of your problem. Doublecheck the machine's reported RAM, and confirm that your server is allocated as you still expect it to be. Also (if you have access) pop open the Task Manager and machine services. See if anything stands out (IE: "When the heck did they install IIS?") or if something is going hard.
EDIT: Oh, yeah, might want to try to crank open sysprocesses and see what your wait_types are, and double check if tempdb has had a heart attack lately (heavy usage, massive sorting, that kind of thing).
Then, when all else fails and everything looks good and you can't find the culprit, go back to the default. Reboot. See if it keeps happening.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 4, 2011 at 3:56 pm
Which performance object is that in? Not really sure on how to check and see if the servers swapping files...
FYI - checked the RAM and yes it is what i expected it to be, nothing really new running in the task manager...looked at the wait types...and good god, there's too many to list! is there a specific "bad guy" I should be looking for?
We rebooted the machine earlier today already and that would have cleared up any tempdb issues - right?
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 4:02 pm
MyDoggieJessie (4/4/2011)
Which performance object is that in? Not really sure on how to check and see if the servers swapping files...
Swapfile: It's under memory, Hard Page Faults/sec or Page Faults/sec.
FYI - checked the RAM and yes it is what i expected it to be, nothing really new running in the task manager...looked at the wait types...and good god, there's too many to list! is there a specific "bad guy" I should be looking for?
We rebooted the machine earlier today already and that would have cleared up any tempdb issues - right?
AH! That'll make life MUCH easier... though my guess is that SQL Server is going to tell us it's waiting for memory. XD Can find out though. RE: Tempdb. Yes and no. Depends on what the heck else is going on. Also, can you confirm you don't have any DB's still trying to recover?
I borrowed this code from here[/url]. Can you tell us what you get from this?
WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK'
,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE'
,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT'
,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT'
,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP'))
SELECT W1.wait_type,
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 99 OPTION (RECOMPILE); -- percentage threshold
GO
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 4, 2011 at 4:32 pm
Cant find that option under memory (this is SQL 2003 EE x64 server) if that matters to the options...
All DB's are up and operational.
Not sure how to paste in an image here but will copy and paste the results from your query below so you can paste it into excel or something:
wait_typewait_time_spctrunning_pct
RESOURCE_SEMAPHORE49971.0226.1526.15
OLEDB44398.4723.2449.39
PREEMPTIVE_OS_PIPEOPS22267.1111.6561.04
BROKER_EVENTHANDLER19703.1310.3171.35
BROKER_RECEIVE_WAITFOR19025.269.9681.31
ASYNC_NETWORK_IO18076.729.4690.77
PAGEIOLATCH_SH5346.392.8093.57
IO_COMPLETION1558.720.8294.38
MSQL_DQ1066.410.5694.94
PREEMPTIVE_OS_WAITFORSINGLEOBJECT955.510.5095.44
CXPACKET937.030.4995.93
PREEMPTIVE_COM_QUERYINTERFACE922.650.4896.42
ASYNC_IO_COMPLETION899.760.4796.89
BACKUPBUFFER883.470.4697.35
LCK_M_IX854.250.4597.80
PAGEIOLATCH_EX568.110.3098.09
BACKUPIO541.900.2898.38
BACKUPTHREAD530.100.2898.65
SOS_SCHEDULER_YIELD475.640.2598.90
LCK_M_IS428.180.2299.13
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 5:47 pm
MyDoggieJessie (4/4/2011)
Not sure how to paste in an image here but will copy and paste the results from your query below so you can paste it into excel or something:
Even better this way. Btw, just code="plain" usually works for these.
wait_typewait_time_spctrunning_pct
RESOURCE_SEMAPHORE49971.0226.1526.15
OLEDB44398.4723.2449.39
PREEMPTIVE_OS_PIPEOPS22267.1111.6561.04
Memory waits, Linked server waits, and the new wait type without a lot of literature on it yet are your three top ones. Note, that's not 49 seconds of memory waits. That's already divided into seconds. That's 13 and a half (roughly) hours worth of waits that occured. For a server rebooted today.
Find your main linked server that works with this one. Find the queries that work with that linked server. My guess is there's a table or two on the linked server that hit its 'tipping point', and is now shoving the entire table back onto your server instead of running local to itself. Do that a few hundred times....
You're probably going to have to look into doing foreign proc calls to local #tmps and then using the data from there as a quick and dirty redesign for the foreign server calls.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 4, 2011 at 7:36 pm
... are your three top ones. Note, that's not 49 seconds of memory waits. That's already divided into seconds. That's 13 and a half (roughly) hours worth of waits that occured. For a server rebooted today.
I don't mean to ask an overly stupid question but how did you get 13.5 hrs from this?
...Find your main linked server that works with this one. Find the queries that work with that linked server. My guess is there's a table or two on the linked server that hit its 'tipping point', and is ...
Nearly every query running on this server hits either a local replicated table or uses the linked server to query some tables containing (+20 million) records...and that can either be via views, temp tables, with some poorly written tsql. That being said, this has all been working great up until this past Friday.
.. You're probably going to have to look into doing foreign proc calls to local #tmps and then using the data from there as a quick and dirty redesign for the foreign server calls.
This has been something I have entertained but there are hundreds of these reports...any other ideas???
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 7:44 pm
We recently stopped the reports that were running using the linked servers and then started some of them again in a manual process...
I reran the query you provided me with:
wait_typewait_time_spctrunning_pct
PREEMPTIVE_OS_PIPEOPS180980.4739.3139.31
RESOURCE_SEMAPHORE110668.6424.0463.35
OLEDB53578.7611.6474.99
BROKER_EVENTHANDLER31106.336.7681.75
ASYNC_NETWORK_IO27699.906.0287.76
BROKER_RECEIVE_WAITFOR27556.465.9993.75
PAGEIOLATCH_SH9932.822.1695.91
MSQL_DQ2078.420.4596.36
ASYNC_IO_COMPLETION2061.090.4596.81
BACKUPBUFFER2026.660.4497.25
PREEMPTIVE_COM_QUERYINTERFACE1934.660.4297.67
IO_COMPLETION1564.050.3498.01
LCK_M_IX1155.240.2598.26
BACKUPIO1079.190.2398.49
PREEMPTIVE_OS_WAITFORSINGLEOBJECT1068.900.2398.72
BACKUPTHREAD1062.430.2398.95
CXPACKET1000.400.2299.17
I have an "okay" understanding of waits but I guess not enough to fully understand what all these mean in the bigger picture. Any insight you could provide is greatly appreciated!
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 8:08 pm
MyDoggieJessie (4/4/2011)
I don't mean to ask an overly stupid question but how did you get 13.5 hrs from this?
49971.02 / 60 (seconds) / 60 (minutes).
That's the wait time across every process, it's not one item waiting for 13 hours. 100 threads each waiting a second gives you a 100 second wait time for the server.
Nearly every query running on this server hits either a local replicated table or uses the linked server to query some tables containing (+20 million) records...and that can either be via views, temp tables, with some poorly written tsql. That being said, this has all been working great up until this past Friday.
Yeah, which means either some statistics got stale on a foreign server or one of the tables 'tipped' and is no longer seeking, but scanning, and the optimizer is just dragging everything over. Something changed, and since nothing rolled, it's either data volume related or fragmentation related. I'd go with volume as the first culprit.
This has been something I have entertained but there are hundreds of these reports...any other ideas???
You can attempt to track down the offending culprits by doing this. Throw it into a While Loop with a waitfor on it and let it run for 15 to 20 minutes. Insert the results of this into your friendly temp table, and see what you can figured out from the 'text' column.
SELECT
GETDATE() AS RunAtTime,
r.session_ID,
r.status,
r.start_time,
r.wait_time,
r.command,
s.text
from
sys.dm_exec_requests AS r
CROSS APPLY
( SELECT [text] FROM sys.dm_exec_sql_text( r.sql_handle) ) AS s
where
wait_time > 1000
AND status = 'RUNNING'
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 4, 2011 at 8:15 pm
MyDoggieJessie (4/4/2011)
We recently stopped the reports that were running using the linked servers and then started some of them again in a manual process...I have an "okay" understanding of waits but I guess not enough to fully understand what all these mean in the bigger picture. Any insight you could provide is greatly appreciated!
PREEMPTIVE_OS_PIPEOPS180980.4739.3139.31
- I have no idea what this is, but this is going through the roof. I'm assuming it's requests for data from the RAM and it's trying to get something before it needs it, but that's an outright, from somewhere in my back pocket, guess. I can't seem to find anything on this and you might consider pinging Microsoft on it.
RESOURCE_SEMAPHORE110668.6424.0463.35
- Memory Waits. This has about doubled since last pull. You're still in trouble here.
OLEDB53578.7611.6474.99
- Odd, not much more on the linked server wait requests, so your other servers are moving fast, you're just waiting locally to deal with it.
BROKER_EVENTHANDLER31106.336.7681.75
BROKER_RECEIVE_WAITFOR27556.465.9993.75
- Service Broker stuff, no worries, should probably be excluded.
ASYNC_NETWORK_IO27699.906.0287.76
This is how long it waits trying to ship things to other applications (ie: your reporting server). If that's slow, this will increase as SQL waits to ship the results. It will add more pressure.
PAGEIOLATCH_SH9932.822.1695.91
Disk waits, and very low. This is usually one of the highest waits on your server, and is expected to be. Within reason, that is.
The rest are low enough that I'd ignore, except for one thing. CXPacketWait is VERY low. There is very little parellelism going on here, or you're incredibly well balanced data-wise for it. Considering the latter is nearly impossible, I'm going with you probably have a LOT of MAXDOP 1 settings going on. Just an FYI, not a big deal overall.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 4, 2011 at 9:37 pm
Our reporting team uses the rs.exe executable from the Reporting Services engine to generate our Corporate reports (using a combination of the RDL files and the sprocs), while this runs (which spawn up whe reporting begins and ultimately dying once reporting completes) it uses xp_cmdshell and generates PREEMPTIVE_OS_PIPEOPS wait type. To the best of my knowledge, it's unavoidable in this usage.
RESOURCE_SEMAPHORE - Any ideas on how to pinpoint this the root cause of this? Currently it's at:
wait_typewait_time_spctrunning_pct
RESOURCE_SEMAPHORE111023.0919.9358.16
Regarding the MAXDOP. We've have lots of issue with parallelism in the past...to the best of my knowledge, we've only a couple sprocs out of hundreds that use MAXDOP=1
Wish I knew more about that...
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
April 4, 2011 at 11:09 pm
MyDoggieJessie (4/4/2011)
First off, I've not had this issue before with this server...at least not until this past Friday afternoon.
90% of the time, all you have to do is find out what the new code they put into production was. Find out what changed (and it certainly could be on the hardware side, as well) on Friday and you'll likely have your answer.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2011 at 11:47 pm
Looping Myself...
Thanks
April 5, 2011 at 2:03 am
MyDoggieJessie (4/4/2011)
RESOURCE_SEMAPHORE - Any ideas on how to pinpoint this the root cause of this? Currently it's at:
wait_typewait_time_spctrunning_pct
RESOURCE_SEMAPHORE111023.0919.9358.16
Offhand, no. I'd have to research and probably bombard you with questions... and I've got my own stuff blowing up at the moment. 😉
Regarding the MAXDOP. We've have lots of issue with parallelism in the past...to the best of my knowledge, we've only a couple sprocs out of hundreds that use MAXDOP=1
Wish I knew more about that...
Hnh... did someone set the server to default to MAXDOP 1? Might be worth checking the server settings out. Suddenly losing parallelism could cause all sorts of annoyances.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 5, 2011 at 6:35 am
forsqlserver (4/4/2011)
Looping Myself...
Please explain.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply