February 14, 2017 at 5:01 pm
RVO - Tuesday, February 14, 2017 4:02 PM
Perry White,
Can you please explain your comment below?
What does it mean "when buffer pool recharges"?
Disk usage will increase as the buffer pool recharges.
Nothing happens in SQL Server with data that isn't in RAM first. Thus, if anything flushes part or most of what is currently in RAM so new data can come in to be processed, then some or even most of what was flushed out will need to come back into memory before it can be processed. Often those pages are for "hot", active data, and the applications that need them bog down waiting for IO from disk. Remember, memory operates on a MICROSECOND timescale. Rotating disks are often tens to hundreds of milliseconds, and even SSDs are a few milliseconds. That is 3-5 ORDERS OF MAGNITUDE slower.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 14, 2017 at 5:06 pm
RVO - Tuesday, February 14, 2017 11:25 AM
Jeff,
Is Kevin G. Boles' specialty performance issues?
What is the best way to contact Kevin?
I have about 45000 hours of time invested in the SQL Server relational engine, most of that as an independent consultant (I started my company in 1999). Performance tuning is one of my primary lines of business. Making SQL Server applications go fast and scale high is second only to helping clients do things correctly FROM THE START on my list of favorite things to do. 😎
SQLServerCentral.com has a messaging function you can use to reach out to me. Or you can search the web, linkedin, twitter for TheSQLGuru and you will find me.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 14, 2017 at 5:37 pm
Anytime SQL Server performs an operation on data, the data must first be read from the disk subsystem into physical memory.
Disk subsystem access times are typically 10-15 milliseconds.
Random access memory operates on the order of nanoseconds (1,000 nanoseconds = 1 microsecond).
When you restart a SQL Server, the volatile memory (RAM) is lost, so SQL Server has to start reading all required data back into the buffer pool, which is the area of RAM SQL Server uses for data/index page storage.
Note that cached plans are also stored in an area of RAM.
Generally, it's a bad idea for performance to constantly reboot your SQL Server; once the cache is fully charged, disk access should ideally be at a minimum.
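If you want to see how "charged" the buffer pool currently is, a quick sketch against the standard DMVs (assumes VIEW SERVER STATE permission):

```sql
-- Approximate buffer pool usage per database, in MB.
-- Each buffer descriptor represents one 8 KB page, so COUNT(*)/128 = MB.
SELECT DB_NAME(database_id) AS [Database],
       COUNT(*) / 128 AS [BufferPoolMB]
FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
GROUP BY database_id
ORDER BY [BufferPoolMB] DESC OPTION (RECOMPILE);
```

Running this before and after a restart makes the "recharge" visible: right after a reboot the numbers are tiny and climb as pages are read back in.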
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
February 14, 2017 at 5:45 pm
RVO - Tuesday, February 14, 2017 11:25 AM
Jeff,
Is Kevin G. Boles' specialty performance issues?
What is the best way to contact Kevin?
Send Kevin a message through this forum. Here's his contact page. https://www.sqlservercentral.com/Forums/Users/TheSQLGuru. And, yeah... Kevin's specialty is "performance" at the hardware end, the SQL Configuration end, and at the T-SQL end.
Couple of questions.
Correct me if I'm wrong.
---------------------------------------------------------------------------------------------------------------------------------------------------------
For the same query, SQL Server stores multiple plans in cache for different parameters.
If let's say for query A you had 5 good plans (1,2,3,4,5)
and then somebody runs same query with weird rare parameters which creates another cached plan A6
and this plan is very bad. So the impact is that from now on plan A6 might be reused by SQL Server
and it will cause performance degradation.
--------------------------------------------------------------------------------------------------------------------------------------------------------
It's usually the other way around. Some good plan is available, a different query isn't quite right for it, but the plan gets used anyway. Or, worse yet... something cleared the good plan and some infrequently used query hit the table first, creating a plan that works well for it but doesn't work so well for the more frequent queries.
"substantially different" from what?
It'll sound a bit funny, but substantially different from similar queries. ORMs are famous for this. We're currently replacing a bit of ORM code that updates 32 or so columns in a single row of a given large table. The WHERE clause contains all 32 columns as criteria, and all 32 columns have different criteria each time it runs. SQL Server looks at it as being ad hoc and decides that it would be cheaper to recompile than store an execution plan... it takes 2 to 17 seconds to recompile, with an average of 7 to 8 seconds, and it runs 10's of thousands of times during the business day. If I run the query three times (I DO have "optimize for ad hoc workloads" turned on) with the same criteria, it "only" takes 100 milliseconds because it reused a plan. Still and again, that's just a stupid long time to update a single row, especially since the first thing in the WHERE clause is correct identification of the clustered Primary Key for the table.
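The usual cure for that ORM pattern is to send the statement as a parameterized batch so one cached plan can be reused. A minimal sketch; the table, column, and parameter names here are made up for illustration:

```sql
-- Hypothetical example: parameterizing an ad hoc single-row update
-- so SQL Server caches and reuses one plan instead of recompiling each time.
DECLARE @sql nvarchar(max) = N'
    UPDATE dbo.BigTable
    SET ColA = @ColA, ColB = @ColB
    WHERE Id = @Id;';   -- lead with the clustered Primary Key

EXEC sys.sp_executesql
    @sql,
    N'@Id int, @ColA int, @ColB nvarchar(50)',
    @Id = 42, @ColA = 1, @ColB = N'example';
```

Because the statement text is identical on every call and only the parameter values change, it hashes to the same cache entry instead of 10's of thousands of distinct ad hoc plans.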
I use this query to view current cache below.
The output you included looks really light. Did you run that a short time after a DBCC FREEPROCCACHE? If not, then we may have identified a part of the problem... something is driving things out of cache, and it may be because you simply don't have enough memory to support the code as it currently is. Combined with the size of your TempDB files, which can get that way because of accidental many-to-many joins, unnecessary scans feeding hash joins, etc., I'd guess that you have some real code issues. Right-click on the instance in the Object Explorer window (press the {f8} key to get there in SSMS if it's not already open), select {Reports}{Standard Reports}, and then either of the two "Performance" reports for "Top Queries by Total" to see your top 10 worst queries for CPU or IO. Keep in mind that these are emptied out when you do a reboot or a DBCC FREEPROCCACHE.
Kevin will likely have some tools that work a whole lot better than the canned routines in SQL Server. For example, the two reports I cited only work well on execution plans that are re-used. They won't find ad-hoc single usage queries as a problem and they can be a large part of the problem.
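One more quick check along these lines: before concluding things are being driven out of cache, it can help to see what is actually consuming SQL Server memory. A sketch using the standard memory-clerk DMV (the `pages_kb` column exists on SQL Server 2012 and later, which matches the version in this thread):

```sql
-- Top memory clerks: shows where SQL Server memory is going.
-- On SQL Server 2012+ the buffer pool appears as MEMORYCLERK_SQLBUFFERPOOL.
SELECT TOP(10) [type] AS [ClerkType],
       SUM(pages_kb) / 1024 AS [SizeMB]
FROM sys.dm_os_memory_clerks WITH (NOLOCK)
GROUP BY [type]
ORDER BY [SizeMB] DESC OPTION (RECOMPILE);
```

If something other than the buffer pool and plan cache is near the top of that list, that's a clue about what's squeezing the data pages out.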
We only manually do UPDATE STATISTICS full scan every 2-3 weeks
Especially since you're doing ETL on the same database, that's not enough, particularly for such large databases. It may not solve all of the problems you're currently having, but it'll help. As a recent example, we have a 45 million row table in Dev that one of the Developers added about 200,000 rows to, and he then ran a query using criteria against those new rows. 200,000 rows wasn't enough to trigger the auto-update of stats. His simple query ran for two hours before he stopped it. He tried again with the same results. I tried it with the same results (except I only waited 10 minutes). I rebuilt the stats on the table and the query ran in about a second. Keeping stats updated is uber important. I'd be much more aggressive about it if I were in your shoes.
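A targeted rebuild like the one described above can be as simple as the following (the table name is hypothetical; schedule something like it against your most volatile tables):

```sql
-- Rebuild statistics for one table with a full scan.
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;

-- Or refresh every stats object in the current database at a sampled rate:
EXEC sys.sp_updatestats;
```

FULLSCAN costs more IO than a sampled update but gives the optimizer the most accurate histogram, which is usually what matters after a big ETL load.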
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2017 at 9:07 pm
So Jeff.
Long story short.
No multiple plans for the same query. Correct?
something is driving things out of cache
EXACTLY!
Sometimes I come in the morning and run a simple SELECT COUNT(*) FROM Table1.
It always takes too long to return results in the morning. Let's say 45 seconds for Table1
If I run it in 2 minutes - results get returned in 1 second.
If I wait 10 minutes - again 45 seconds.
Cache gets cleared too fast.
Our Page Life Expectancy very often drops to under 500 milliseconds.
February 14, 2017 at 9:18 pm
RVO - Tuesday, February 14, 2017 9:07 PM
So Jeff.
Long story short.
No multiple plans for the same query. Correct?
something is driving things out of cache
EXACTLY!
Sometimes I come in the morning and run a simple SELECT COUNT(*) FROM Table1.
It always takes too long to return results in the morning. Let's say 45 seconds for Table1
If I run it in 2 minutes - results get returned in 1 second.
If I wait 10 minutes - again 45 seconds.
Cache gets cleared too fast.
Our Page Life Expectancy very often drops to under 500 milliseconds.
1) PLE is measured in seconds, not milliseconds.
2) Your scenario of slow, then fast, then slow again table counts is expected on a server with too little memory for the amount of active data, and a correspondingly "slow" IO subsystem that cannot get data into RAM fast enough to be "acceptable". You didn't have WITH (NOLOCK) on your count query though, so simple blocking of your SELECT could be involved at least some of the time (and you will block DML too, BTW, which is suboptimal for a count used as a server performance check). I know that you have a 3TB database on the server (and don't know what else is on it in addition to that) and also that your RAM is 128GB (virtualized, which most clients I have come across screw up, so it could be less physical RAM in reality). That is not sufficient RAM if any reasonable fraction of your 3TB database is active.
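Since PLE is in seconds, it's easy to check the real number directly rather than eyeball it; a sketch using the standard performance-counter DMV:

```sql
-- Current Page Life Expectancy, in seconds.
-- On NUMA hardware there is also one PLE value per node
-- under the 'Buffer Node' object; this shows the Buffer Manager total.
SELECT [object_name], counter_name, cntr_value AS [PLE_seconds]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE counter_name = N'Page life expectancy'
  AND [object_name] LIKE N'%Buffer Manager%' OPTION (RECOMPILE);
```

Logging that value every few minutes will show exactly when the buffer pool gets flushed, which you can then line up against your job schedules.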
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 15, 2017 at 1:22 am
How about scheduling this query?
-- Find single-use, ad-hoc queries that are bloating the plan cache
SELECT TOP(20) [text] AS [QueryText], cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE cp.cacheobjtype = N'Compiled Plan'
AND cp.objtype = N'Adhoc'
AND cp.usecounts = 1
ORDER BY cp.size_in_bytes DESC OPTION (RECOMPILE);
From the discussion, I think I can see that finding the culprit is the biggest challenge here. I've got a health check script that has tons of various queries, but the one above is just an extract (I can post more later); maybe you need to run it frequently to pinpoint what is clogging up your memory. Someone suggested your buffer cache is being wiped out, maybe by an ad hoc query, a weekly task, etc.
Have you got any scheduled tasks or rogue users running SQL queries?
Any huge query that forces SQL Server to put a lot of data blocks into the buffer cache wipes out what SQL Server had been accumulating in the time before the system stalls.
Here's another query:
-- Top cached queries by Execution Count (SQL Server 2012)
SELECT qs.execution_count, qs.total_rows, qs.last_rows, qs.min_rows, qs.max_rows,
qs.last_elapsed_time, qs.min_elapsed_time, qs.max_elapsed_time,
SUBSTRING(qt.TEXT,qs.statement_start_offset/2 +1,
(CASE WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.TEXT)) * 2
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)
AS query_text
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.execution_count DESC OPTION (RECOMPILE);
I would look for "low execution count", but at the same time huge in block reads.
-- Top Cached SPs By Execution Count (SQL Server 2012)
SELECT TOP(250) p.name AS [SP Name], qs.execution_count,
ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0)
AS [Calls/Second],
qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
qs.total_worker_time AS [TotalWorkerTime],qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time],
qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.execution_count DESC OPTION (RECOMPILE);
-- Tells you which cached stored procedures are called the most often
-- This helps you characterize and baseline your workload
-- Top Cached SPs By Total Physical Reads (SQL Server 2012).
-- Physical reads relate to disk I/O pressure
SELECT TOP(25) p.name AS [SP Name],qs.total_physical_reads
AS [TotalPhysicalReads],qs.total_physical_reads/qs.execution_count
AS [AvgPhysicalReads], qs.execution_count, qs.total_logical_reads, qs.total_elapsed_time, qs.total_elapsed_time/qs.execution_count
AS [avg_elapsed_time], qs.cached_time
FROM sys.procedures AS p WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS qs WITH (NOLOCK)
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
AND qs.total_physical_reads > 0
ORDER BY qs.total_physical_reads DESC,
qs.total_logical_reads DESC OPTION (RECOMPILE);
-- Lists the top statements by average input/output
-- usage for the current database
SELECT TOP(50) OBJECT_NAME(qt.objectid) AS [SP Name],
(qs.total_logical_reads + qs.total_logical_writes) /qs.execution_count
AS [Avg IO],SUBSTRING(qt.[text],qs.statement_start_offset/2 + 1,
(CASE
WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max), qt.[text])) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) AS [Query Text]
FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE qt.[dbid] = DB_ID()
ORDER BY [Avg IO] DESC OPTION (RECOMPILE);
-- Helps you find the most expensive statements for I/O by SP
-- Breaks down buffers used by current database
-- by object (table, index) in the buffer cache
SELECT OBJECT_NAME(p.[object_id]) AS [ObjectName],
p.index_id, COUNT(*)/128 AS [Buffer size(MB)], COUNT(*) AS [BufferCount],
p.data_compression_desc AS [CompressionType]
FROM sys.allocation_units AS a WITH (NOLOCK)
INNER JOIN sys.dm_os_buffer_descriptors AS b WITH (NOLOCK)
ON a.allocation_unit_id = b.allocation_unit_id
INNER JOIN sys.partitions AS p WITH (NOLOCK)
ON a.container_id = p.hobt_id
WHERE b.database_id = CONVERT(int,DB_ID())
AND p.[object_id] > 100
GROUP BY p.[object_id], p.index_id, p.data_compression_desc
ORDER BY [BufferCount] DESC OPTION (RECOMPILE);
-- Tells you what tables and indexes are
-- using the most memory in the buffer cache
-- When were Statistics last updated on all indexes?
SELECT o.name, i.name AS [Index Name],STATS_DATE(i.[object_id],
i.index_id) AS [Statistics Date], s.auto_created,
s.no_recompute, s.user_created, st.row_count
FROM sys.objects AS o WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
ON o.[object_id] = i.[object_id]
INNER JOIN sys.stats AS s WITH (NOLOCK)
ON i.[object_id] = s.[object_id]
AND i.index_id = s.stats_id
INNER JOIN sys.dm_db_partition_stats AS st WITH (NOLOCK)
ON o.[object_id] = st.[object_id]
AND i.[index_id] = st.[index_id]
WHERE o.[type] = 'U'
ORDER BY STATS_DATE(i.[object_id], i.index_id) ASC OPTION (RECOMPILE);
-- Helps discover possible problems with out-of-date statistics
-- Also gives you an idea which indexes are most active
Given the memory that you've got, before you add more you probably need to:
1) Pinpoint the most expensive SQL and locate the TABLE/INDEX involved.
2) Pinpoint which table is the most heavily used.
Cross-check them, save the results, and monitor.
3) Pinpoint which big table is used next after the system stalls.
February 15, 2017 at 8:10 am
Some clarification.
It is VIRTUAL MACHINE.
HYPERVISOR.
We do have about 100 active SQL Server Agent jobs running:
every 1 minute --------3-4 jobs
every 10 min ---------- 6 jobs
every 30 min ---------- 7 jobs
once a day ------------ 40% of jobs
weekly ------------------ 20% of jobs
A lot of them just execute stored procedures.
Some jobs run SSIS packages that load a file and run stored procedures
Plus daily and weekly databases maintenance jobs.
All the jobs are on the same production server.
Some of the jobs execute heavy queries.
We are looking at rewriting them but we are talking about tens of thousands of lines of code.
This will take a long time.
February 15, 2017 at 8:55 am
As another thought, do you have shared memory turned on for this VM?
I am wondering if at certain times other VMs on that host are getting memory starved, so VMware (or equivalent) is stealing some memory from your SQL box to give to the other VMs.
You might want to check your VM software to verify that something strange isn't happening a level up, and that the problem might not even be related to the SQL box.
Also, when you say:
Sql Server performance gradually deteriorates during 2 weeks period.
We reboot the server
Do you mean you restart the SQL instance or you reboot the physical machine? I would try the DBCC FREEPROCCACHE thing first, but might also want to try restarting the SQL instance instead of the whole computer and see if performance is still slow.
Never know... might be looking at the wrong part of the problem.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
February 15, 2017 at 9:22 am
We reboot the whole server (Windows 2012 Server box).
Yes, it's shared memory, and I pointed this out to our Infrastructure team already,
but they say they are monitoring whether any resource is stealing memory from the VM production SQL Server.
I think it's called "ballooning" when that stealing of memory/CPU happens.
They say ballooning has never happened yet.
Here is an explanation of what "virtual memory ballooning" means:
http://www.vfrank.org/2013/09/18/understanding-vmware-ballooning/
February 15, 2017 at 9:31 am
RVO - Wednesday, February 15, 2017 9:22 AM
We reboot the whole server (Windows 2012 Server box).
Yes, It's shared memory and I pointed this out to our Infrastructure team already
but they say they are monitoring if any resource is stealing memory from VM production SQL server.
I think it's called "balooning" when stealing memory/CPU happening.
They say "ballooning" never happened yet.
Here is explanation about what "virtual memory ballooning" means:
http://www.vfrank.org/2013/09/18/understanding-vmware-ballooning/
Right... Ballooning is what I was thinking of.
It might not hurt to see if performance increases or not with a service restart instead of a full server reboot.
Could also load up perfmon to see if anything strange is happening with resources at that time.
What else runs on that server? When you notice the slowness on the server, what are the resources looking like? Is there high CPU and/or high memory and/or high disk usage? Might want to monitor the numbers each day and when you are noticing the slowness, check what the numbers are like.
Although I could be completely wrong on this. I just know I've investigated SQL Slowness only to find out the antivirus was hogging all the CPU and memory.
EDIT - Although it could still be a VMWare issue. Not sure why I can't post links, but there is a vmware KB article 1002598 that talks about slow disk latency when network traffic gets high. Not sure if it is related, but also not sure it is directly related to SQL either.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
February 15, 2017 at 1:30 pm
Richlion2,
In your Query #2 (-- Top cached queries by Execution Count (SQL Server 2012)), you said:
I would look for "low execution count", but at the same time huge in block reads.
There is no [block reads] column in your SELECT.
February 15, 2017 at 3:56 pm
RVO - Wednesday, February 15, 2017 9:22 AM
We reboot the whole server (Windows 2012 Server box).
Yes, It's shared memory and I pointed this out to our Infrastructure team already
but they say they are monitoring if any resource is stealing memory from VM production SQL server.
I think it's called "balooning" when stealing memory/CPU happening.
They say "ballooning" never happened yet.
Here is explanation about what "virtual memory ballooning" means:
http://www.vfrank.org/2013/09/18/understanding-vmware-ballooning/
Whether it's happened or not, it would be worth finding out if there's something hidden about shared memory especially when it comes to SQL Server. As wise men have said, "One proper experiment is worth a thousand expert opinions". I'd turn it off for a month and see if you have the same problem. No one should balk at that because they claim that "ballooning has never happened yet".
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2017 at 9:19 pm
RVO - Wednesday, February 15, 2017 8:10 AM
Some clarification.
It is VIRTUAL MACHINE.
HYPERVISOR.
We do have about 100 active SQL Server Agent jobs running:
every 1 minute --------3-4 jobs
every 10 min ---------- 6 jobs
every 30 min ---------- 7 jobs
once a day ------------ 40% of jobs
weekly ------------------ 20% of jobs
A lot of them just execute stored procedures.
Some jobs run SSIS packages that load a file and run stored procedures
Plus daily and weekly databases maintenance jobs.
All the jobs are on the same production server.
Some of the jobs execute heavy queries.
We are looking at rewriting them but we are talking about tens of thousands of lines of code.
This will take a long time.
Do you have your jobs spread out? There are 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day, etc. USE THEM. I look like I walk on water when I visit a client with 19 jobs that all start at the default time of 00:00 (or on every hour, etc.) and simply spread them out. POOF!! Total execution time drops by an order of magnitude whilst server load goes down. Resource contention and overload is a KILLER when maintenance/batch stuff is running.
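To see which jobs pile up on the same start times, a sketch against the standard msdb job tables (note `active_start_time` is an integer encoded as HHMMSS, so 0 means midnight):

```sql
-- List enabled SQL Agent jobs with their scheduled start times,
-- so clustered start times (e.g. everything at 00:00:00) stand out.
SELECT j.name AS [JobName],
       s.name AS [ScheduleName],
       s.active_start_time AS [StartTime_HHMMSS]
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobschedules AS js
    ON j.job_id = js.job_id
INNER JOIN msdb.dbo.sysschedules AS s
    ON js.schedule_id = s.schedule_id
WHERE j.enabled = 1
ORDER BY s.active_start_time, j.name;
```

Any long run of rows sharing the same StartTime_HHMMSS is a candidate for staggering.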
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 15, 2017 at 9:20 pm
RVO - Wednesday, February 15, 2017 9:22 AM
We reboot the whole server (Windows 2012 Server box).
Yes, It's shared memory and I pointed this out to our Infrastructure team already
but they say they are monitoring if any resource is stealing memory from VM production SQL server.
I think it's called "balooning" when stealing memory/CPU happening.
They say "ballooning" never happened yet.
Here is explanation about what "virtual memory ballooning" means:
http://www.vfrank.org/2013/09/18/understanding-vmware-ballooning/
If it never gets ballooned (which I would question - I've seen so many VM admins at clients who have no idea what they are doing), then tell them to give you a fixed reservation for RAM. Do the same for every resource that can be reserved. You simply cannot be successful with your production SQL Servers having their main resources throttled or yanked out from under them.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service