June 6, 2016 at 10:51 am
Hi,
Today a view of ours slowed down to a crawl. It selects rows from a single table where column exported = 1. Only one process writes to and reads from this table.
A filtered index with all needed columns included supports this view. This view has been fast for the last 5 years and has not changed.
We noticed that excluding the UTCToLocal UDF columns (2 of them) made the view fast again. Thinking this to be the cause, we changed the table to make these 2 columns persisted. Straight away we had better performance, 15 minutes later, the performance was terrible again. Once again removing the columns from the view query improved performance. These 2 columns must be in the view.
Side question: Does checksum page verification play well with persisted computed columns?
The UDF does not access any tables or other server objects.
We have multiple similar versions of the UDF, all of which produce the same result.
The VM guest server has 2 virtual sockets with 6 cores per virtual sockets. No other guests run on this host.
Activity Monitor, resource waits keep changing, meaning there is not a specific wait type to blame.
sp_WhoIsActive dump to table doesn't show any sessions that are unusually high. Focusing mainly on CPU as the UDF does not perform any reads.
sp_BlitzCache shows the expected heavy queries.
No blocking.
No excessive locking.
From activity monitor, CPU 65-80%, 1-3 waiting tasks, I/O <1 MB/second, 30-50 batches per second.
MaxDop 6
Cost threshold for parallelism 20
Other SQL Agent jobs (SSIS and TSQL) have slowed down as well. I mentioned the view because that is reproduceable.
I am really stumped by this and don't know where else to look to find out what's wrong.
A lot of info here, hopefully just what you guys need to help me solve this.
Any help would be hugely appreciated.
Cheers,
Dennis
June 6, 2016 at 10:57 am
it sounds like your UTCToLocal functions are scalar which are notoriously slow, where including some datemath columns as the calculated value would definitely be faster since it would be treated as an inline calculation(table valued equivalent)
if your dates are stored in UTC time, i would add/subtract the timezone difference in minutes(since some zones can be @30 minutes instead of 1 hour)
select datediff(minute,GETUTCDATE(),getdate())
select datediff(minute,GETUTCDATE(),getdate()) ,
dateadd(minute,datediff(minute,GETUTCDATE(),getdate()) ,create_date),
create_date from sys.objects
Lowell
June 6, 2016 at 12:56 pm
Hi Lowell,
Thanks for taking the time to read and reply.
That approach won't work for us as the server is set to UTC and we need to be able to handle Daylight saving Time (DST).
Also Aaron Bertrand mentioned this technique Here[/url] and that it is not recommended.
Unfortunately, implementing his date table will require a lot of code changes and we'll need to keep multiple time-zones in mind.
Can you think of a reason why this reliable, quick UDF is giving us such a headache all of a sudden? Last week rebooting the server helped, today it do not.
We thought the VM CPU settings might not have been optimal, too many queries hitting virtual cores that are too busy. But our changes had no effect.
Is it possible that this is just a symptom of something else?
Could it be that the query optimizer is recalculating the value instead of pulling the data from disk or memory?
Cheers,
Dennis
June 6, 2016 at 1:15 pm
DennisPost (6/6/2016)
Hi Lowell,Thanks for taking the time to read and reply.
That approach won't work for us as the server is set to UTC and we need to be able to handle Daylight saving Time (DST).
Also Aaron Bertrand mentioned this technique Here[/url] and that it is not recommended.
Unfortunately, implementing his date table will require a lot of code changes and we'll need to keep multiple time-zones in mind.
Can you think of a reason why this reliable, quick UDF is giving us such a headache all of a sudden? Last week rebooting the server helped, today it do not.
We thought the VM CPU settings might not have been optimal, too many queries hitting virtual cores that are too busy. But our changes had no effect.
Is it possible that this is just a symptom of something else?
Could it be that the query optimizer is recalculating the value instead of pulling the data from disk or memory?
Cheers,
Dennis
ok, just move the logic your function does into a CASE statement that is inline and part of the view definition.
as soon as you get rid of the function, you'll fix the view performance.
Lowell
June 6, 2016 at 6:08 pm
Lowell (6/6/2016)
DennisPost (6/6/2016)
Hi Lowell,Thanks for taking the time to read and reply.
That approach won't work for us as the server is set to UTC and we need to be able to handle Daylight saving Time (DST).
Also Aaron Bertrand mentioned this technique Here[/url] and that it is not recommended.
Unfortunately, implementing his date table will require a lot of code changes and we'll need to keep multiple time-zones in mind.
Can you think of a reason why this reliable, quick UDF is giving us such a headache all of a sudden? Last week rebooting the server helped, today it do not.
We thought the VM CPU settings might not have been optimal, too many queries hitting virtual cores that are too busy. But our changes had no effect.
Is it possible that this is just a symptom of something else?
Could it be that the query optimizer is recalculating the value instead of pulling the data from disk or memory?
Cheers,
Dennis
ok, just move the logic your function does into a CASE statement that is inline and part of the view definition.
as soon as you get rid of the function, you'll fix the view performance.
It would be helpful if you could post some DDL and sample data (as described in Lowell's signature). An actual (not estimated) execution plan would be helpful as well.
That said, if your UDF is is a scalar function or multi-statement table valued function then replacing it with an inline table valued function or noving the logic into your query would likely solve your problem.
-- Itzik Ben-Gan 2001
June 6, 2016 at 8:38 pm
DennisPost (6/6/2016)
Thinking this to be the cause, we changed the table to make these 2 columns persisted. Straight away we had better performance, 15 minutes later, the performance was terrible again.
The reason why the code initially got fast might be because the column additions cause a bad execution plan to get driven out of cache. The reason why performance got slower again is probably because of the bad form of "Parameter Sniffing" (and the bad execution plan) probably came back.
You might want to add OPTION(RECOMPILE) to whatever code is calling the view.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2016 at 4:57 am
Thanks for your insights guys!
I had tried OPTION (RECOMPILE) to no avail.
We removed the UDF altogether and replaced it with a simplified DATEADD. This also didn't help.
The problem with just adding the code inline is that it uses variables (Which can be over come) and it is used in other views, stored procedures. SSIS packages and TSQL agent jobs. If it was clearly the problem, then we'd make the changes, but it's not the definite culprit (yet).
We also have another UDF that is now slow. It feels like we've reached some kind tipping point.
A rewrite of this second function resulted in an identical plan, but is 100+x faster.
To investigate further we'll install a third-party monitoring tool. If this helps pinpoint the problem I'll post back.
If you guys can think of anything else in the meantime, please let me know.
June 9, 2016 at 7:51 am
We found more UDFs (No tables or objects involved in of them) that are also all of a sudden very slow.
I still have not found the reason why these UDFs are performing so bad.
After installing a couple of monitoring tools, no clear cause has been found.
Code changes where possible have helped. Unfortunately some of these UDFs are used in persisted computed columns that cannot be altered without downtime.
If the same logic is inline (SELECT) as in a function, where is the overhead when calling the function?
June 9, 2016 at 8:21 am
Have you tried replacing the computed column with a static column?
Does the data in this table get changed frequently through direct connections? That's the main reason to use computed columns.
If the data is only changed through a standard stored procedure, have the procedure calculate the "computed" columns and populate them with static values.
If the data is subject to lots of single-row updates without going through a stored proc, then a trigger might work.
Beyond that, is there really a need to have the data layer calculate a "local" value? In most architectures, it works far better to have the user-interface do that kind of thing. That won't work if this view is used for automation (no UI), but there are similar solutions in that case.
Computed columns are very frequently a problem for performance. Other solutions work much faster in the vast majority of cases.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 9, 2016 at 9:14 am
Thanks for joining in GSquared.
The view is just one of many objects that use UDFs. You are right about static values being much better, but alas, that's not how everything is designed.
We have been very happy with the performance of our UDFs for years, up till the sudden performance decrease. To implement replacing all the UDFs, is a major undertaking.
Management insists that UDFs are not the root cause, that something has changed to make all the UDFs slow. I need to establish if there's a hidden "Slow down UDFs" setting somewhere or what caused this all of a sudden.
They are demanding answer and I don't have one.
Here's the code of one of the UDFs.
CREATE FUNCTION [dbo].[GetYearForWeek]
(
@week_nr INT,
@date_time DATETIME
)
RETURNS INT AS
BEGIN
DECLARE @Year INT;
IF DATEPART(Month, @date_time) = 1 AND @week_nr >= 52
SET @Year = DATEPART(Year, @date_time) -1
ELSE
IF DATEPART(Month, @date_time) = 12 AND @week_nr = 1
SET @Year = DATEPART(Year, @date_time) +1
ELSE
SET @Year = DATEPART(Year, @date_time)
return @Year
END
The test code:
SET STATISTICS TIME ON
DECLARE @BatchSize Int = 100000
SELECTTOP (@BatchSize)
[dbo].[GetYearForWeek] (DATEPART(ISO_WEEK, GETDATE()), GETDATE())
FROMsys.all_columns C1, sys.all_columns
SELECTTOP (@BatchSize)
CASE
WHEN DATEPART(Month, GETDATE()) = 1 AND DATEPART(ISO_WEEK, GETDATE()) >= 52 THEN DATEPART(Year, GETDATE()) -1
WHEN DATEPART(Month, GETDATE()) = 12 AND DATEPART(ISO_WEEK, GETDATE()) = 1THEN DATEPART(Year, GETDATE()) + 1
ELSE DATEPART(Year, GETDATE())
END
FROMsys.all_columns C1, sys.all_columns
The STATISTICS TIME results:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 20 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(100000 row(s) affected)
SQL Server Execution Times:
CPU time = 23681 ms, elapsed time = 51309 ms.
(100000 row(s) affected)
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 50 ms.
Attached is the execution plan showing identical plans for both queries.
Some references:
June 9, 2016 at 9:51 am
Hugo Kornelis does a good write up of the overhead involved when calling an UDF and the bad decision taken by the query optimizer. A few big SQL names in the comments and a link to an MS connect issue.
Still don't know why the performance hit was so sudden. The servers are even less busy than usual due to holidays.
Hope you guys can enlighten me.
Cheers,
Dennis
June 9, 2016 at 10:09 am
My best guess for the change would be a change in the query plan, specifically where in the query plan the function was being applied. Something like previously the function was being applied after a filter on a small result set, so the overhead was minimal, and now it's being applied before the filter, so the overhead is much more visible.
Do you have copies of the query plans before and after the slowdown?
Cheers!
June 9, 2016 at 11:44 am
The most common cause of "sudden slowdown of what used to be fast" is a change of data volume.
The second most common is hardware degradation (RAID rebuild, etc.).
After that, it's pretty much "check change control to find out what we deployed right before it all slowed down".
If it's a data volume issue (as in the number of rows in the tables went over a threshold), this usually shows up in execution plans. Easiest way to identify that is to take a copy of the database, baseline the performance, reduce the rows, re-baseline, and compare both performance and execution plans between "lots of rows" and "less rows".
Because of how the query optimizer algorithms work, you sometimes hit inflection points where, "it was really fast with 100k rows, but got rear-ended by continental drift when we hit 1M rows". Sometimes the margin between "fast" and "your boss is yelling at you" can be a few thousand rows.
If cutting the rows speeds it back up, you'll have to re-engineer your solution. The UDF may no longer be an option.
If cutting the rows doesn't speed it back up, check for hardware and OS issues. I once saw a very fast database slowed down to a snail's pace because someone in Ops decided all disk access needed to be scanned by antivirus software - including the writes from a high-volume OLTP database. I've also seen RAID failures slow things down like crazy.
If copying the database over to a different server in order to play with table size (row volume) makes it fast, then the issue is something on the primary server.
Start there.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 9, 2016 at 8:40 pm
DennisPost (6/9/2016)
Hugo Kornelis does a good write up of the overhead involved when calling an UDF and the bad decision taken by the query optimizer. A few big SQL names in the comments and a link to an MS connect issue.Still don't know why the performance hit was so sudden. The servers are even less busy than usual due to holidays.
Hope you guys can enlighten me.
Cheers,
Dennis
Like I said, it sounds like a classic symptom of "Bad" Parameter Sniffing... especially with the "fixes" you've done that make it work fast for a while.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 9, 2016 at 11:32 pm
DennisPost (6/9/2016)
Hugo Kornelis does a good write up of the overhead involved when calling an UDF and the bad decision taken by the query optimizer. A few big SQL names in the comments and a link to an MS connect issue.
If the computed columns are persisted all the articles about UDF's are irrelevant.
(Are they really? Check it again)
UDF's are called only when the values in the columns need to be updated.
If you're complaining about SELECTs, and the columns are indeed persisted - forget about UDF's and look for a real reason.
Did you check locking?
Your SELECT may be sitting there waiting till the values in the column being recalculated because another process some fields in possibly another table (tables) which are mentioned in the function definition.
Even if a column is updated with the same values, resulting in no change in the function outcome, SQL Server would need to re-compute the column values for every single record affected by the update.
_____________
Code for TallyGenerator
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy