August 10, 2012 at 4:47 am
Hi all,
Suddenly the performance of one of our sprocs (multiple SELECTs across multiple tables) went from returning data in around half a second to 10-30 seconds.
The sproc had not been changed and the data set returned had not changed dramatically either. There had been some firmware updates on the switches between the database and client, but this was seen as too unlikely to be the cause.
After adding various debugging and logging we noticed that performance would recover from the slow 10-30 seconds back to sub-second speeds after one particular nightly SQL job ran. We picked the job apart and found that the key step behind the recovery was the disabling and re-enabling of an UPDATE trigger on the table.
Just to clarify....the sequence of events:
1. Slow performance of SELECTs
2. Update trigger disabled
3. Update trigger re-enabled
4. Fast performance of SELECTs
I'm not an expert, but I'd really appreciate someone shedding light on what might be going on here. As far as I'm aware, an UPDATE trigger should not affect a SELECT statement. Is this something to do with cached plans or statistics being refreshed?
Any clues or advice on debugging would be really appreciated.
Thanks,
Rick
(SQL 2008)
August 10, 2012 at 5:30 am
If I ask myself when I would disable an update trigger, the first thing that comes to mind is when I have to make some heavy data changes, or something like that. So my shot in the dark is that it is not the disabling/enabling of the trigger itself. It is actually the other part of the job, where data is changed, which could invoke the statistics update process. If you can give full details of the job, maybe we can help you more precisely.
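If statistics are the culprit, you could test it directly by refreshing them by hand instead of running the job - something like this (the table name is a placeholder for yours):
-- Rebuild all statistics on the table the trigger sits on
UPDATE STATISTICS dbo.YourTriggerTable WITH FULLSCAN;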
August 10, 2012 at 5:40 am
Hi Usman,
Thanks for your reply.
We stripped the SQL job down so that it contained only the disable and re-enable trigger commands, and the behaviour was unchanged: the stripped-down job still restored performance. We've therefore ruled out the real work that was done while the trigger was disabled.
Regards,
Rick
August 10, 2012 at 6:34 am
ricklivesey (8/10/2012)
Hi Usman, thanks for your reply.
We stripped the SQL job down so that it contained only the disable and re-enable trigger commands, and the behaviour was unchanged: the stripped-down job still restored performance. We've therefore ruled out the real work that was done while the trigger was disabled.
Regards,
Rick
Can you please let us know how you found that it is actually this job, and not the data change or any other job? Are you saying there is no data change happening during/after job execution? What if you do not run the job, but just update the statistics on the table - does the problem remain the same?
Can you please post the execution plans of the procedure from both before and after the job execution?
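In SSMS you can tick 'Include Actual Execution Plan', or capture the plan XML from a query window like this (the proc name and parameter are placeholders):
SET STATISTICS XML ON;   -- returns the actual plan as XML alongside the results
EXEC dbo.YourProc @SomeParam = 1;
SET STATISTICS XML OFF;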
August 10, 2012 at 7:01 am
Hi, I am working on a similar problem. A SELECT statement with some joins, part of an SP, performed very badly because it used a Clustered Index Scan where one was not necessary. If I change the SQL Server 'maximum server memory' configuration by incrementing it by only 1 MB while the SP is running (similar to your trigger trick), the query optimizer changes the query plan and executes the joins in a different order, so the scan is no longer needed. I have not found the reason for this, and therefore I have no solution to the problem.
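For reference, this is roughly what I run to nudge the setting (4097 is just an example value - the current setting plus 1 MB); it seems to invalidate the cached plans:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- bump 'max server memory (MB)' by 1 MB from its current value
EXEC sp_configure 'max server memory (MB)', 4097;
RECONFIGURE;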
If you have any solution, please post it!!!
Thomas
August 10, 2012 at 7:41 am
Hi Usman,
The debugging we put in wrote the sproc's execution duration to a logging table every time the sproc was called (usually fewer than 10 calls per second), so we were able to see a very dramatic change in execution durations at a certain point in time - which coincided with the start of the SQL job.
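Roughly, the instrumentation looks like this (the log table and proc names here are illustrative, not our real ones):
CREATE TABLE dbo.SprocDurationLog (
    SprocName  sysname   NOT NULL,
    DurationMs int       NOT NULL,
    LoggedAt   datetime2 NOT NULL DEFAULT SYSDATETIME()
);

-- Inside the sproc:
DECLARE @t0 datetime2 = SYSDATETIME();

-- ... the sproc's original SELECTs run here ...

INSERT INTO dbo.SprocDurationLog (SprocName, DurationMs)
VALUES (N'dbo.OurProc', DATEDIFF(ms, @t0, SYSDATETIME()));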
There is always data change on the tables in question, but that slow, gradual growth over time could not explain the sudden 30x increase in execution time.
During the next performance degradation period I'll attempt to update the statistics and report back on the results.
Thanks for the suggestion Usman (and Stewart)
R
August 10, 2012 at 7:59 am
The difference in performance is due to either different plans or blocking.
Can you capture actual query plans from both the slow and fast executions?
August 10, 2012 at 8:05 am
SpringTownDBA (8/10/2012)
The difference in performance is due to either different plans or blocking. Can you capture actual query plans from both the slow and fast executions?
Actually, now that I think about it, disabling/re-enabling a trigger counts as a schema modification. A schema modification invalidates any cached plans that reference that table, so they all get recompiled. You may want to try adding OPTION (RECOMPILE) to the SELECT statements in the stored proc that perform slowly.
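That is, something along these lines inside the proc (table, column, and parameter names are placeholders):
SELECT t.Col1, t.Col2
FROM dbo.SomeTable AS t
WHERE t.KeyCol = @Param
OPTION (RECOMPILE);  -- compile a fresh plan for each execution's actual parameter value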
August 10, 2012 at 11:57 am
SpringTownDBA (8/10/2012)
The difference in performance is due to either different plans or blocking. Can you capture actual query plans from both the slow and fast executions?
I will second the two different plans. This is where I also asked for the execution plans.
@rick You can also use the following T-SQL to find out whether different plans exist for the stored procedure:
SELECT qs.plan_handle, a.attrlist
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) est
CROSS APPLY (SELECT epa.attribute + '=' + CONVERT(nvarchar(127), epa.value) + ' '
             FROM sys.dm_exec_plan_attributes(qs.plan_handle) epa
             WHERE epa.is_cache_key = 1
             ORDER BY epa.attribute
             FOR XML PATH('')) AS a(attrlist)
WHERE est.objectid = OBJECT_ID('SCHEMANAME.PROCNAME') -- replace with your proc name
  AND est.dbid = DB_ID();
If more than one row comes back, different plans exist for the procedure.
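And if you do get multiple rows, you can pull the XML plan for each handle to compare them:
SELECT qs.plan_handle, qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) est
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE est.objectid = OBJECT_ID('SCHEMANAME.PROCNAME') -- replace with your proc name
  AND est.dbid = DB_ID();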
August 16, 2012 at 1:54 am
Hi all,
Thanks so much for the quick responses and avenues to explore.
We went straight in and added OPTION (RECOMPILE) to the main SELECT within the sproc and saw immediate performance increases which have been constant over the last few days.
After researching, we now have this crazy-sounding theory to explain the sudden, huge drop-off in performance: the nightly trigger-disabling and re-enabling job invalidated the cached plans, meaning a new plan would be compiled the next time the sproc was run. The sproc is called with a single bigint parameter (sorry if I failed to mention this - I didn't realise it could be important), and the first execution's parameter value is what the new plan gets optimised for (parameter sniffing); if that value returns the largest dataset, the plan suits that big haul but may be sub-optimal for any other query. Until several weeks ago, unknowingly, the first user to run the sproc after the nightly flush always queried the largest dataset. Because he was consistently on the early shift, all users' queries for that dataset were speedy throughout the day, and our other users didn't notice any impact because their datasets are relatively tiny.
Then that user left the company.
The cached plans were still recreated daily, but now usually based on a more common, small dataset, so the users who tried to return the large dataset were stuck with an inappropriate plan and felt a massive speed penalty.
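For anyone who wants to see this kind of sniffing for themselves, it can be reproduced with something like the following (the proc name and parameter values are placeholders for ours):
-- Invalidate the proc's cached plan, simulating the nightly flush
EXEC sp_recompile N'dbo.OurProc';

-- The first call after the flush compiles a plan shaped for a small dataset...
EXEC dbo.OurProc @CLID = 1;

-- ...and the large-dataset call is then stuck reusing that plan
EXEC dbo.OurProc @CLID = 2629;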
Sorry for the story-time moment, but it seems to make sense to me.
Today we're trying out
OPTION (OPTIMIZE FOR (@CLID = 2629), OPTIMIZE FOR UNKNOWN)
where 2629 is our large-dataset parameter. So far the results are looking very good and, based on the logs, appear to be pleasing all users.
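In context, the hinted statement looks roughly like this (table and column names are placeholders); the literal pins the plan to the big-haul shape, which the small-dataset queries tolerate far better than the reverse:
SELECT t.Col1, t.Col2
FROM dbo.BigTable AS t
WHERE t.CLID = @CLID
OPTION (OPTIMIZE FOR (@CLID = 2629), OPTIMIZE FOR UNKNOWN);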
Many thanks again,
Rick