Deadlocks When Expiring SSRS Dataset Cache

  • We are currently on SQL Server 2014 (version 12.0.5203.0). We use SSRS as our core reporting tool. Most of our report datasets are set up as shared datasets which are cached, and the cached datasets are linked to shared schedules. We expire the caches linked to shared schedules programmatically as the final step of our ETL, which runs several times per day. The code to expire the caches looks like this (passing in the appropriate schedule ID):
    EXEC dbo.AddEvent @EventType = 'SharedSchedule' , @EventData = @ScheduleID;
    This sort of setup is described well here (we're not pre-loading the cache using the NULL delivery provider, but everything else is similar). After the event is added using the code above, we're noticing many deadlocks in the report server temp database (ReportServerTempDB). Here is a sample deadlock graph:

    From what I can gather, the "shared schedule event" triggers several calls to the FlushReportFromCache sproc (possibly one call for each shared dataset linked to the schedule?). This sproc contains a DELETE and an UPDATE statement. The deadlock info consistently shows the DELETE statement in this sproc as the deadlock victim, and the UPDATE statement as the winner:

    When all is said and done, we end up with stale data in the cache due to the failed deletes, even though the shared schedule shows that it was successfully expired.

    How can we avoid these deadlocks while still programmatically expiring the cache at the end of an ETL run?

    As a side note, I initially posted this question to "ask.sqlservercentral" by mistake, thinking I was posting it to a SQL Server Central forum. I couldn't figure out how to delete that question, so I am reposting it here in the forums.

  • A quick update to this - I've also noticed that deadlocks often occur between the delete portion of FlushReportFromCache sproc (always the deadlock victim) and the "Schedule_UpdateExpiration" trigger on the schedule table. This trigger attempts to update the "absoluteExpiration" column in the ExecutionCache table (the same table that FlushReportFromCache is trying to delete from).  The trigger fires because the "UpdateTask" sproc in the report server is run at some point during the cache expiration.

    It's a bit of a black box to me - how can I see the exact steps that are taken once an event is added to the Events table?

  • chris.o.smith - Thursday, March 30, 2017 10:14 AM

    A quick update to this - I've also noticed that deadlocks often occur between the delete portion of FlushReportFromCache sproc (always the deadlock victim) and the "Schedule_UpdateExpiration" trigger on the schedule table. This trigger attempts to update the "absoluteExpiration" column in the ExecutionCache table (the same table that FlushReportFromCache is trying to delete from).  The trigger fires because the "UpdateTask" sproc in the report server is run at some point during the cache expiration.

    It's a bit of a black box to me - how can I see the exact steps that are taken once an event is added to the Events table?

    Try an extended events session or a trace to see what is being executed.
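
    For example, a minimal Extended Events session along these lines would capture what the report server runs after an event lands in the Events table, plus the full deadlock graphs. This is a sketch with assumptions: the default database names (ReportServer / ReportServerTempDB) and the session/file names are placeholders — adjust for your install.

```sql
-- Sketch: watch what SSRS executes against its databases, and catch
-- deadlock graphs as they happen. Database/session names are assumptions.
CREATE EVENT SESSION [ssrs_cache_trace] ON SERVER
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.sql_text, sqlserver.session_id)
    WHERE sqlserver.database_name = N'ReportServerTempDB'
       OR sqlserver.database_name = N'ReportServer'),
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.sql_text, sqlserver.session_id)
    WHERE sqlserver.database_name = N'ReportServerTempDB'
       OR sqlserver.database_name = N'ReportServer'),
ADD EVENT sqlserver.xml_deadlock_report   -- the deadlock graphs themselves
ADD TARGET package0.event_file (SET filename = N'ssrs_cache_trace');
GO
ALTER EVENT SESSION [ssrs_cache_trace] ON SERVER STATE = START;
```

    Reading the event_file afterwards (sys.fn_xe_file_target_read_file) should show the order of sprocs fired per event.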

    Sue

  • Thanks Sue. As far as I can tell, this is the order of operations after running "EXEC dbo.AddEvent @EventType = 'SharedSchedule' , @EventData = @ScheduleID;":

    1. A single "shared schedule" event is added to the events table.
    2. This causes subsequent "CacheInvalidateSchedule" events to be added to the events table; it looks like there is one for each dataset linked to the shared schedule referenced in #1.
    3. The "UpdateTask" sproc is called for each dataset referenced in #2. This sproc updates several columns in the Schedule table.
    4. The Schedule table has a trigger on it called "Schedule_UpdateExpiration", which is fired after updates. As mentioned in my previous post, the code here updates the absoluteExpiration column in the ExecutionCache table.
    5. The FlushReportFromCache sproc is called for each dataset referenced in #2. As mentioned earlier, this deletes records from the ExecutionCache table.

    I've seen the deadlocks occur from steps 4 and 5 running simultaneously, and from step 5 running simultaneously for multiple datasets. It's interesting to note that the FlushReportFromCache sproc contains the following comment: "VSTS #139360: SQL Deadlock in GetReportForexecution stored procedure". Perhaps this is a known issue already; it certainly seems like a major flaw.

    If I run the FlushReportFromCache command manually, one at a time, for each dataset that failed to delete due to deadlocks, I am able to clear out the cache. So, for now, that appears to be my only (hacky) solution: attempt to run the cache expiration per my normal method, then loop through any datasets that failed to clear and run FlushReportFromCache for each. If anybody else has a better solution, please let me know! Who would've thought that expiring a cache would be such a headache...
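
    A rough sketch of that serial fallback loop. Assumptions are flagged in the comments: the default database names, the join through ReportSchedule, and that FlushReportFromCache takes the catalog item ID as @ReportID — verify the real signature first with EXEC sp_helptext N'dbo.FlushReportFromCache'.

```sql
-- Sketch: after the normal expiration, flush any cached entries for the
-- schedule that survived the deadlocks, one at a time (no concurrent deletes).
DECLARE @ScheduleID uniqueidentifier;  -- set to the shared schedule just expired
DECLARE @ItemID uniqueidentifier;

DECLARE flush_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT ec.ReportID
    FROM ReportServerTempDB.dbo.ExecutionCache AS ec   -- stale rows the deletes missed
    JOIN ReportServer.dbo.ReportSchedule AS rs
        ON rs.ReportID = ec.ReportID
    WHERE rs.ScheduleID = @ScheduleID;

OPEN flush_cur;
FETCH NEXT FROM flush_cur INTO @ItemID;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC ReportServer.dbo.FlushReportFromCache @ReportID = @ItemID;
    FETCH NEXT FROM flush_cur INTO @ItemID;
END
CLOSE flush_cur;
DEALLOCATE flush_cur;
```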

  • Just curious, what is the Isolation level of your Reporting database?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • SQLRNNR - Thursday, March 30, 2017 3:27 PM

    Just curious, what is the Isolation level of your Reporting database?

    read committed.

  • chris.o.smith - Thursday, March 30, 2017 3:34 PM

    SQLRNNR - Thursday, March 30, 2017 3:27 PM

    Just curious, what is the Isolation level of your Reporting database?

    read committed.

    Thanks

    We can see the queries involved with these deadlocks. Since you are already trapping that info, the next step would seem to be either trying the snapshot isolation level for this database or tuning the code.

    The second part is the really hard part, since it is Microsoft's code causing the deadlocks.

    Another option might be to alter the rsconfig(?) file settings for cache retention. It seems like you may be running hundreds of reports with numerous cached datasets, so it may be necessary to find a way to split them up somehow.
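
    A sketch of the snapshot idea, with two caveats: changing settings on the ReportServer databases may not be supported by Microsoft, and row versioning only removes reader/writer blocking — a pure write/write conflict (the DELETE vs. the trigger's UPDATE) can still deadlock, so this mainly helps if one side is taking shared locks during its scan. Test in non-production first.

```sql
-- Sketch: switch the report server temp database to read-committed snapshot.
-- WITH ROLLBACK IMMEDIATE kicks out open transactions so the option can flip.
ALTER DATABASE ReportServerTempDB
    SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```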

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    Yup, I was just thinking along those lines. If the datasets could be expired by time instead of being manually forced out of the cache, it might work better: CleanExpiredCache is time-based, whereas FlushReportFromCache manually forces the flush. I'm not sure if it would make a difference, or if you'd hit the same issues, since your setup isn't common. I didn't really read the article for how you have this set up, but it could be that the process doesn't work for your scenario for some reason. Then again, it's a stored proc that just shouldn't deadlock like that, so you shouldn't be hitting the error anyway.
    In theory, a fairly ugly hack could work where you programmatically disable and then quickly re-enable caching for those datasets, as that type of action should refresh the cache. You'd really want to test that out, though, since caching can get a bit flaky sometimes, and I'm not sure what kind of activity you have on the server when your process runs. It's a hack, but technically not an unsupported action.

    Sue

    Thanks Sue and Jason. We have 15,000 - 20,000 report runs per day, so the volume is pretty significant. Our ETL runs several times per day, and the volume of data it brings in can vary quite a bit, which results in varying ETL end times. As a result, expiring the cache on a pre-set schedule is tough; that's why we went with the approach of expiring the cache programmatically at the end of the ETL. The code I am using in the ETL (EXEC dbo.AddEvent @EventType = 'SharedSchedule' ...) was picked up by profiling the timed schedule expiration anyway, so I think we would run into the same issue if I just set the shared schedule to expire on a timed schedule.

    I'm not too keen on the idea of altering internal Microsoft sprocs, so I think I'm going to stick with calling the FlushReportFromCache sproc serially for each dataset at the end of the ETL. It's ugly, but I think it will work.
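
    Since each serial flush can itself lose a deadlock, a small retry wrapper around the call avoids silently leaving stale cache rows behind. A hedged sketch — @ItemID and the @ReportID parameter name follow the earlier posts and should be verified against your catalog:

```sql
-- Sketch: retry a flush that is chosen as the deadlock victim (error 1205).
DECLARE @ItemID uniqueidentifier;  -- the dataset/report to flush
DECLARE @Retry int = 0, @MaxRetries int = 5;

WHILE @Retry < @MaxRetries
BEGIN
    BEGIN TRY
        EXEC dbo.FlushReportFromCache @ReportID = @ItemID;
        BREAK;                          -- flushed successfully
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 1205
            THROW;                      -- only deadlocks are retried
        SET @Retry += 1;
        WAITFOR DELAY '00:00:02';       -- brief back-off before the next try
    END CATCH
END
```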
