I don’t like the MERGE statement. The syntax is weird, it’s tricky in terms of locks, and it has a history of weird errors. The other day a developer wanted my help in troubleshooting a MERGE that caused an ACCESS VIOLATION error and a stack dump generation. That led me to tweet this:
It seems the MERGE statement will eventually destroy the universe — Matan Yungman (@MatanYungman) February 11, 2014
I may have exaggerated a bit. It might be limited just to our galaxy (thanks Doc Brown). Anyways, we started troubleshooting and eventually figured out the above error happens because of a specific code section (the whole scenario is dependent on other factors and is very hard to reproduce. We opened a bug and I’ll update when I have more details).
The problematic code block looked like this:
DECLARE @Var int = 0; MERGE TargetTable AS target USING (Column_A, Column_B, Column_C FROM SourceTable) AS source (Column_A, Column_B, Column_C) ON (target.Column_A = source.Column_A) WHEN MATCHED THEN UPDATE SET @Var = 1 WHEN NOT MATCHED THEN INSERT (Column_A, Column_B, Column_C) VALUES (Column_A, Column_B, Column_C);
Unlike me, the developer likes MERGE, and what he accomplished here is “Insert where not exists”. But for some reason, updating the variable was problematic.
Meanwhile:
We started getting alerts about an Availability Group that kept failing. I went to Failover Cluster Manager and saw the Availability Group was offline. I brought it back online, but it failed again almost immediately. I did it a few more times, but the Availability Group refused to stay online.
And then I recalled I once saw a video by Jonathan Kehayias, where he talked about “FailureConditionLevel”, a Windows Cluster configuration option that determines the severity at which there will be a cluster Failover:
- 0 – No automatic failover
- 1 – Failover/Restart when the primary node is down
- 2 – Failover/Restart when the SQL Server service is up but not responding
- 3 – Failover/Restart on critical errors like a stack dump
- 4 – Failover/Restart when a resource is unhealthy, for example, memory
- 5 – Failover/Restart on any qualifying condition and when there’s a query processing error
* Based on the output of sp_server_diagnostics. Click here for a more detailed exaplanation
The value in our cluster was the default – 3. Obviously the cluster kept failing and stayed offline because of the stack dumps generated by the MERGE statement. I changed the value to 1, and the cluster resource stopped failing.
We then moved on to fixing the MERGE code, which was very simple: Omitting the WHEN MATCHED part.
Key Takeaways:
- Use caution with MERGE
- Dig into your clusters. Know the various options and decide when a Failover is needed