At least once day, an application for one of the databases I manage gets this error:
The query processor ran out of internal resources and could not produce a query plan.
This is a rare event and only expected for extremely complex queries or queries that reference
a very large number of tables or partitions. Please simplify the query.
If you believe you have received this message in error,
contact Customer Support Services for more information.
This error is sent to the client that initiated the query, so it won't appear in the SQL Server error log. Consequently, you may not be aware of it until you receive reports from your users. However, you can set up Extended Events, traces, etc., to capture the error. That's exactly what I did because we were uncertain which specific query was causing this error.
Here is an explanation on this error:
In my experience, this is indeed an uncommon situation. However, the query itself doesn't necessarily have to be highly complex. More often than not, it's an excessively long or inefficient query. For instance, it could involve an IN clause with thousands of values, an overly convoluted query with numerous derived tables, joins, UNION ALL operations, and so on. As a result, the SQL Server query optimizer eventually gives up attempting to find an optimal execution plan for it.
In my specific case, the problematic query consists of over 400 UNION ALL operations! It consistently times out and triggers the error after 20 seconds. The duration after which the query optimizer gives up can vary and depends on various factors.
How you resolve it? From the same article:
Simplify the query by breaking the query into multiple queries along the largest dimension. First, remove any query elements that aren't necessary, then try adding a temp table and splitting the query in two. Note that if you move a part of the query to a subquery, function, or a common table expression that isn't sufficient because they get recombined into a single query by the compiler. You can also, try adding hints to force a plan earlier, for example OPTION (FORCE ORDER).
Why did I highlight the use of 'OPTION (FORCE ORDER)'? Because I've found it to be a reliable workaround to make the query work, especially when you need a quick workaround to address the issue immediately. Afterward, you can focus on implementing a better and more sensible permanent solution, assuming you have the option to rewrite the query, if not you can dynamically apply query hints using the Query Store:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/query-store-hints
The database I'm referring to is a third-party database running on SQL Server version 2019 Standard edition, equipped with 8 cores and 48GB of RAM, with 40GB allocated to the SQL Server. However, it's worth noting that this error can occur in older versions as well as potentially in future versions of SQL Server.
Capturing this error using Extended Events:
Given that the error is sent to the client, if you wish to identify the query or queries that are encountering this error, you can establish an Extended Events trace to capture the necessary information, including the query text. Below is a sample code to create the Extended Events trace:
CREATE EVENT SESSION [trace_error_8623] ON SERVER ADD EVENT sqlserver.error_reported( ACTION( package0.last_error, sqlserver.client_app_name, sqlserver.client_connection_id, sqlserver.database_id, sqlserver.database_name, sqlserver.is_system, sqlserver.plan_handle, sqlserver.query_hash, sqlserver.query_plan_hash, sqlserver.session_id, sqlserver.sql_text, sqlserver.username ) WHERE ([error_number]=(8623))) ADD TARGET package0.event_file(SET filename=N'trace_error_8623.xel') WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF ) GO
To query/read this XE:
;with cte as ( SELECT top 100 @@SERVERNAME [SQL Server], [file_name], cast(event_data as xml) event_data FROM sys.fn_xe_file_target_read_file('trace_error_8623*.xel', null, null, null) ) select -- TOP 10 [SQL Server] ,en.v.value('../@timestamp', 'varchar(100)') [TimeStamp] ,en.v.value('.', 'varchar(100)') [Error Number] ,em.v.value('.', 'varchar(8000)') [Error Message] ,un.v.value('.', 'varchar(100)') [User Name] ,st.v.value('.', 'varchar(max)') [SQL Text] ,datalength(st.v.value('.', 'varchar(max)')) / 1024 [SQL Size (mb)] ,sid.v.value('.', 'Int') [Session ID] ,qh.v.value('.', 'varchar(8000)') [Query Hash] ,CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle] ,dbname.v.value('.', 'varchar(8000)') [DB Name] ,dbid.v.value('.', 'Int') [DB ID] ,can.v.value('.', 'varchar(8000)') [Client App Name] ,cte.event_data [Event Data] from cte CROSS APPLY event_data.nodes('//event/data[@name = "error_number"]') as en(v) CROSS APPLY event_data.nodes('//event/data[@name = "message"]') as em(v) CROSS APPLY event_data.nodes('//event/action[@name = "username"]') as un(v) CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]') as st(v) CROSS APPLY event_data.nodes('//event/action[@name = "session_id"]') as sid(v) CROSS APPLY event_data.nodes('//event/action[@name = "query_hash"]') as qh(v) CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]') as ph(v) CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]') as dbname(v) CROSS APPLY event_data.nodes('//event/action[@name = "database_id"]') as dbid(v) CROSS APPLY event_data.nodes('//event/action[@name = "client_app_name"]') as can(v) ;
In my scenario, it seems that the application's user interface allows users to make selections, and the application constructs a query using UNION ALL for each selected value before executing it. If a user chooses an excessive number of values, the UNION ALL operations accumulate, causing the query optimizer to struggle in finding an optimal execution plan. This consistently occurs after exactly 20 seconds.
As a temporary workaround, I added the query hint 'OPTION (FORCE ORDER),' and the query now completes in just 7 seconds, returning 469 rows.
Alternatively, if I substitute the UNION ALL with an IN clause, the query finishes within a second and also returns 469 rows.
Moreover, by placing all the values into a table variable, the query can complete in less than a second.
It's important to note that the specific solution for encountering this error in one of your queries may vary depending on the query itself.
Resources:
Take a look at this article for a guide on reproducing this error on your SQL server:
When I tested the sample code provided in the article, it would trigger this error within a couple of seconds. Here, the suggested workaround is to include the query hint OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')).
However, adding the query hint OPTION (FORCE ORDER) also resolves the issue.