Tracking the Inaccurate Cardinality Estimates

When it comes to query performance, cardinality estimation has lately been a hot topic with each new release of SQL Server: how it makes the query optimizer smarter, helps it generate more and more efficient query execution plans, and makes queries run faster without you having to change or optimize your code.

At a very basic level, a cardinality estimate is the optimizer's estimate of the row count for each operation in the query, especially for columns used in filter conditions (aka the WHERE clause) and columns used in join predicates.
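If you want to see how close the estimate is for a specific query, the actual execution plan records both numbers for every operator. Here is a minimal way to capture it in SSMS (the table and filter below are just placeholders):

-- Capture the actual execution plan, which records both
-- "Estimated Number of Rows" and "Actual Number of Rows" per operator
SET STATISTICS XML ON;
SELECT * FROM dbo.SomeTable WHERE SomeColumn = 'X'; -- placeholder query
SET STATISTICS XML OFF;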


The following description and warning, included with the SSMS create Extended Events wizard, should make it self-explanatory why cardinality estimates matter.

Inaccurate Cardinality Estimates

Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.


Wait a minute. According to that description, it's bad for performance when the optimizer underestimates the row count. Does that mean I don't need to worry if it overestimates the row count? Actually, overestimation is bad for performance too. Maybe not as bad, but still bad enough that you should not ignore it. When the optimizer overestimates the row count:

  • The optimizer will allocate too much memory to the query, resulting in wasted memory. If this happens too often, it can starve SQL Server of memory and cause spills to disk, hard paging, etc. (a query to spot over-sized grants follows this list)
  • The query may get executed with a parallel plan when a serial plan would not only have been faster but would also avoid the infamous CXPACKET waits
  • The optimizer is more likely to choose scanning the entire index over a seek
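As a quick way to spot those over-sized memory grants, here is a rough sketch against the standard DMV (it only shows currently executing queries):

-- Queries currently holding memory grants; a large gap between
-- granted_memory_kb and used_memory_kb can point to overestimated row counts
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       ideal_memory_kb
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;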

How are these estimates calculated?

While the internal algorithms evolve and keep changing, in general the optimizer uses the statistics created on the underlying objects/tables. Statistics are implicitly/automatically created when an index is created; you can also create them separately, independent of any index. And finally, SQL Server will automatically create the statistics it needs if the AUTO_CREATE_STATISTICS setting is ON for the database.
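You can check and, if needed, enable those settings per database. A quick sketch (YourDatabase is a placeholder name):

-- Check the automatic statistics settings for all databases
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases;
-- Enable them for a specific database (YourDatabase is a placeholder)
ALTER DATABASE [YourDatabase] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;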

Statistics hold three key pieces of information:

  • The header, which includes, among other things, the last update date and the number of rows sampled
  • The index density, which measures the uniqueness of a column; low density indicates higher uniqueness and vice versa
  • The histogram, which shows the number of distinct values in a column and the distribution of the data in it. Take, for example, the distribution of data for a Code column with 5 distinct values:
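You can inspect all three pieces yourself with DBCC SHOW_STATISTICS. Assuming an index named IX_Blood_Type_Code exists on the Code column (the index name here is just an assumption for illustration):

-- Returns three result sets: the header, the density vector and the histogram
DBCC SHOW_STATISTICS ('dbo.Blood_Type', 'IX_Blood_Type_Code');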

If there is an index on the Code column (and hence also statistics on its values), and assuming the stats are up to date and accurate, the optimizer is (rightfully so) more likely to use a seek against that index for the following query, because it knows from the statistics that the index contains far fewer records for Code = 'E'.

SELECT * FROM Blood_Type WHERE Code = 'E';


If the filter condition instead contained Code = 'F', the optimizer is likely to not only ignore the index but do a full scan on the table/clustered index instead!
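You can compare the two behaviors side by side by checking the execution plans for both variants:

SELECT * FROM Blood_Type WHERE Code = 'E'; -- likely an index seek
SELECT * FROM Blood_Type WHERE Code = 'F'; -- likely a full scan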

While there have been significant improvements in how the CEs are calculated, there will still be instances where those estimates differ, sometimes significantly, from reality, i.e. the actual row counts.

I will show you how to use the Extended Events feature to capture such anomalies, or as the SQL Server documentation sometimes likes to call them, "skewed cardinality estimates".

And yes, we will be mindful not to capture too much information.



Creating the extended event

(If you prefer TSQL scripts instead of the GUI wizard, you will find them towards the end of this article.)

To create the extended event, connect to the SQL Server instance, expand Management--->Extended Events, and right click on Sessions to launch the New Session Wizard:

Click Next on the Introduction page, if it appears on your screen.

On the next page, enter a name for the XE session; I am calling it xe_inaccurate_cardinality_estimate.

Click Next and on the next page, select Do Not Use a Template.

On the next page, make sure to first click on the Channel drop down and check Debug. This is important; otherwise the "inaccurate_cardinality_estimate" event will not be displayed.


In the Event Library search box, type "inaccurate_cardinality_estimate" and click on the event name to select it. Click the > button to add the event.

Click Next, which will bring up the option to capture Global Fields. Here, I will check the boxes next to database_id, database_name, sql_text and plan_handle. Click Next.


On the next screen, add any filters you would like to limit the results by, for example database_id. I am going to add a filter where the actual row count is > 1000. If you don't have a filter on the actual row count, SQL Server will capture events even when the estimated row count is 1 but the actual is 2. Would you call that significantly more actual rows than the estimate? Apparently the XE does.

And, to make this more effective and practical, you could also add another filter where the estimated row count is significantly higher or lower than the actual row count. For now I will just stick with the filter on the actual row count only.

 


Click Next, which brings up the page to select the storage location for the captured events.

Check the box next to "Save data to a file...", which should automatically be filled in with the name of the event, "xe_inaccurate_cardinality_estimate", as the storage file name. You can enter a complete path for the event file; if you don't, by default the event files will be stored under MSSQL\Log within the installation folder of the instance.

Adjust the maximum file size and rollover settings to your preferences, then click Next.

Click Finish to create the event.


Start the extended event

Start the newly created XE session by right clicking on it and selecting Start Session:


Viewing the event data

Right click again on the XE session and select Watch Live Data:

You can let the XE run for a while or run some test queries of your own, which is what I did to trigger a test event. You can find the script towards the end of this article.

This screenshot is of an event generated from my test script:




Normally you won't be watching the live data continuously, so you can also view all captured events by right clicking on the event file and selecting View Target Data:

You can also use a query to read the events from the event file. You can find the script I use towards the end of this article.



Clean Up


You can right click on the extended event session and select Stop Session.
Right click on the session again and select Delete to remove it.

Note: Deleting the extended event won't delete the event files on the SQL Server.



What can you do to fix the skewed CEs?

In the captured event data, look at the queries and the tables/objects referenced in those queries; check whether they have proper indexes and statistics, and whether those statistics are up to date, i.e. not too old or stale. Update/refresh the statistics if needed.
The availability of up-to-date and accurate statistics, whether created implicitly with an index, created manually by a DBA, or auto-created through AUTO_CREATE_STATISTICS, is the key to improving the accuracy of CEs.
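For example, to see how stale the statistics on a table are and refresh them if needed (dbo.YourTable is a placeholder name):

-- When were the statistics on this table last updated, and how much has changed since?
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.YourTable');
-- Refresh them with a full scan if they look stale
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;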
Here are some general suggestions:
   
  • Check to make sure there are no missing indexes. You can use one of the well-known DMV queries you can find online (one variation is sketched after this list)
  • Enable the database settings AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS 
  • Regularly rebuild/refresh your indexes and statistics. Check out the index/statistics maintenance script from Ola Hallengren at https://ola.hallengren.com/
  • Keep up with SQL Server upgrades and patching, as they often contain query and optimizer enhancements, including how the CEs are calculated and used
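Here is one variation of that missing-index query, a sketch built on the standard DMVs (treat its suggestions as hints, not gospel):

-- Missing index suggestions accumulated since the last restart
SELECT TOP 25
       mid.statement AS [table],
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
     ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
     ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;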

TSQL Scripts

Create the extended event

If you need to create this XE more than once, either on the same SQL Server or on multiple servers, a script is a better way to go.

-- **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****

-- CREATE XE
if exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
GO
CREATE EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
ADD EVENT sqlserver.inaccurate_cardinality_estimate
(
    -- Global fields (actions) and the actual-row-count filter picked in the wizard steps above
    ACTION (sqlserver.database_id, sqlserver.database_name,
            sqlserver.plan_handle, sqlserver.sql_text)
    WHERE ([actual_rows] > 1000)
)
ADD TARGET package0.event_file(SET filename=N'xe_inaccurate_cardinality_estimate')
WITH 
(
MAX_MEMORY=4096 KB,
EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=300 SECONDS,
TRACK_CAUSALITY=OFF,
STARTUP_STATE=OFF
)
GO
-- Start the session
ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=START;
-- Stop the session after the workload is executed
-- ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=STOP;
/*-- To drop the XE
if exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
*/

Trigger an event

Generate sample data/query to trigger the inaccurate cardinality estimates event:
   

/* **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****
-- disable the AUTO_CREATE_STATISTICS
USE [master]
GO
ALTER DATABASE [AdminDBA] SET AUTO_CREATE_STATISTICS OFF
GO
*/
set nocount on
if object_id('ce_test_tbl_people') is not null drop table ce_test_tbl_people
go
-- create test table
create table ce_test_tbl_people
(
id int identity primary key,
name varchar(200) not null,
zipcode int not null
);
-- Insert test data
go
insert into ce_test_tbl_people values('ABC', 12345)
GO 100
insert into ce_test_tbl_people values('ABC', 45678)
GO 200
insert into ce_test_tbl_people values('ABC', 93213)
GO 300
insert into ce_test_tbl_people values('ABC', 74616)
GO 400
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 27646)
GO 5
/* 
RUN A QUERY TO TRIGGER THE INACCURATE CARDINALITY ESTIMATES EVENT
MAKE SURE THE XE session for xe_inaccurate_cardinality_estimate is started
*/
SELECT * FROM ce_test_tbl_people WHERE ZIPCODE = 48287;
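Optionally, you can also capture the actual plan for that query to see the mismatch directly; with AUTO_CREATE_STATISTICS disabled and no index on zipcode, the estimate should be far off from the actual row count:

-- Optional: capture the actual plan to compare estimated vs. actual rows
SET STATISTICS XML ON;
SELECT * FROM ce_test_tbl_people WHERE zipcode = 48287;
SET STATISTICS XML OFF;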

View the event data

You can write a query to read events from the event files, which is my preferred method. Here is one such query:

--  READ THE XE FILEs
;with cte as
(
SELECT top 100
    @@SERVERNAME [SQL Server],
    [file_name],
    cast(event_data as xml) event_data
FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null)
)
select 
 [SQL Server] 
,dn.v.value('.', 'varchar(100)') [Database]
,ar.v.value('.', 'int') [Actual Rows]
,er.v.value('.', 'int') [Estimated Rows]
,st.v.value('.', 'varchar(8000)') [SQL]
,[file_name] [Event File]
,cte.event_data [Event Data]
,dn.v.value('../@timestamp', 'varchar(100)') [TimeStamp]
,CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle]
from cte
CROSS APPLY event_data.nodes('//event/data[@name = "actual_rows"]')     as ar(v)
CROSS APPLY event_data.nodes('//event/data[@name = "estimated_rows"]')  as er(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]') as dn(v)
CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]')   as ph(v)
CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]')      as st(v)
-- where ar.v.value('.', 'int') > 1000
;
-- select * from sys.dm_exec_cached_plans where plan_handle = 0x303630303031303032393830376432613630353362656434386430313030

Stop and delete the extended event

-- DROP THE XE xe_inaccurate_cardinality_estimate
IF exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
BEGIN
IF exists(select * from sys.dm_xe_sessions where name = 'xe_inaccurate_cardinality_estimate')
ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=STOP;
DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
END

Delete the event files

If you leave the event session running, over time event files will accumulate on the SQL Server. Whether you want to reclaim the disk space or you have completed your analysis and simply no longer need the event files, you can use this PowerShell script to remotely delete them on the SQL Server. Make sure to replace the value of the $sql_instance_name variable before trying the script:

$sql_instance_name = "TestSQLServerTestInstance"
$sql_query = "SELECT distinct @@SERVERNAME [SQL_Instance],
                SERVERPROPERTY('MachineName') [Computer_Name],
                [file_name]
FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null);
"
#If you are using a SQL login with password, at the end of the Invoke-Sqlcmd add: -Username '<SQL Login>' -Password '<password>'
$sql_xe_files      = Invoke-Sqlcmd -ServerInstance $sql_instance_name -Query $sql_query
$sql_xe_file_names = $sql_xe_files.file_name
$sql_host_name     = ($sql_xe_files | Select-Object -First 1).Computer_Name
$sql_host_fqdn     = [System.Net.Dns]::GetHostEntry($sql_host_name).HostName
# The $Using: scope passes the local variable into the remote session, so -ArgumentList is not needed
$file_object = Invoke-Command -ComputerName $sql_host_fqdn -ScriptBlock {Get-Item $Using:sql_xe_file_names}
#Uncomment the next line to actually remove the event files on the server
#Invoke-Command -ComputerName $sql_host_fqdn -ScriptBlock {Remove-Item $Using:sql_xe_file_names}
