Problem
Introduced in SQL Server 2005, sys.dm_db_index_usage_stats can be queried to view aggregate read and write counts for every index in a database. This query is a basic look at what this DMV has to offer:
SELECT	GETDATE() AS current_date_time,
	'AdventureWorks' AS [database_name], -- Replace with DB_NAME() or the name of the current database
	DB_TABLES.name AS [table_name],
	DB_INDEXES.name AS [index_name],
	DB_INDEX_USAGE_STATS.user_seeks,
	DB_INDEX_USAGE_STATS.user_scans,
	DB_INDEX_USAGE_STATS.user_lookups,
	DB_INDEX_USAGE_STATS.user_updates,
	DB_INDEX_USAGE_STATS.last_user_seek,
	DB_INDEX_USAGE_STATS.last_user_scan,
	DB_INDEX_USAGE_STATS.last_user_lookup,
	DB_INDEX_USAGE_STATS.last_user_update
FROM sys.dm_db_index_usage_stats DB_INDEX_USAGE_STATS
INNER JOIN sys.indexes DB_INDEXES
ON DB_INDEXES.object_id = DB_INDEX_USAGE_STATS.object_id
AND DB_INDEXES.index_id = DB_INDEX_USAGE_STATS.index_id
LEFT JOIN sys.tables DB_TABLES
ON DB_TABLES.object_id = DB_INDEXES.object_id
WHERE DB_INDEX_USAGE_STATS.database_id = (SELECT DB_ID('AdventureWorks')) -- Replace with DB_ID() or ID of current database
In addition, we can look at the missing index data that SQL Server collects. This data requires careful review, as many of the indexes SQL Server recommends would not be good additions. Even so, it gives us valuable insight into columns that may lack necessary indexing, queries that result in frequent table scans, and existing indexes that could be altered to improve performance:
SELECT	ROW_NUMBER() OVER (ORDER BY INDEX_GROUP_STATS.avg_user_impact) AS id,
	INDEX_GROUP_STATS.avg_total_user_cost * (INDEX_GROUP_STATS.avg_user_impact / 100.0) * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) AS improvement_measure,
	INDEX_DETAILS.statement + ' (' + ISNULL(INDEX_DETAILS.equality_columns, '')
		+ CASE WHEN INDEX_DETAILS.equality_columns IS NOT NULL AND INDEX_DETAILS.inequality_columns IS NOT NULL THEN ',' ELSE '' END
		+ ISNULL(INDEX_DETAILS.inequality_columns, '') + ')'
		+ ISNULL(' INCLUDE (' + INDEX_DETAILS.included_columns + ')', '') AS index_create_statement,
	INDEX_GROUP_STATS.unique_compiles AS benefiting_compiles,
	INDEX_GROUP_STATS.user_seeks AS user_seeks,
	INDEX_GROUP_STATS.last_user_seek AS last_user_seek,
	INDEX_GROUP_STATS.avg_total_user_cost AS average_total_user_cost,
	INDEX_GROUP_STATS.avg_user_impact AS average_user_impact,
	SD.name AS database_name,
	REVERSE(SUBSTRING(REVERSE(INDEX_DETAILS.statement), 2, (CHARINDEX('[', REVERSE(INDEX_DETAILS.statement), 2)) - 2)) AS table_name,
	ISNULL((LEN(INDEX_DETAILS.equality_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.equality_columns, '[', ''), ']', ''))) / 2, 0) AS equality_columns,
	ISNULL((LEN(INDEX_DETAILS.inequality_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.inequality_columns, '[', ''), ']', ''))) / 2, 0) AS inequality_columns,
	ISNULL((LEN(INDEX_DETAILS.included_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.included_columns, '[', ''), ']', ''))) / 2, 0) AS included_columns
FROM sys.dm_db_missing_index_groups INDEX_GROUPS
INNER JOIN sys.dm_db_missing_index_group_stats INDEX_GROUP_STATS
ON INDEX_GROUP_STATS.group_handle = INDEX_GROUPS.index_group_handle
INNER JOIN sys.dm_db_missing_index_details INDEX_DETAILS
ON INDEX_GROUPS.index_handle = INDEX_DETAILS.index_handle
INNER JOIN sys.databases SD
ON SD.database_id = INDEX_DETAILS.database_id
WHERE INDEX_GROUP_STATS.avg_total_user_cost * (INDEX_GROUP_STATS.avg_user_impact / 100.0) * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) > 10
ORDER BY INDEX_GROUP_STATS.avg_total_user_cost * INDEX_GROUP_STATS.avg_user_impact * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) DESC
The trouble with this data is that it accumulates from the point SQL Server last restarted. If that was last week, the data may be useful; if the last restart was a year ago, it is mostly meaningless. The best use of this data is to track index utilization regularly in order to identify changes over time. New reports, software releases, customers, or internal hardware and software changes can all affect usage, and knowing immediately how these changes affect our environment provides a clear path to optimization.
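Because the counters reset on restart, it is worth checking how long the instance has been up before drawing conclusions from the numbers. A quick way to do this (the column is available in SQL Server 2008 and later):

```sql
-- How long has this instance been accumulating index usage data?
SELECT sqlserver_start_time,
       DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_of_accumulated_stats
FROM sys.dm_os_sys_info;
```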
Solution
In order to truly take control of this data, we will need to persist it somewhere on our SQL Server. Weekly data collection would be the most common use case, though daily could also be handy when frequent and significant change is occurring (or is the norm).
Let’s start by creating a table to store this data:
CREATE TABLE dbo.db_index_usage_stats
(	sample_date DATETIME,
	database_name VARCHAR(250),
	table_name VARCHAR(250),
	index_name VARCHAR(250),
	user_seeks BIGINT,
	user_scans BIGINT,
	user_lookups BIGINT,
	user_updates BIGINT,
	last_user_seek DATETIME,
	last_user_scan DATETIME,
	last_user_lookup DATETIME,
	last_user_update DATETIME )
If you plan on saving this data for a long period of time, you'll want to consider indexes on sample_date, database_name, table_name, or index_name to speed up read access. The data itself is relatively small, but it can add up over a long retention period.
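For example, a clustered index on the sample date plus a nonclustered index on the name columns would cover the most common trend-reporting queries. This is one reasonable layout, not the only one; the index names here are arbitrary:

```sql
-- Sample date is the natural access path for trend reporting
CREATE CLUSTERED INDEX CI_db_index_usage_stats
	ON dbo.db_index_usage_stats (sample_date);

-- Supports looking up the history of a specific index over time
CREATE NONCLUSTERED INDEX IX_db_index_usage_stats_names
	ON dbo.db_index_usage_stats (database_name, table_name, index_name);
```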
Now we need to create a SQL script that can be run regularly to collect our index usage data and store it in this table. If only a single database is involved, the query is relatively simple:
INSERT INTO dbo.db_index_usage_stats
(	sample_date,
	database_name,
	table_name,
	index_name,
	user_seeks,
	user_scans,
	user_lookups,
	user_updates,
	last_user_seek,
	last_user_scan,
	last_user_lookup,
	last_user_update )
SELECT	GETDATE(),
	'AdventureWorks' AS [database_name], -- Replace with DB_NAME() or the name of the current database
	ST.name AS [table_name],
	SI.name AS [index_name],
	IUS.user_seeks,
	IUS.user_scans,
	IUS.user_lookups,
	IUS.user_updates,
	IUS.last_user_seek,
	IUS.last_user_scan,
	IUS.last_user_lookup,
	IUS.last_user_update
FROM sys.dm_db_index_usage_stats IUS
INNER JOIN sys.indexes SI
ON SI.object_id = IUS.object_id
AND SI.index_id = IUS.index_id
LEFT JOIN sys.tables ST
ON ST.object_id = SI.object_id
WHERE IUS.database_id = (SELECT DB_ID('AdventureWorks')) -- Replace with DB_ID() or ID of current database
Now we have a table with data on any given index across different collection dates. Since the values are cumulative from the last SQL Server restart, we can find the change over time by calculating the difference between two samples. For example, if we want to know the number of scans on an index during the month of April, we subtract the scan count at the start of the month from the count at the end of the month. If you do not want to maintain data forever, add a DELETE such as this at the end of your job:
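As a sketch of that subtraction, assuming regular samples, the per-interval change for every index can be computed with the LAG window function (SQL Server 2012 and later); the table and column names match the table created above:

```sql
-- Activity per sampling interval, derived from the cumulative counters
SELECT	sample_date,
	database_name,
	table_name,
	index_name,
	user_seeks - LAG(user_seeks) OVER (PARTITION BY database_name, table_name, index_name
	                                   ORDER BY sample_date) AS seeks_since_last_sample,
	user_scans - LAG(user_scans) OVER (PARTITION BY database_name, table_name, index_name
	                                   ORDER BY sample_date) AS scans_since_last_sample
FROM dbo.db_index_usage_stats;
```

A negative result means the counters were reset (SQL Server restarted) between the two samples, so such rows should be treated as a new baseline rather than real activity.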
DELETE FROM dbo.db_index_usage_stats
WHERE DATEADD(DAY, 180, sample_date) < GETDATE() -- Only keep 180 days of data
The previous example only collected data for a single database, but what if we want index stats for all user databases? The table created above stays the same, but we need to change how we collect the data, iterating through each database one at a time using dynamic SQL:
DECLARE @CMD VARCHAR(MAX) -- Will be used for dynamic SQL
DECLARE @database VARCHAR(100) -- Will hold name of current database being read
DECLARE @current_time DATETIME = GETDATE() -- Cache this date/time now so that we have the same value entered for all databases
DECLARE @database_count INT -- Stores total number of user databases on this instance
DECLARE @count INT = 1 -- Used in loop to keep track of current database

SELECT ROW_NUMBER() OVER (ORDER BY name) AS id,
       SD.name
INTO #databases
FROM sys.databases SD
WHERE SD.name NOT IN ('master', 'tempdb', 'model', 'msdb')

SELECT @database_count = COUNT(*) FROM #databases

WHILE (@database_count > 0) AND (@count <= @database_count) -- Loop until we hit all databases
BEGIN
	SELECT @database = name FROM #databases WHERE id = @count

	SET @CMD = 'USE ' + QUOTENAME(@database) + ' -- QUOTENAME protects against database names containing spaces or other special characters
	INSERT INTO AdventureWorks.dbo.db_index_usage_stats -- Replace AdventureWorks with the name of the database where your index data resides
	(	sample_date,
		database_name,
		table_name,
		index_name,
		user_seeks,
		user_scans,
		user_lookups,
		user_updates,
		last_user_seek,
		last_user_scan,
		last_user_lookup,
		last_user_update )
	SELECT	''' + CAST(@current_time AS VARCHAR(25)) + ''' AS sample_date,
		''' + @database + ''' AS database_name,
		DB_TABLES.name AS table_name,
		DB_INDEXES.name AS index_name,
		INDEX_USAGE_STATS.user_seeks,
		INDEX_USAGE_STATS.user_scans,
		INDEX_USAGE_STATS.user_lookups,
		INDEX_USAGE_STATS.user_updates,
		INDEX_USAGE_STATS.last_user_seek,
		INDEX_USAGE_STATS.last_user_scan,
		INDEX_USAGE_STATS.last_user_lookup,
		INDEX_USAGE_STATS.last_user_update
	FROM ' + QUOTENAME(@database) + '.sys.dm_db_index_usage_stats INDEX_USAGE_STATS
	INNER JOIN ' + QUOTENAME(@database) + '.sys.indexes DB_INDEXES
	ON DB_INDEXES.object_id = INDEX_USAGE_STATS.object_id
	AND DB_INDEXES.index_id = INDEX_USAGE_STATS.index_id
	LEFT JOIN ' + QUOTENAME(@database) + '.sys.tables DB_TABLES
	ON DB_TABLES.object_id = DB_INDEXES.object_id
	WHERE INDEX_USAGE_STATS.database_id = (SELECT DB_ID(''' + @database + '''))'

	EXEC (@CMD)

	SET @count = @count + 1 -- Increment our count so that we go on to the next database
END

DROP TABLE #databases
While index usage statistics are stored for all databases in a single view, index and table metadata are maintained separately in each database, so to collect everything we want, we must loop through each database to complete our data set. With this data we can compare index usage between databases. Seeing how real usage differs between development and production, or between databases with like schemas, lets us compare how each database is used and potentially prepare for changes before they occur in production.
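As a sketch of that comparison, assuming two like-schema databases (Sales_Dev and Sales_Prod are hypothetical names), the most recent sample from each can be joined on table and index name:

```sql
-- Compare the latest usage sample between two like-schema databases
WITH latest AS
(
	SELECT *,
	       ROW_NUMBER() OVER (PARTITION BY database_name, table_name, index_name
	                          ORDER BY sample_date DESC) AS rn
	FROM dbo.db_index_usage_stats
)
SELECT	DEV.table_name,
	DEV.index_name,
	DEV.user_seeks AS dev_seeks,
	PROD.user_seeks AS prod_seeks
FROM latest DEV
INNER JOIN latest PROD
ON PROD.table_name = DEV.table_name
AND PROD.index_name = DEV.index_name
WHERE DEV.rn = 1 AND PROD.rn = 1
AND DEV.database_name = 'Sales_Dev' -- Hypothetical database names
AND PROD.database_name = 'Sales_Prod';
```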
This process can be repeated for missing index statistics by replacing the SELECT portion of our INSERT statement with our missing index query from above:
IF NOT EXISTS (SELECT * FROM AdventureWorks.sys.objects WHERE type = 'U' AND name = 'db_missing_index_usage_stats') -- Check in the same database that will hold the table
BEGIN
	CREATE TABLE AdventureWorks.dbo.db_missing_index_usage_stats -- Create new table for our missing index data
	(	sample_date DATETIME,
		database_name VARCHAR(250),
		table_name NVARCHAR(4000),
		improvement_measure FLOAT,
		index_name NVARCHAR(4000),
		benefiting_compiles BIGINT,
		user_seeks BIGINT,
		last_user_seek DATETIME,
		average_total_user_cost FLOAT,
		average_user_impact FLOAT,
		equality_columns NVARCHAR(4000),
		inequality_columns NVARCHAR(4000),
		included_columns NVARCHAR(4000) )
END

DECLARE @CMD VARCHAR(MAX) -- Will be used for dynamic SQL
DECLARE @database VARCHAR(100) -- Will hold name of current database being read
DECLARE @current_time DATETIME = GETDATE() -- Cache this date/time now so that we have the same value entered for all databases
DECLARE @database_count INT -- Stores total number of user databases on this instance
DECLARE @count INT = 1 -- Used in loop to keep track of current database

SELECT ROW_NUMBER() OVER (ORDER BY name) AS id,
       SD.name
INTO #databases
FROM sys.databases SD
WHERE SD.name NOT IN ('master', 'tempdb', 'model', 'msdb')

SELECT @database_count = COUNT(*) FROM #databases

WHILE (@database_count > 0) AND (@count <= @database_count) -- Loop until we hit all databases
BEGIN
	SELECT @database = name FROM #databases WHERE id = @count

	SET @CMD = 'USE ' + QUOTENAME(@database) + '
	INSERT INTO AdventureWorks.dbo.db_missing_index_usage_stats -- Replace AdventureWorks with the name of the database where your index data resides
	(	sample_date,
		database_name,
		table_name,
		improvement_measure,
		index_name,
		benefiting_compiles,
		user_seeks,
		last_user_seek,
		average_total_user_cost,
		average_user_impact,
		equality_columns,
		inequality_columns,
		included_columns )
	SELECT	''' + CAST(@current_time AS VARCHAR(25)) + ''' AS sample_date,
		''' + @database + ''' AS database_name,
		REVERSE(SUBSTRING(REVERSE(INDEX_DETAILS.statement), 2, (CHARINDEX(''['', REVERSE(INDEX_DETAILS.statement), 2)) - 2)) AS table_name,
		INDEX_GROUP_STATS.avg_total_user_cost * (INDEX_GROUP_STATS.avg_user_impact / 100.0) * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) AS improvement_measure,
		INDEX_DETAILS.statement + '' ('' + ISNULL(INDEX_DETAILS.equality_columns, '''')
			+ CASE WHEN INDEX_DETAILS.equality_columns IS NOT NULL AND INDEX_DETAILS.inequality_columns IS NOT NULL THEN '','' ELSE '''' END
			+ ISNULL(INDEX_DETAILS.inequality_columns, '''') + '')''
			+ ISNULL('' INCLUDE ('' + INDEX_DETAILS.included_columns + '')'', '''') AS index_name,
		INDEX_GROUP_STATS.unique_compiles AS benefiting_compiles,
		INDEX_GROUP_STATS.user_seeks AS user_seeks,
		INDEX_GROUP_STATS.last_user_seek AS last_user_seek,
		INDEX_GROUP_STATS.avg_total_user_cost AS average_total_user_cost,
		INDEX_GROUP_STATS.avg_user_impact AS average_user_impact,
		ISNULL((LEN(INDEX_DETAILS.equality_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.equality_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS equality_columns,
		ISNULL((LEN(INDEX_DETAILS.inequality_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.inequality_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS inequality_columns,
		ISNULL((LEN(INDEX_DETAILS.included_columns) - LEN(REPLACE(REPLACE(INDEX_DETAILS.included_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS included_columns
	FROM sys.dm_db_missing_index_groups INDEX_GROUPS
	INNER JOIN sys.dm_db_missing_index_group_stats INDEX_GROUP_STATS
	ON INDEX_GROUP_STATS.group_handle = INDEX_GROUPS.index_group_handle
	INNER JOIN sys.dm_db_missing_index_details INDEX_DETAILS
	ON INDEX_GROUPS.index_handle = INDEX_DETAILS.index_handle
	INNER JOIN sys.databases SD
	ON SD.database_id = INDEX_DETAILS.database_id
	WHERE INDEX_GROUP_STATS.avg_total_user_cost * (INDEX_GROUP_STATS.avg_user_impact / 100.0) * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) > 10
	AND SD.name = ''' + @database + ''''

	EXEC (@CMD)

	SET @count = @count + 1 -- Increment our count so that we go on to the next database
END

DROP TABLE #databases
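To run either collection script on a schedule, a SQL Server Agent job is the usual vehicle. A minimal sketch using the documented msdb procedures is below; the job name is arbitrary, and the job step's @command assumes you have wrapped the collection script in a hypothetical stored procedure:

```sql
-- Sketch: a weekly Agent job that runs the collection script every Sunday at 01:00
USE msdb
EXEC dbo.sp_add_job @job_name = 'Collect Index Usage Stats'
EXEC dbo.sp_add_jobstep @job_name = 'Collect Index Usage Stats',
	@step_name = 'Run collection script',
	@subsystem = 'TSQL',
	@command = 'EXEC AdventureWorks.dbo.usp_collect_index_usage_stats' -- Hypothetical wrapper procedure
EXEC dbo.sp_add_jobschedule @job_name = 'Collect Index Usage Stats',
	@name = 'Weekly - Sunday night',
	@freq_type = 8, -- Weekly
	@freq_interval = 1, -- Sunday
	@freq_recurrence_factor = 1, -- Every week
	@active_start_time = 10000 -- 01:00:00 (HHMMSS)
EXEC dbo.sp_add_jobserver @job_name = 'Collect Index Usage Stats', @server_name = '(local)'
```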
Conclusion
Collecting metrics for a database environment on a regular basis can greatly improve our ability to troubleshoot problems and, more importantly, prevent performance issues before they become emergencies. Having reliable data on index usage provides a deeper understanding of how your application works and how its usage changes over time. This can easily be accomplished by creating a table to store the information and writing a short script to capture the data on a regular basis.
The same procedure used here for index data can easily be applied to other SQL Server metrics of interest, such as CPU, wait stats, deadlocks, memory consumption, and more.