Indexes aren’t free, and many databases end up with unused indexes. Every time you make any update to a table you will be updating the clustered index (I assume no heaps, because I hate heaps), and every index that has uses one of the columns that were updated. Inserts and Deletes affect every column and will affect every index, with exceptions for non-typical indexes. These updates cost a lot, to the point that I have several processes that disable certain nonclustered indexes, do all of the work, then rebuild the indexes afterwards.
The problem is that you can’t tell very easily if an index is being used or not. You can look at sys.dm_db_index_usage_stats, but that information is very limited in that gets wiped out on index rebuilds, SQL service restarts (server reboot included), and anytime you ask it to. That’s not mentioning that it also ignores obscure usage of the index. However, the obscure usage is more of something to keep in mind than it is something to base all of your decisions on.
From my point of view, there are two answers to this problem. There isn’t a right and wrong answer, it’s more of a step 1 and step 2.
The long and hard one is to capture the values in sys.dm_db_index_usage_stats on a normal basis (daily, before index maintenance) and give yourself long term statistics that you can determine if an index’s usage justifies its cost. Some indexes are updated 1,000,000 times for two uses. You obviously don’t need to keep this index around all the time, but you’re still paranoid about what those uses were. Were they ad-hoc queries run by a developer, or were they year-end reports run by the CEO? You can get some idea if you look at your history and check what day it was run. I would want to make a rollback script, drop the index, and make a searchable document of indexes I dropped in case anyone complained about year-end reports not working.
That was the long answer, which is right and is not replaced by the short answer. The script below is the short answer. If all of the key fields in an index match, or at least the first three in a larger index, then there’s a good chance that one of the indexes isn’t required or they could be consolidated.
For instance, if you have two indexes that have the same two key fields in the same order but the second index also includes two columns, then you’ll probably see usage on both indexes (see the long-hard solution above). If one index isn’t in use in this situation, your job is easy. If both are used, then look into dropping the index that doesn’t include any columns. Why? Because a slightly larger index rarely increases the overhead too much, and the queries that used the first index aren’t usually impacted too much by using a slightly larger index that has the same foundation.
This area gets shakey on how you should handle it, and is really one of the few areas in SQL Server I consider more of an art. For instance, the script below will show you indexes that have 5 key columns where the first three match. You look and find that both indexes are being used, but users are complaining that updates are a bit slow, too. I’d lean towards condensing by taking the the last two key columns of one index and including them on the other, but which index gets dropped and how do you test it?
Do my uses of “look into”, “rarely”, “usually”, and “too much” seem uncertain to you? Welcome to indexing.
Here are the scripts I use for both the short and long answers mentioned. The first one is for the short answer, giving you indexes that are for the most part duplicated.
DECLARE @KeyFieldMatches INT SET @KeyFieldMatches = 3 --Number of key fields to match in order IF object_id('tempdb..#IndexList') IS NOT NULL BEGIN DROP TABLE #IndexList END IF object_id('tempdb..#IndexListShort') IS NOT NULL BEGIN DROP TABLE #IndexListShort END CREATE TABLE #IndexList ( object_id bigint not null , table_name varchar(150) not null , index_id int not null , index_name varchar(150) not null , index_column int not null , included bit not null , column_name varchar(150) not null , index_type int not null ) CREATE CLUSTERED INDEX IndexList_Clu ON #IndexList (object_id, index_id, index_column, included) CREATE TABLE #IndexListShort ( object_id bigint not null , table_name varchar(150) not null , index_id int not null , index_name varchar(150) not null , column_names_first_n varchar(450) not null , column_names_key varchar(4000) not null , column_names_included varchar(4000) not null ) DECLARE @object_id bigint DECLARE @index_id int DECLARE @List varchar(4000) DECLARE @ListIncl varchar(4000) DECLARE @ListShort varchar(450) INSERT INTO #IndexList SELECT o.object_id , table_name = o.name , i.index_id , index_name = i.name , index_column = ic.index_column_id , included = ic.is_included_column , column_name = c.name , index_type = i.type FROM sys.objects o INNER JOIN sys.indexes i ON o.object_id = i.object_id INNER JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id SET @object_id = (SELECT TOP 1 object_id FROM #IndexList) SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id) WHILE @object_id IS NOT NULL BEGIN SET @List = '' SET @ListIncl = '' SET @ListShort = '' SELECT @List = @List + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 ORDER BY i.index_column SELECT @List = substring(@List, 0, Len(@List)) SELECT @ListIncl = @ListIncl + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 1 ORDER BY i.index_column SELECT @ListIncl = substring(@ListIncl, 0, Len(@ListIncl)) SELECT @ListShort = @ListShort + COALESCE(i.column_name + ', ', '') FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id and i.included = 0 and index_column <= @KeyFieldMatches ORDER BY i.index_column SELECT @ListShort = substring(@ListShort, 0, Len(@ListShort)) INSERT INTO #IndexListShort SELECT TOP 1 i.object_id , i.table_name , i.index_id , i.index_name , @ListShort , @List , CASE i.index_type --Clustered indexes include everything WHEN 1 THEN '*' ELSE @ListIncl END FROM #IndexList i WHERE i.object_id = @object_id and i.index_id = @index_id DELETE #IndexList WHERE object_id = @object_id and index_id = @index_id SET @object_id = (SELECT TOP 1 object_id FROM #IndexList) SET @index_id = (SELECT TOP 1 index_id FROM #IndexList WHERE object_id = @object_id) END SELECT table_name, index_name, column_names_key, column_names_included FROM #IndexListShort i WHERE EXISTS (SELECT * FROM #IndexListShort i2 WHERE i.object_id = i2.object_id AND i.column_names_first_n = i2.column_names_first_n GROUP BY object_id, column_names_first_n HAVING Count(*) > 1) ORDER BY table_name, column_names_key DROP TABLE #IndexList DROP TABLE #IndexListShort
The long answer takes a bit more work to figure out. You need to store the information long-term, keep a snapshot of what it looked like most recently, and do all the calculations to make sure you have the right numbers. It’s really no different than what I do for other DMVs, so you’ll see this process duplicated a lot if you follow my work. The tables for storing this information:
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[DMVIndex_Usage_Stats]( [StartTime] [smalldatetime] NOT NULL, [EndTime] [smalldatetime] NOT NULL, [database_id] [smallint] NOT NULL, [object_id] [int] NOT NULL, [index_id] [int] NOT NULL, [user_seeks] [bigint] NOT NULL, [user_scans] [bigint] NOT NULL, [user_lookups] [bigint] NOT NULL, [user_updates] [bigint] NOT NULL, [system_seeks] [bigint] NOT NULL, [system_scans] [bigint] NOT NULL, [system_lookups] [bigint] NOT NULL, [system_updates] [bigint] NOT NULL, CONSTRAINT [PK_DMVIndex_Usage_Stats] PRIMARY KEY CLUSTERED ( [StartTime] ASC, [database_id] ASC, [object_id] ASC, [index_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[DMVIndex_Usage_Stats_Temp]( [TS] [smalldatetime] NOT NULL, [database_id] [smallint] NOT NULL, [object_id] [int] NOT NULL, [index_id] [int] NOT NULL, [user_seeks] [bigint] NOT NULL, [user_scans] [bigint] NOT NULL, [user_lookups] [bigint] NOT NULL, [user_updates] [bigint] NOT NULL, [system_seeks] [bigint] NOT NULL, [system_scans] [bigint] NOT NULL, [system_lookups] [bigint] NOT NULL, [system_updates] [bigint] NOT NULL, CONSTRAINT [PK_DMVIndex_Usage_Stats_Temp] PRIMARY KEY CLUSTERED ( [TS] ASC, [database_id] ASC, [object_id] ASC, [index_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY] GO
And now the script that captures the information and does all of your calculations. You may have noticed that the tables themselves do NOT store the index or table names which was intentional. Name changes happen, tables and indexes get dropped and created, all of that is extra work for little benefit. Is the benefit worth it? I thought it wasn’t. I’m not expecting everyone to agree with me here because there are arguments going the other way. My advice is STEAL MY WORK!!! Don’t copy it, you need to steal it. Make it your own, learn it well enough that you can improve it and build upon it. At that point you’ll truly understand every bit of it and hopefully start up your own blog. I hear EvenSimplerSQLServer.com is available, but that’s subject to change. If you don’t want to go that far, then write your ideas in the comment box below. I change these posts from time to time, and this script was actually added months after the original post was made. Just don’t ask me why it wasn’t here in the first place, my answer is just….DOH!!!! Even saying I was only 6 days into blogging at the time I wrote the original doesn’t make that acceptable.
DECLARE @TS as SmallDateTime DECLARE @Old_TS as SmallDateTime DECLARE @Reboot_TS as SmallDateTime SET @TS = GetDate() SET @Reboot_TS = (SELECT create_date FROM sys.databases WHERE [name] = 'tempdb') --If update wasn't already done this minute -- Delete all but the newest two updates - Only the newest will be used, so one spare is kept -- Add the current stats -- Calculate the usage between the last two and save in the permanent table -- If any of the values go down then the DMV reset for that index --It's a horrible join clause, but it makes everything null and uses the isnull(x,0) --AND N.user_seeks >= O.user_seeks will be hit if database is refreshed, detached, etc. IF 0 = (SELECT Count(*) FROM Perf..DMVIndex_Usage_Stats_Temp WHERE TS = @TS) BEGIN DELETE Perf..DMVIndex_Usage_Stats_Temp WHERE TS NOT IN (SELECT DISTINCT TOP 2 TS FROM Perf..DMVIndex_Usage_Stats_Temp ORDER BY TS DESC) SET @Old_TS = --Newest TS in the temp table (SELECT TOP 1 TS FROM Perf..DMVIndex_Usage_Stats_Temp ORDER BY TS DESC) INSERT INTO Perf.dbo.DMVIndex_Usage_Stats_Temp SELECT @TS , database_id , object_id , index_id , user_seeks , user_scans , user_lookups , user_updates , system_seeks , system_scans , system_lookups , system_updates FROM sys.dm_db_index_usage_stats INSERT INTO Perf..DMVIndex_Usage_Stats SELECT isnull(@Old_TS, @Reboot_TS) , @TS , N.database_id , N.object_id , N.index_id , N.user_seeks - isnull(O.user_seeks, 0) , N.user_scans - isnull(O.user_scans, 0) , N.user_lookups - isnull(O.user_lookups, 0) , N.user_updates - isnull(O.user_updates, 0) , N.system_seeks - isnull(O.system_seeks, 0) , N.system_scans - isnull(O.system_scans, 0) , N.system_lookups - isnull(O.system_lookups, 0) , N.system_updates - isnull(O.system_updates, 0) FROM Perf..DMVIndex_Usage_Stats_Temp N --N = New, O = Old LEFT JOIN Perf..DMVIndex_Usage_Stats_Temp O ON O.TS = @Old_TS AND O.database_id = N.database_id AND O.object_id = N.object_id AND O.index_id = N.index_id AND N.user_seeks >= O.user_seeks AND N.user_scans >= O.user_scans AND N.user_lookups >= O.user_lookups AND N.user_updates >= O.user_updates AND N.system_seeks >= O.system_seeks AND N.system_scans >= O.system_scans AND N.system_lookups >= O.system_lookups AND N.system_updates >= O.system_updates AND @Reboot_TS < O.TS --They're reset on reboot, so don't compare if it was rebooted since last reading WHERE N.TS = @TS END
Related articles
- Indexes – Understanding basic types and their components (simplesqlserver.com)
- Fixing Page Life Expectancy (PLE) (simplesqlserver.com)
Filed under: Indexes, Scripts, SQL Server Tagged: dm_db_index_usage_stats, Index, performance, sys.indexes, sys.index_columns