June 16, 2014 at 10:04 am
That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.
Going over 300 indexes manually definitely sounds painful, but I consider it worth the day or two of effort to ensure that no mistakes are made along the way.
If you'd like to run the final duplicate detection script and have it auto-script a DROP INDEX statement with each row returned, you can do it like this:
;WITH CTE_INDEX_DATA AS (
SELECT
SCHEMA_DATA.name AS schema_name,
TABLE_DATA.name AS table_name,
INDEX_DATA.name AS index_name,
STUFF((SELECT ', ' + COLUMN_DATA_KEY_COLS.name + ' ' + CASE WHEN INDEX_COLUMN_DATA_KEY_COLS.is_descending_key = 1 THEN 'DESC' ELSE 'ASC' END -- Include column order (ASC / DESC)
FROM sys.tables AS T
INNER JOIN sys.indexes INDEX_DATA_KEY_COLS
ON T.object_id = INDEX_DATA_KEY_COLS.object_id
INNER JOIN sys.index_columns INDEX_COLUMN_DATA_KEY_COLS
ON INDEX_DATA_KEY_COLS.object_id = INDEX_COLUMN_DATA_KEY_COLS.object_id
AND INDEX_DATA_KEY_COLS.index_id = INDEX_COLUMN_DATA_KEY_COLS.index_id
INNER JOIN sys.columns COLUMN_DATA_KEY_COLS
ON T.object_id = COLUMN_DATA_KEY_COLS.object_id
AND INDEX_COLUMN_DATA_KEY_COLS.column_id = COLUMN_DATA_KEY_COLS.column_id
WHERE INDEX_DATA.object_id = INDEX_DATA_KEY_COLS.object_id
AND INDEX_DATA.index_id = INDEX_DATA_KEY_COLS.index_id
AND INDEX_COLUMN_DATA_KEY_COLS.is_included_column = 0
ORDER BY INDEX_COLUMN_DATA_KEY_COLS.key_ordinal
FOR XML PATH('')), 1, 2, '') AS key_column_list ,
STUFF(( SELECT ', ' + COLUMN_DATA_INC_COLS.name
FROM sys.tables AS T
INNER JOIN sys.indexes INDEX_DATA_INC_COLS
ON T.object_id = INDEX_DATA_INC_COLS.object_id
INNER JOIN sys.index_columns INDEX_COLUMN_DATA_INC_COLS
ON INDEX_DATA_INC_COLS.object_id = INDEX_COLUMN_DATA_INC_COLS.object_id
AND INDEX_DATA_INC_COLS.index_id = INDEX_COLUMN_DATA_INC_COLS.index_id
INNER JOIN sys.columns COLUMN_DATA_INC_COLS
ON T.object_id = COLUMN_DATA_INC_COLS.object_id
AND INDEX_COLUMN_DATA_INC_COLS.column_id = COLUMN_DATA_INC_COLS.column_id
WHERE INDEX_DATA.object_id = INDEX_DATA_INC_COLS.object_id
AND INDEX_DATA.index_id = INDEX_DATA_INC_COLS.index_id
AND INDEX_COLUMN_DATA_INC_COLS.is_included_column = 1
ORDER BY INDEX_COLUMN_DATA_INC_COLS.key_ordinal
FOR XML PATH('')), 1, 2, '') AS include_column_list,
INDEX_DATA.is_disabled, -- Check if index is disabled before determining which dupe to drop (if applicable)
'DROP INDEX ' + SCHEMA_DATA.name + '.' + TABLE_DATA.name + '.' + INDEX_DATA.name AS drop_statement
FROM sys.indexes INDEX_DATA
INNER JOIN sys.tables TABLE_DATA
ON TABLE_DATA.object_id = INDEX_DATA.object_id
INNER JOIN sys.schemas SCHEMA_DATA
ON SCHEMA_DATA.schema_id = TABLE_DATA.schema_id
WHERE TABLE_DATA.is_ms_shipped = 0
AND INDEX_DATA.type_desc IN ('NONCLUSTERED', 'CLUSTERED')
)
SELECT
*
FROM CTE_INDEX_DATA DUPE1
WHERE EXISTS
(SELECT * FROM CTE_INDEX_DATA DUPE2
WHERE DUPE1.schema_name = DUPE2.schema_name
AND DUPE1.table_name = DUPE2.table_name
AND (DUPE1.key_column_list LIKE LEFT(DUPE2.key_column_list, LEN(DUPE1.key_column_list)) OR DUPE2.key_column_list LIKE LEFT(DUPE1.key_column_list, LEN(DUPE2.key_column_list)))
AND DUPE1.index_name <> DUPE2.index_name)
Please note that you cannot drop everything returned by this report---the query returns all sets of duplicates. For each set, you must decide which to keep and which to drop. Dropping both will leave you with no index at all, so it's important to review everything returned by the query above and decide what to keep or drop before taking any further action.
Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.
June 16, 2014 at 11:08 am
Ed Pollack (6/16/2014)
That's an understandable question---and I intentionally left out any automatic scripting from the article. Adding and removing indexes is delicate work and should be done carefully with quite a bit of review along the way.<snip>
Again, please use extreme caution when modifying indexes---I am always overly cautious about this sort of thing and try hard to ensure that sufficient research and care is taken before making any big changes, even in a development environment.
Thank you. You've gone above and beyond expectations here.
June 16, 2014 at 11:27 am
I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?
June 16, 2014 at 12:34 pm
alexander.oss (6/16/2014)
I've seen some duplicate-looking indexes created intentionally (perhaps) because the creator wanted to make use of an index that covered common queries. So be careful of deleting duplicates that might be present to allow for covered indexes (or similarly which might contain included fields). But here's a question: would a smaller duplicate be faster to use if the query in question was covered by none of the existing indexes?
That's an interesting question, and one worth looking at more closely. First off, the duplicate detection here does not take into account included columns, only key columns. Two potential duplicates with different include columns will show up here in these queries---but that is OK as there may be the chance to combine them into a single covering index.
Consider also 2 indexes like these:
Index 1: Column A ASC, Column B ASC
Index 2: Column A ASC
In this case, Index 1 is duplicating the functionality of Index 2. If you run a query on only Column A, then Index 2 would technically be slightly faster as you only need to return data on that single column, and not also on Column B.
That being said, this speed difference is very, very small. The trade off in the other direction is that more indexes mean more writes when they need to be updated.
There's no automated way to handle all of this, and manual review in each of these cases is necessary to ensure that the best decisions are made. Deciding what indexes to keep or drop can be time-consuming given the trade-offs between reads & writes, but these decisions, when done carefully and correctly based on known query data, can greatly improve SQL Server performance.
June 17, 2014 at 8:42 am
Great article!
This helped me find a bunch of overlapping indexes.
Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.
Most of the duplicates I found had filters.
When looking at included columns I always use a string sorting function to make the comparisons easier.
June 17, 2014 at 8:50 am
DennisPost (6/17/2014)
Great article!This helped me find a bunch of overlapping indexes.
Besides the is_Disabled flag, I'd also include the has_filter flag and filter_definition as well.
Most of the duplicates I found had filters.
When looking at included columns I always use a string sorting function to make the comparisons easier.
For anyone using filtered indexes, those would be useful additions for sure!
June 17, 2014 at 10:48 am
Great article, and very helpful for me especially right now at work.
I'm wondering why you dropped
NCI_Product_Weight_DUPE and
NCI_Product_Weight_OVERLAP
Instead of dropping
NCI_Product_Weight and
NCI_Product_Weight_DUPE
Just leaving you with the index NCI_Product_Weight_OVERLAP.
You said this earlier:
"For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."
So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?
June 17, 2014 at 10:53 am
kris7233 (6/17/2014)
Great article, and very helpful for me especially right now at work.I'm wondering why you dropped
NCI_Product_Weight_DUPE and
NCI_Product_Weight_OVERLAP
Instead of dropping
NCI_Product_Weight and
NCI_Product_Weight_DUPE
Just leaving you with the index NCI_Product_Weight_OVERLAP.
You said this earlier:
"For queries only on Weight, the new index would be slightly more expensive to use, as ProductModelID also needs to be returned, but the difference is very small compared to the burden of maintaining the other indexes indefinitely."
So I thought you'd keep the index that would be used by more queries, which would be the one that indexed Weight and ProductModelID no?
No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.
I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing 🙂
June 19, 2014 at 9:13 am
Great scripts, thanks a lot! I've added them to my personal library, with you as the source.
Be still, and know that I am God - Psalm 46:10
June 19, 2014 at 9:18 am
Ed Pollack (6/17/2014)
No worries here---I was only dropping those 2 new indexes to remove them from Adventureworks so that any future testing or research isn't somehow affected by the presence of these 2 indexes that do not ship by default with the Adventureworks database.
I always try to cleanup after myself when I mess around in a database that will be reused for other purposes, that way I don't cause headaches later when my old changes muck up any future testing 🙂
Ah okay I see now. Thanks again for the great article.
June 19, 2014 at 12:31 pm
It is very useful.
It showed statistics as duplicate indexes though.
June 26, 2014 at 1:49 pm
May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.
(c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).
Note also, that "subset" includes also identical indexes.
create table t1 (
c1 int,
c2 int,
c3 int,
c4 int,
c5 int);
create index t1x1 on t1(c1);
create index t1x2 on t1(c2);
create index t1x3 on t1(c3);
create index t1x4 on t1(c1,c2);
create index t1x5 on t1(c1,c2,c3);
create index t1x6 on t1(c1,c3);
create index t1x7 on t1(c1 desc, c3);
create index t1x8 on t1(c1 desc, c3) include (c5);
create index t1x9 on t1(c3 , c5) include (c4);
create index t1x10 on t1(c1, c3 desc);
create index t1x11 on t1(c1, c2, c4, c5);
select ix2.name, 'is a subset of', ix1.name
from sys.tables tab,
sys.indexes ix1,
sys.indexes ix2
where tab.name = 't1' and
ix1.object_id = tab.object_id and
ix2.object_id = ix1.object_id and
ix1.index_id <> ix2.index_id and
ix1.type_desc <> 'HEAP' and
ix2.type_desc <> 'HEAP' and
( select count(*) -- count of common key columns in index starting from left
from sys.index_columns ixc1,
sys.index_columns ixc2
where ixc1.object_id = ix1.object_id and
ixc1.index_id = ix1.index_id and
ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc1.key_ordinal = ixc2.key_ordinal and
ixc1.column_id = ixc2.column_id and
ixc1.is_descending_key = ixc2.is_descending_key and
ixc1.is_included_column = 0 and
ixc2.is_included_column = 0
group by ixc1.object_id
having max(ixc1.key_ordinal) = count(*) -- no gaps
)
=
( select count(*) -- count of key columns in ix2
from sys.index_columns ixc2
where ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc2.is_included_column = 0 )
order by ix2.name , ix1.name
;
June 26, 2014 at 1:59 pm
lauri.pietarinen (6/26/2014)
May I suggest an alternative script for the same purpose? It takes into consideration ascending/descending but gives false positives for "mirror image" indexes, e.g.(c1 , c2 desc, c3) can be reverse scanned as (c1 desc, c2, c3 desc).
Note also, that "subset" includes also identical indexes.
create table t1 (
c1 int,
c2 int,
c3 int,
c4 int,
c5 int);
create index t1x1 on t1(c1);
create index t1x2 on t1(c2);
create index t1x3 on t1(c3);
create index t1x4 on t1(c1,c2);
create index t1x5 on t1(c1,c2,c3);
create index t1x6 on t1(c1,c3);
create index t1x7 on t1(c1 desc, c3);
create index t1x8 on t1(c1 desc, c3) include (c5);
create index t1x9 on t1(c3 , c5) include (c4);
create index t1x10 on t1(c1, c3 desc);
create index t1x11 on t1(c1, c2, c4, c5);
select ix2.name, 'is a subset of', ix1.name
from sys.tables tab,
sys.indexes ix1,
sys.indexes ix2
where tab.name = 't1' and
ix1.object_id = tab.object_id and
ix2.object_id = ix1.object_id and
ix1.index_id <> ix2.index_id and
ix1.type_desc <> 'HEAP' and
ix2.type_desc <> 'HEAP' and
( select count(*) -- count of common key columns in index starting from left
from sys.index_columns ixc1,
sys.index_columns ixc2
where ixc1.object_id = ix1.object_id and
ixc1.index_id = ix1.index_id and
ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc1.key_ordinal = ixc2.key_ordinal and
ixc1.column_id = ixc2.column_id and
ixc1.is_descending_key = ixc2.is_descending_key and
ixc1.is_included_column = 0 and
ixc2.is_included_column = 0
group by ixc1.object_id
having max(ixc1.key_ordinal) = count(*) -- no gaps
)
=
( select count(*) -- count of key columns in ix2
from sys.index_columns ixc2
where ixc2.object_id = ix2.object_id and
ixc2.index_id = ix2.index_id and
ixc2.is_included_column = 0 )
order by ix2.name , ix1.name
;
That's a nice approach for the specific problem at hand. I actually submitted a revision (already published) to my original article to add some further functionality to the index searching. The intention of my script was to give the basics of finding overlapping/duplicate indexes and let the reader take it from there.
Currently, the script in the article will return all of your indexes as part of the duplicate-checking. It will not tell us what is a subset of what---that is left to our diligent research.
There are a ton of specific cases where more info is needed. For example, what if there were indexes with different fill factors, paddings, or were set up as hypothetical indexes? We may not want to automate every aspect of this search, but would definitely want to add those elements into the query if they matter to our specific database environment.
I like your query, though, and it's an excellent use of the system views to return a very user-friendly version of the data we are looking for! Thanks for reading, and for the helpful addition!
June 26, 2014 at 2:18 pm
Thank you for your kind words!
Indeed, there can be any number of reasons for overlapping indexes and it certainly would be irresponsible to just go and drop them without further investigation. I would suggest that in practice it is harder to get rid of an index than creating new ones, even in obvious cases!
July 8, 2016 at 2:40 am
I've just run this on one of our databases and it returned 64 rows, which was surprising as our existing "duplicate index" checking script returned 0.
The reason is that our script doesn't match indexes if they have different filter expressions. For example, it doesn't match
CREATE UNIQUE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_UniqueConstraint_701001] ON [dbo].[T_AccountManualAdjustment]
(
[ClientID] ASC,
[ServiceCategoryID] ASC,
[UniqueIfInterestCharge] ASC
)
WHERE ([Deleted]=(0))
WITH (...)
with
CREATE NONCLUSTERED INDEX [IX_T_AccountManualAdjustment_ClientID] ON [dbo].[T_AccountManualAdjustment]
(
[ClientID] ASC
)
WITH (...)
Our reasoning is that, in this example, we couldn't replace IX_T_AccountManualAdjustment_ClientID with IX_T_AccountManualAdjustment_UniqueConstraint_701001.
Any thoughts? Thanks.
UPDATE. Sorry - I missed that this had already been raised in an earlier post.
Viewing 15 posts - 16 through 30 (of 45 total)
You must be logged in to reply to this topic. Login to reply