March 16, 2012 at 1:42 pm
I'm trying to understand what seems like quirky index usage in a database I've just begun to use. Looking for something to read or some general thoughts about what might be going on. I'm missing something...
Situation
The database tables hold information loaded for several years from monthly flat files. Files are large, pre-processed to denormalize the data, and contain no primary keys--although there are some pseudo keys. Each table grows by 0.5 M to 1 M rows per month. Querying is typically by month; usage has been low but will be increasing. Data is only used for reporting but this is not a data warehouse.
Each table has a numeric id that effectively represents the data's month & year (MO_ID). Each table has a clustered index with MO_ID as the first column followed by one or more data columns. Each table also contains a single column non-clustered index on MO_ID. Statistics are updated based on auto-update statistics thresholds. Fragmentation is <1% on all indexes.
I recognize that some of this is less than ideal and that work is needed. All I'm trying to do right now is some initial clean-up of unused or unnecessary indexes.
What seems puzzling
Looking at the tables, it seems as though the non-clustered index on MO_ID could be dropped since it's the first column in the clustered index. When checking index usage, I was surprised to discover that there were times when the non-clustered MO_ID indexes were being used over the clustered indexes.
The question is this
Why might the query analyzer use a non-clustered, single column index over the clustered index that has the same column as the first column in the index? I'm trying understand what might be causing this usage before making a decision about what to do with the MO_ID nonclustered indexes.
March 16, 2012 at 2:31 pm
kl25 (3/16/2012)
What seems puzzlingLooking at the tables, it seems as though the non-clustered index on MO_ID could be dropped since it's the first column in the clustered index. When checking index usage, I was surprised to discover that there were times when the non-clustered MO_ID indexes were being used over the clustered indexes.
Yup, not unusual. Even when there are multiple indexes that are exactly identical often all will show usage. It's got to do with how the optimiser picks indexes.
The question is this
Why might the query analyzer use a non-clustered, single column index over the clustered index that has the same column as the first column in the index? I'm trying understand what might be causing this usage before making a decision about what to do with the MO_ID nonclustered indexes.
The nonclustered index is narrower, so fewer IOs if all that's needed is that single column. The clustered index has the entire data pages at the leaf level, so it's generally the largest index on the table.
This doesn't mean that the single column index can't be dropped, it probably can, but there are some reasons to have nonclustered indexes on the clustered index key, usually when there's some query that absolutely must be as fast as possible. Not the common situation though.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 16, 2012 at 3:17 pm
The nonclustered index is narrower, so fewer IOs if all that's needed is that single column. The clustered index has the entire data pages at the leaf level, so it's generally the largest index on the table.
This doesn't mean that the single column index can't be dropped, it probably can, but there are some reasons to have nonclustered indexes on the clustered index key, usually when there's some query that absolutely must be as fast as possible. Not the common situation though.
Thanks so much for the reply! This gives me a way to think about it. I was very focused on the column ordering and wasn't considering enough the IO implications from the clustered index size.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply