Index design on a table where the primary key is not clustered

  • Am I corrrect that in the case of a table where the primary key is not clustered (instead the table is clustered on a single column which is not unique but has frequent searches), that:

    1) There may be utility having another nonclustered index on the primary key column with columns in the “include” list (possibly all or nearly all columns) to avoid lookups on the clustered index. But it does not make sense to have additional columns in the index list (as opposed to the include list) since the pk column is unique anyway.

    2) There is no benefit having additional indexes that start with the column used in the clustered index.

    Assume the table is searched as part of multiple table joins much more often than inserted/updated.

  • The answer to this type of question is always "it depends".

    For #1, yes, a covering index that includes additional columns may improve some query scenarios by preventing a bookmark lookup.

    For #2, an additional non-clustered index that starts with the same column as the clustered index may be useful, if it is covering, and if the index is relatively narrow and represents less disk IO than using the clustered index pages

  • Even if it IS unique, there may be some usefuleness in having an index with the PK and other columns in the key part of the index. The key part of the index helps when the parts are used in the FROM, WHERE, ORDER BY or GROUP BY; the included fields don't play into that at all unless the field it being used in a function inside of those clauses (not the greates of ideas, but sometimes you're stuck).

    So, ...

    Select

    a.a1,

    a.a2,

    b.*

    from

    a

    inner join b on a.id=b.id

    ORDER BY a.datefield

    ..would likely benefit from

    create UNIQUE INDEX ix_a on a(id,datefield) include (a1,a2)

    assuming table a has other fields that are not being shown.

    Of course - I'd usually also have an index with only the PK in the key (and perhaps some stuff in the include).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • If you have a primary key constraint defined as nonclustered, you already have an index on that column, so no worries there.

    Yes, you don't have to include the clustered index column as the leading edge for other indexes. If you have a one column index and a query that includes the column from the clustered index and the one column, I'm pretty sure that should be a covering index seak operation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Well, I got recklessly bold and replaced all the indexes that started with the primary key column with a single index that included all the columns in those indexes. When profiler started showing scads of slow queries I hurriedly recreated those indexes.

    I conclude that "Ten Centuries" opinion that those other indexes may have utility in providing narrow index scans (for specific queries) is correct.

    The absurd number of indexes on this table (37) bothers me, but I think it's not an easy thing to decide which indexes are superflouous. Perhaps I should be looking at the dynamic management view dm_db_index_usage_stats that shows index usage. Still, just because an index is used doesn't mean another index might not provide almost as good result.

  • Or, I didn't think that through all the way and I'm wrong.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Since you're using 2005, the management views can help you in optimizing your indexes.

    This will show you if and how your existing indexes are used:

    SELECT *

    FROM sys.dm_db_index_usage_stats

    ORDER BY user_updates desc

    This will tell you where additional indexes will help:

    ---

    SELECT d.*

    , s.avg_total_user_cost

    , s.avg_user_impact

    , s.last_user_seek

    ,s.unique_compiles

    FROM sys.dm_db_missing_index_group_stats s

    ,sys.dm_db_missing_index_groups g

    ,sys.dm_db_missing_index_details d

    WHERE s.group_handle = g.index_group_handle

    and d.index_handle = g.index_handle

    ORDER BY s.avg_user_impact desc

    go

    --- suggested index columns & usage

    DECLARE @handle int

    SELECT @handle = d.index_handle

    FROM sys.dm_db_missing_index_group_stats s

    ,sys.dm_db_missing_index_groups g

    ,sys.dm_db_missing_index_details d

    WHERE s.group_handle = g.index_group_handle

    and d.index_handle = g.index_handle

    SELECT *

    FROM sys.dm_db_missing_index_columns(@handle)

    ORDER BY column_id

    [font="Verdana"]Markus Bohse[/font]

  • Would you please comment on the following? Let's say the first column or columns of a nonclustered index is unique. Might there be reasons to add additional columns to the index list as opposed to the include list? Might it be significantly faster because the level level does not need to be accessed?

  • >>Let's say the first column or columns of a nonclustered index is unique. Might there be reasons to add additional columns to the index list as opposed to the include list?

    When you say "index list" and "include list", what do you mean ?

    Index list = list of columns comprising the index ?

    Include list = ???

  • To clarify my question consider the following index. In this following example, let's say ClientID is unique. (the table is clustered on a different column).

    CREATE NONCLUSTERED INDEX [IX_TableX_ClientID] ON [dbo].[TableX]

    ([ClientID], [ColA], [ColB], [ColC])

    INCLUDE ([ColD], [ColE])

    Might there be a good reason that ColA, ColB, and ColC are part of the index columns or should they rightfully be in the include list with ColD and ColE? Here I'm assuming that all specified columns serve a purpose as a covering index.

  • The question about adding the column from the clustered index to others is this. All non-clustered indexes include the clustered value itself up to a length of I think 64 characters (sorry I forget off hand the actually lenght). Unless it is a heap then it contains the value of the page and row of the related row. So no, generally you do not have to include the clustered index value to get the covering effect.

  • CREATE NONCLUSTERED INDEX [IX_TableX_ClientID] ON [dbo].[TableX]

    ([ClientID], [ColA], [ColB], [ColC])

    INCLUDE ([ColD], [ColE])

    Might there be a good reason that ColA, ColB, and ColC are part of the index columns or should they rightfully be in the include list with ColD and ColE? Here I'm assuming that all specified columns serve a purpose as a covering index.

    Again, "it depends". Depends on the usage.

    In the example above, if ColD or ColE are used in the ORDER BY or GROUP BY, then there may be an advantage to having them in the index, rather than just included at the leaf level. Same with ColA, ColB and ColC, moving them into the include may hurt performance depending on usage. Or may improve performance by allowing more rows to squeeze into the non-leaf pages.

  • PW (12/3/2007)


    CREATE NONCLUSTERED INDEX [IX_TableX_ClientID] ON [dbo].[TableX]

    ([ClientID], [ColA], [ColB], [ColC])

    INCLUDE ([ColD], [ColE])

    Might there be a good reason that ColA, ColB, and ColC are part of the index columns or should they rightfully be in the include list with ColD and ColE? Here I'm assuming that all specified columns serve a purpose as a covering index.

    Again, "it depends". Depends on the usage.

    In the example above, if ColD or ColE are used in the ORDER BY or GROUP BY, then there may be an advantage to having them in the index, rather than just included at the leaf level. Same with ColA, ColB and ColC, moving them into the include may hurt performance depending on usage. Or may improve performance by allowing more rows to squeeze into the non-leaf pages.

    Exactly - depends on the usage. What's even uglier is that the usage may dictate a better ORDER in which to put them into the key. It's unfortunately not always the case that you should put the PK as the FIRST field in the key. If you were to "play" with the ordering a little bit - you will find that different queries prefer different orders based on the relative costs.

    Like I mentioned earlier (and PW is mentioning here), fields might need to be in the key part (what PW is referring to as "in the index"), if they participate in the FROM (a.k.a JOIN), GROUP BY, ORDER BY or WHERE clauses. If they're simply in the SELECT clause, then they go in the INCLUDE list.

    If you have something causing a scan (like a function modifiying a field in the WHERE clause), then move those fields to the INCLUDE (since it won't help you to have them in the key part).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks gentlemen! I appreciate the insights.

  • I suspect the best approach is to completely reanalyse the index strategy for the table and see how well the results match the existing indexes - that's the only way you will know if there is good reason for having that many indexes on a single table.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply