Create Index based on execution plan

  • I'm including the actual execution plan on a query, one of the tables that is a left outer join shows that it's doing a Clustered Index Scan. When I hover over the details of that, what parts of that data do I use to create a new nonclustered index so that it does a seek instead of a scan.

    Attached is the execution plan.

  • What I've always done when trying to create indexes (whether wrong or not), is go by these rules [for nonclustered indexes]:

    For joined tables, the first column is what is used in the "ON" clause, additional columns are added if there is an "AND" with that "ON" clause. Then if there are any additional columns in the queries "WHERE" clause, those get added also.

    So for my example, the statement:

    Left Outer Join Cust_Ident_Type cit On cit.Cust_Ident_Type_ID = ci.Cust_Ident_Type_ID

    And cit.Effective_DT <= @getdate-2

    And (cit.End_DT IS NULL Or cit.End_DT > @getdate-2)

    So that would tell me to create a nonclustered index like this:

    Create NonClustered Index......

    (

    Cust_Ident_Type_ID Asc,

    Effective_DT Asc,

    End_DT Asc

    )

    But when I re-run my query, it still uses the Clustered Index showing a clustered index scan.....#confused. (I can't believe I just used twitter syntax!!!)

  • I'll venture that the 'OR' clause in the join statement is the issue here. You're looking for values that are either NULL (at one end of the index range) or greater than a certain value (at the other end of the index range). I'll bet that the query optimizer decides it's more efficient to scan the clustered index than do two separate seeks at two 'ends' of the index range.

    If so, it's unlikely that ANY index will be able to cover all three criteria in the join statement. Depending on the distribution of values in the data, you may be able to create a Non-Clustered index on Cust_Ident_Type_ID and Effective_Dt that would INCLUDE End_DT; query optimizer might be able to do a partial (range?) scan of that index to find values that fit the criteria, which could be more efficient than a full clustered index scan.

    Rob Schripsema
    Propack, Inc.

  • Rob Schripsema (9/28/2011)


    I'll venture that the 'OR' clause in the join statement is the issue here. You're looking for values that are either NULL (at one end of the index range) or greater than a certain value (at the other end of the index range). I'll bet that the query optimizer decides it's more efficient to scan the clustered index than do two separate seeks at two 'ends' of the index range.

    If so, it's unlikely that ANY index will be able to cover all three criteria in the join statement. Depending on the distribution of values in the data, you may be able to create a Non-Clustered index on Cust_Ident_Type_ID and Effective_Dt that would INCLUDE End_DT; query optimizer might be able to do a partial (range?) scan of that index to find values that fit the criteria, which could be more efficient than a full clustered index scan.

    Tried doing that and it didn't work.

    I then removed the "OR" to just have "cit.End_DT IS NULL", and instead of "including" the End_DT, I had it apart of the actual index and that still didn't work.

    I'm doing a sp_recompile on my table after each addition/deletion of these indexes to make sure that it rebuilds the execution plan, is there something else that I need to do to trigger a new plan?

  • Indexes support filters first and foremost. So you want the columns that are part of the filter first (to cut the number of rows in consideration first). So you want either Effective_DT or End_DT first in the index. Which one depends on which predicate reduces the row count the most. Then you want the other of those as the second column and lastly Cust_Ident_Type_ID as the third column.

    http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • gregory.anderson (9/28/2011)


    I'm doing a sp_recompile on my table after each addition/deletion of these indexes to make sure that it rebuilds the execution plan, is there something else that I need to do to trigger a new plan?

    No need to recompile. Adding an index counts as a schema modification, it invalidates all plans that use that table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • gregory.anderson (9/28/2011)


    Rob Schripsema (9/28/2011)


    I'll venture that the 'OR' clause in the join statement is the issue here. You're looking for values that are either NULL (at one end of the index range) or greater than a certain value (at the other end of the index range). I'll bet that the query optimizer decides it's more efficient to scan the clustered index than do two separate seeks at two 'ends' of the index range.

    If so, it's unlikely that ANY index will be able to cover all three criteria in the join statement. Depending on the distribution of values in the data, you may be able to create a Non-Clustered index on Cust_Ident_Type_ID and Effective_Dt that would INCLUDE End_DT; query optimizer might be able to do a partial (range?) scan of that index to find values that fit the criteria, which could be more efficient than a full clustered index scan.

    Tried doing that and it didn't work.

    I then removed the "OR" to just have "cit.End_DT IS NULL", and instead of "including" the End_DT, I had it apart of the actual index and that still didn't work.

    I'm doing a sp_recompile on my table after each addition/deletion of these indexes to make sure that it rebuilds the execution plan, is there something else that I need to do to trigger a new plan?

    First off, seek vs. scan has nothing to do with the columns you've included in the index. The engine will pick what it feels is faster based on the data it's looking for. If you're pulling back a lot of the table, a scan will actually be faster than seeking, and that's why the optimizer will choose that.

    That said, a clustered index scan is a full table scan, and is a last resort if you have no covering nonclustered indexes. From the execution plan, you'd need a nonclustered index like this:

    CREATE NONCLUSTERED INDEX idxCustIdentType_Z1 ON CustIdentType (

    Effective_DT, End_DT

    )

    INCLUDE (CUst_Ident_Type_ID,IND_Priority)

    I don't have your table definitions or query exactly, but try this.

    edit: forgot a column in the Include.

  • After further investigation....

    A more likely issue is that you're using a value from that second LEFT OUTER JOIN (CIT.Ind_Priority) in your ORDER BY clause; note that the query plan shows that value as output from the clustered index scan. That field is NOT in your non-clustered index, so SQL would have to first find the value in the non-clustered index then do a lookup in the clustered data in order to retrieve that value. The optimizer probably figures that it is more efficient to simply scan all the data (in the clustered index) than to do the seek and the lookup. It would really depend on how much data was in the tables, as well -- if SQL has to pull in just a page or two to get ALL of the data in the table, that's more efficient than using an index AND having to retrieve the data anyway.

    You could try including the Ind_Priority field as an INCLUDEd value in your index -- but again, depending on how much data you're talking about, it may or may not make a difference.

    Rob Schripsema
    Propack, Inc.

  • Thanks everyone so far....

    I've tried this:

    Create NonClustered Index idxNC_CustIdentType On [dbo].[Cust_Ident_Type]

    (

    Effective_DT Asc,

    End_DT Asc

    ) Include (Ind_Priority)

    And this:

    Create NonClustered Index idxNC_CustIdentType On [dbo].[Cust_Ident_Type]

    (

    Effective_DT Asc,

    End_DT Asc,

    Cust_Ident_Type_ID Asc

    ) Include (Ind_Priority)

    And they're still using the clustered index scan. My Cust_Ident_Type table is only 15 rows, could that be the reason why it's not using the nonclustered index?

  • In other words...what Derrick Smith just said...AND what you just concluded yourself.

    Rob Schripsema
    Propack, Inc.

  • Is this just to order NULL's last?:

    ORDER BY ISNULL(CIT.Ind_Priority, DBO.MaxInt())

    Could you at change it to something like:

    ORDER BY case when CIT.Ind_Priority is null then 1 else 0 end, CIT.Ind_Priority

    May not make a difference, but you want to avoid using functions where possible

  • gregory.anderson (9/28/2011)


    My Cust_Ident_Type table is only 15 rows, could that be the reason why it's not using the nonclustered index?

    Yes. With 15 rows it's likely that all the data will be on one page. In that case, the index will likely be less efficient than the table scan.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Since you just mentioned the 15 row number...why are you indexing a 15 row table? Unless it's 300 columns wide or has ridiculous XML/varbinary(max)/image fields, reading 15 rows is incredibly quick. That would be a good reason why it would just do a table scan.

  • Here's the predicate from the properties (don't use the tool tip, not enough info there).

    [IARTS].[dbo].[Cust_Ident_Type].[Effective_DT] as [CIT].[Effective_DT]<=[@GetDate] AND ([IARTS].[dbo].[Cust_Ident_Type].[End_DT] as [CIT].[End_DT] IS NULL OR [IARTS].[dbo].[Cust_Ident_Type].[End_DT] as [CIT].[End_DT]>[@GetDate])

    The two columns referenced here, Effective_DT and End_DT, are candidates for an index. But, this query also outputs a couple of columns:

    [IARTS].[dbo].[Cust_Ident_Type].Cust_Ident_Type_ID, [IARTS].[dbo].[Cust_Ident_Type].Ind_Priority

    I'm assuming that the Cust_ident_Type_ID is the clustered key? If so, your nonclustered index would also need to INCLUDE the Ind_Priority column. If not, it would need both.

    But the more concerning issue is the warning that you don't have a JOIN predicate. Something about your join is giving the optimizer conniptions.

    But, the biggest concern is your ORDER BY statement

    ORDER BY ISNULL(CIT.Ind_Priority, DBO.MaxInt())

    Functions on the columns like ISNULL can prevent index use, and what kind of function is MaxInt()?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, "Order By ISNULL(Ind_Priority, DBO.MaxInt())" was to put the nulls at the end (the select statement does a Top 1).

    Sorry for wasting everyone's time with this....(about it only being 15 rows).

    The reason why I was trying to go through adding this index is because our SQL Server is showing an extremely high "I/O wait" in our Quest Performance Analysis software. When reading up on that, it said to check for index scans as the possible culprit. This particular stored procedure that executes this code runs over 9000 times per hour, so even though it may be minimal, I was thinking any gain would be beneficial.

    Thanks to all!!! I do appreciate the help.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply