Wrong Index Being Used

  • Roger Sabin (2/8/2013)


    So, you are saying that SQL Server determined --

    the cost of finding a few records (using second index) + lookup for other fields

    was greater than ---

    the cost of searching through thousands of records (using first index) to find the few matching that where clause but without a need to do a lookup for the additional fields needed

    and that is why the first index was chosen.

    I always thought that the index with the greatest number of equality matches (in the order of the columns in the index without any gaps) was always selected. Only when there were range matches or two or more indexes could satisfy the equality matches would it take into account things like lookups.

    Look at your execution plans: SQL Server performed an INDEX SEEK on both of them to satisfy the respective queries. SQL Server did not have to scan the clustered index to find the data to satisfy the query that also returned the value from the CompanyName column (if I remember the column correctly).

  • I understand that using the first index (the clustered one) allows SQL Server to retrieve any column in the record once it has found the record it wants (using the where clause). Whereas, the second index requires a separate lookup to get any column not in the index which does take more time.

    I am having a hard time understanding why the first index would be used when it requires tremendously more I/O and time than the second index to return the same records. Why isn't it obvious to the query optimizer that the first index costs more than the second index?

    For example, executing the following queries (the first uses the clustered index, the second uses the second index, and the third is forced to use the second index even though it selects a column that index does not cover):

    set statistics io on;
    set statistics time on;

    select ProcessYear,
           ProcessMonth,
           VisionAccountNumber,
           CustomerName
    from UN_MonthlyCMS
    where ProcessYear = 2013
      and ProcessMonth = 1
      and VisionAccountNumber = '1234';

    select ProcessYear,
           ProcessMonth,
           VisionAccountNumber
    from UN_MonthlyCMS
    where ProcessYear = 2013
      and ProcessMonth = 1
      and VisionAccountNumber = '1234';

    select ProcessYear,
           ProcessMonth,
           VisionAccountNumber,
           CustomerName
    from UN_MonthlyCMS with (index (IDX_MonthlyCMS_ProcessYear_ProcessMonth_VisionAccountNumber_CreditPlanNumber))
    where ProcessYear = 2013
      and ProcessMonth = 1
      and VisionAccountNumber = '1234';

    set statistics time off;
    set statistics io off;

    you get this:

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 1 ms.
    (0 row(s) affected)
    Table 'UN_MonthlyCMS'. Scan count 1, logical reads 355528, physical reads 7, read-ahead reads 216495, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 2312 ms, elapsed time = 147795 ms.

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.
    (0 row(s) affected)
    Table 'UN_MonthlyCMS'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 35 ms.

    (0 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'UN_MonthlyCMS'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.

    These queries use a bogus account number so that no records are actually found, but when I ran them with a real account number I got similar I/O usage and times.

  • Roger Sabin (2/8/2013)


    I am having a hard time understanding why the first index would be used when it requires tremendously more IO and time than the second index to return the same records. Why isn't it obvious to the query optimizer that the first index costs more than the second index?

    There are a number of reasons.

    First up, where there is an obvious covering nonclustered index for the query, the optimizer picks it straight away at the trivial plan stage. The plan is chosen before the optimizer starts exploring the space of possible plans and choosing one based on lowest estimated cost. You can see this in the SSMS execution plan view by clicking on the SELECT icon and looking at the Optimization Level property in the Properties window - it shows TRIVIAL for the CorrectIndex.sqlplan file.
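    One way to see the same property outside the SSMS Properties window is to capture the showplan XML directly; a minimal sketch, using the covering query from Roger's script:

```sql
-- Request the estimated plan as XML instead of executing the statement.
SET SHOWPLAN_XML ON;
GO
SELECT ProcessYear, ProcessMonth, VisionAccountNumber
FROM UN_MonthlyCMS
WHERE ProcessYear = 2013
  AND ProcessMonth = 1
  AND VisionAccountNumber = '1234';
GO
SET SHOWPLAN_XML OFF;
GO
-- In the returned XML, the StmtSimple element carries
-- StatementOptmLevel="TRIVIAL" (or "FULL" when cost-based
-- optimization was performed).
```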

    For the query that also returns CustomerName, there is no longer an obviously best choice, and although the trivial plan logic is still run, it does not find a plan. So the optimizer starts generating plan alternatives and costing them. One alternative is to seek the clustered index on:

    ProcessYear = 2013

    ProcessMonth = 1

    ...and then apply a residual filter to rows that match:

    VisionAccountNumber = '1234'

    The other alternative is to seek the nonclustered index on:

    ProcessYear = 2013

    ProcessMonth = 1

    VisionAccountNumber = '1234'

    ...and then use the clustered index keys stored in the leaf level of the nonclustered index to locate the base row in the clustered index and return the value it finds for CustomerName. This process repeats for each row found by the nonclustered index seek.

    These two alternatives are assigned an estimated cost using the query optimizer's cost model, and this is where the problem occurs. The model is just a model, of course, one which happens to work pretty well in practice over the broad range of queries, database designs, and hardware in use across the world. It does not attempt a precise estimate of the cost of the query in your particular circumstances, and the cost it produces is also dependent on the quality of the statistical information it has to work with.

    Now, without scripted statistics for the table in question I cannot work through the numbers and show you precisely why the clustered index seek + residual predicate is chosen over the nonclustered seek + base table lookups, but if you compare the estimated plan costs for both (using an index hint as necessary) you will see the clustered access path appears to be cheaper.
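    As a rough way to make that comparison yourself, you can request estimated plans for both access paths and compare the Estimated Subtree Cost on each root SELECT operator (the hinted index name below is the one from Roger's script):

```sql
-- With "Display Estimated Execution Plan" enabled in SSMS,
-- compare the Estimated Subtree Cost of the two SELECT roots.

-- Access path 1: INDEX(1) forces the clustered index (seek where possible).
SELECT ProcessYear, ProcessMonth, VisionAccountNumber, CustomerName
FROM UN_MonthlyCMS WITH (INDEX(1))
WHERE ProcessYear = 2013 AND ProcessMonth = 1 AND VisionAccountNumber = '1234';

-- Access path 2: nonclustered index seek plus key lookups.
SELECT ProcessYear, ProcessMonth, VisionAccountNumber, CustomerName
FROM UN_MonthlyCMS WITH (INDEX (IDX_MonthlyCMS_ProcessYear_ProcessMonth_VisionAccountNumber_CreditPlanNumber))
WHERE ProcessYear = 2013 AND ProcessMonth = 1 AND VisionAccountNumber = '1234';
```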

    One reason for this is that the optimizer assumes each query starts with a cold data cache, so all I/Os will at least initially be physical. The model also assumes sequential I/O (scanning a range of the clustered index) will be much faster than the random I/O expected to result from a given number of lookups from the nonclustered index to the base table.

    The main reason, though, is that the optimizer is grossly underestimating the number of rows returned by the range seek of the clustered index, which will need to have the residual filter applied. This may be due to a lack of good statistics on the column combination concerned, or it may be a consequence of the way the predicates are modelled. You can check this by issuing a query like:

    SELECT COUNT_BIG(*) FROM UN_MonthlyCMS WHERE ProcessYear = 2013 AND ProcessMonth = 1 OPTION (RECOMPILE);

    The estimated number of rows in that query's plan ought to be quite close to the actual rows.

    If you fancy scripting out the indexes and statistics using SSMS, I can go into more detail.
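    Should you go that route, a quick starting point is to inspect the statistics directly (the table and index names below are taken from the hint in Roger's script; adjust to match your schema):

```sql
-- List all statistics objects that exist on the table.
EXEC sp_helpstats 'dbo.UN_MonthlyCMS', 'ALL';

-- Show the header, density vector, and histogram for the
-- statistics associated with the nonclustered index.
DBCC SHOW_STATISTICS ('dbo.UN_MonthlyCMS',
    IDX_MonthlyCMS_ProcessYear_ProcessMonth_VisionAccountNumber_CreditPlanNumber);
```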

  • At this point, I will bow out and defer to Paul. He has spent much more time than I have digging into these things and has a much better understanding of indexing and the optimizer.

  • Lynn Pettis (2/8/2013)


    At this point, I will bow out and defer to Paul. He has spent much more time than I have digging into these things and has a much better understanding of indexing and the optimizer.

    I feel I should stress that pretty much everything you guys have said in this thread so far has been accurate. The problem we are dealing with here is the optimizer making what looks like the right choice from its point of view, but which turns out to be highly suboptimal, because it doesn't have good information to base its choices on and the model it uses is a very general-purpose affair. The obvious solution has already been suggested: provide an obvious best-choice (covering) index. The alternatives are to create better statistics, or to accept the out-of-model condition and use an index hint. So, overall, I'm just adding extra analysis around the points already made, in case it is of interest to Roger.
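    For reference, the best-choice covering index might be sketched like this (the index name is illustrative; INCLUDE carries CustomerName in the leaf level without widening the key):

```sql
-- Hypothetical covering index: all three equality predicates in the key,
-- CustomerName stored as an included (leaf-only) column.
CREATE NONCLUSTERED INDEX IX_UN_MonthlyCMS_Year_Month_Account
ON dbo.UN_MonthlyCMS (ProcessYear, ProcessMonth, VisionAccountNumber)
INCLUDE (CustomerName);
```

    With CustomerName covered, the key lookup disappears and the query qualifies for the trivial-plan shortcut described earlier in the thread.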

  • Thank you for expounding, Paul. I learned something new this morning.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun. Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (2/11/2013)


    Thank you for expounding, Paul. I learned something new this morning.

    Ditto. Love randomly finding a thread with great information in!
