Composite index not being chosen by the optimizer

  • Hi There,

    can anyone explain why in a query where the predicate looks for a NULL value the optimizer does not select the most appropriate index if the index is composite and the select contains an aggregate?

    For example:

    select MIN(my_column) from my_table where key_1 is null and key_2 = integer_value

    With an index on the table such as:

    CREATE NONCLUSTERED INDEX [my_composite_index] ON my_table

    ([key_1] ASC,

    [key_2 ] ASC,

    [key_3] ASC)

    ON [Partition_Scheme_Mine]([my_column])

    The index is correctly selected in the following cases:

    1) key_1 is a value

    select MIN(my_column) from my_table where key_1 = integer_value and key_2 = integer_value

    2) There is no aggregate

    select my_column from my_table where key_1 is null and key_2 = integer_value

    3) The where condition is covered

    select MIN(my_column) from my_table where key_1 is null and key_2 = integer_value and key_3 = integer_value

    Mainly I'm interested in understanding the difference between case 1 and 2 and how NULLs are treated in the statistics.

    Please note that there are very few records out of millions which are set to NULL for Key_1

    Thank you for your detailed explanation.

  • Research sargability, I can't find anything offhand.

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Have you tried using WITH(FORCESEEK)? SQL Server won't use the index unless the predicate is selective enough, and it may not be in this case.

    Jared

    Jared
    CE - Microsoft

  • Thank you Jared,

    I think you are right. The MIN aggregate makes it not "SARGABLE".

    I did try with top 1 ordered desc but had the same plan.

    Any tips are welcome

    Thanks

    Andrea

  • Yes Jared, I did try it and it does work but I was trying to understand the reason and also avoid any hints.

    Thank you though, very helpful.

    Andrea

  • My guess would be heuristics and the tipping point.

    Is 'my_column' in the actual clustered index? If not, try using it as an include in the index definition. You'll probably see what you're looking for.

    The aggregation knows it needs a range of values and doing a key lookup is considered expensive to the optimizer. There's probably an override under the hood somewhere based on the selectivity.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi Craig,

    yes actually my_column is the clustered index and the index composite index is aligned (see the partition schema at the end).

    I think there has to be a reason why the optimizer decides not to use the index when the value in the where clause is NULL but it uses it just fine when the value is anything else.

    Also, as mentioned, the plan with a hint works just fine. So the question remains...why not use that index when NULL is in the where clause?

    Andrea

  • Hmm... Are you looking at the Actual Execution Plan or the Estimated?

    Jared

    Jared
    CE - Microsoft

  • The index is correctly selected in the following cases:

    Remember that the query optimizer chooses what it believes to be the best plan for it to retrieve data. So, that being said, what exactly does "correctly" mean? When you use WITH(FORCESEEK) is the query faster or not? Did you look at statistics when running both?

    I'm not trying to be smug or anything, I am genuinely curious to see your findings. I wonder if there is some issue with indexing NULLS in a composite index?

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Hi Jared,

    No worries, I appreciate the discussion; this is what these forums are for.

    Yes on the actual plan (they match).

    Yes and No on the actual execution time. In one case I did have the "bad" plan showing but execution time was in milliseconds. However in all other test cases I have simulated the answer is always no. Please note that the test table I'm running on has more than 200 M records.

    Andrea

  • Yes on the actual plan (they match).

    I'm sorry, yes what? The actual and estimated match?

    Yes and No on the actual execution time.

    Yes and no what?

    In one case I did have the "bad" plan showing but execution time was in milliseconds.

    Which case and what is wrong with milliseconds? Was this with SET STATICTICS TIME ON or looking at the SSMS window?

    However in all other test cases I have simulated the answer is always no.

    Again, I'm sorry, but "no" what?

    Jared
    CE - Microsoft

  • Can you give me an average row size and NULL in key_1 percentage? going to build a local test-case.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi Kraig,

    Table has 400,000,000 records and 1.4M are null but on key_2 having value 2 only 40,000 are null so the percentage is about 50% on the total but that drops to 0.001% when coupled with key_2.

    Remember you also need to have a an aggregate like min on column_1 and the table has to be partitioned.

    Let me know if you can replicate it.

    Andrea

  • I'm sorry, yes what? The actual and estimated match?

    Correct the actual plan and the estimated one match.

    In one case I did have the "bad" plan showing but execution time was in milliseconds.

    This only happened one time and in a case that I do not have handy so I think you may disregard my statement

    Which case and what is wrong with milliseconds? Nothing wrong, that was the one good execution time on a bad execution plan.

    Was this with SET STATICTICS TIME ON or looking at the SSMS window? Just looking at the SSMS window

    However in all other test cases I have simulated the answer is always no.

    Again, I'm sorry, but "no" what?

    [/quote]

    No to "bad plan" = "Poor execution time"

    Sorry for the confusion,

    Andrea

  • Craig mentioned the tipping point, here is an interesting link I found on this: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

    That's about all I can find on this issue. I think you will have to use a hint to force it in your specific case. However, before choosing that route be sure to not just look at your SSMS window for execution time because that is misleading. Use the command I gave you before you run each of your tests. Also, make sure to test several times and maybe using OPTION(MAXDOP 1)

    Jared

    Jared
    CE - Microsoft

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply