November 17, 2011 at 1:50 pm
Hi There,
can anyone explain why in a query where the predicate looks for a NULL value the optimizer does not select the most appropriate index if the index is composite and the select contains an aggregate?
For example:
select MIN(my_column) from my_table where key_1 is null and key_2 = integer_value
With an index on the table such as:
CREATE NONCLUSTERED INDEX [my_composite_index] ON my_table
([key_1] ASC,
[key_2 ] ASC,
[key_3] ASC)
ON [Partition_Scheme_Mine]([my_column])
The index is correctly selected in the following cases:
1) key_1 is a value
select MIN(my_column) from my_table where key_1 = integer_value and key_2 = integer_value
2) There is no aggregate
select my_column from my_table where key_1 is null and key_2 = integer_value
3) The where condition is covered
select MIN(my_column) from my_table where key_1 is null and key_2 = integer_value and key_3 = integer_value
Mainly I'm interested in understanding the difference between case 1 and 2 and how NULLs are treated in the statistics.
Please note that there are very few records out of millions which are set to NULL for Key_1
Thank you for your detailed explanation.
November 17, 2011 at 2:07 pm
Research sargability, I can't find anything offhand.
Thanks,
Jared
Jared
CE - Microsoft
November 17, 2011 at 2:25 pm
Have you tried using WITH(FORCESEEK)? SQL Server won't use the index unless the predicate is selective enough, and it may not be in this case.
Jared
Jared
CE - Microsoft
November 17, 2011 at 2:26 pm
Thank you Jared,
I think you are right. The MIN aggregate makes it not "SARGABLE".
I did try with top 1 ordered desc but had the same plan.
Any tips are welcome
Thanks
Andrea
November 17, 2011 at 2:28 pm
Yes Jared, I did try it and it does work but I was trying to understand the reason and also avoid any hints.
Thank you though, very helpful.
Andrea
November 17, 2011 at 2:29 pm
My guess would be heuristics and the tipping point.
Is 'my_column' in the actual clustered index? If not, try using it as an include in the index definition. You'll probably see what you're looking for.
The aggregation knows it needs a range of values and doing a key lookup is considered expensive to the optimizer. There's probably an override under the hood somewhere based on the selectivity.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 17, 2011 at 2:35 pm
Hi Craig,
yes actually my_column is the clustered index and the index composite index is aligned (see the partition schema at the end).
I think there has to be a reason why the optimizer decides not to use the index when the value in the where clause is NULL but it uses it just fine when the value is anything else.
Also, as mentioned, the plan with a hint works just fine. So the question remains...why not use that index when NULL is in the where clause?
Andrea
November 17, 2011 at 2:41 pm
Hmm... Are you looking at the Actual Execution Plan or the Estimated?
Jared
Jared
CE - Microsoft
November 17, 2011 at 2:48 pm
The index is correctly selected in the following cases:
Remember that the query optimizer chooses what it believes to be the best plan for it to retrieve data. So, that being said, what exactly does "correctly" mean? When you use WITH(FORCESEEK) is the query faster or not? Did you look at statistics when running both?
I'm not trying to be smug or anything, I am genuinely curious to see your findings. I wonder if there is some issue with indexing NULLS in a composite index?
Thanks,
Jared
Jared
CE - Microsoft
November 17, 2011 at 2:54 pm
Hi Jared,
No worries, I appreciate the discussion; this is what these forums are for.
Yes on the actual plan (they match).
Yes and No on the actual execution time. In one case I did have the "bad" plan showing but execution time was in milliseconds. However in all other test cases I have simulated the answer is always no. Please note that the test table I'm running on has more than 200 M records.
Andrea
November 17, 2011 at 3:00 pm
Yes on the actual plan (they match).
I'm sorry, yes what? The actual and estimated match?
Yes and No on the actual execution time.
Yes and no what?
In one case I did have the "bad" plan showing but execution time was in milliseconds.
Which case and what is wrong with milliseconds? Was this with SET STATICTICS TIME ON or looking at the SSMS window?
However in all other test cases I have simulated the answer is always no.
Again, I'm sorry, but "no" what?
Jared
CE - Microsoft
November 17, 2011 at 3:00 pm
Can you give me an average row size and NULL in key_1 percentage? going to build a local test-case.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 17, 2011 at 3:24 pm
Hi Kraig,
Table has 400,000,000 records and 1.4M are null but on key_2 having value 2 only 40,000 are null so the percentage is about 50% on the total but that drops to 0.001% when coupled with key_2.
Remember you also need to have a an aggregate like min on column_1 and the table has to be partitioned.
Let me know if you can replicate it.
Andrea
November 17, 2011 at 3:33 pm
I'm sorry, yes what? The actual and estimated match?
Correct the actual plan and the estimated one match.
In one case I did have the "bad" plan showing but execution time was in milliseconds.
This only happened one time and in a case that I do not have handy so I think you may disregard my statement
Which case and what is wrong with milliseconds? Nothing wrong, that was the one good execution time on a bad execution plan.
Was this with SET STATICTICS TIME ON or looking at the SSMS window? Just looking at the SSMS window
However in all other test cases I have simulated the answer is always no.
Again, I'm sorry, but "no" what?
[/quote]
No to "bad plan" = "Poor execution time"
Sorry for the confusion,
Andrea
November 17, 2011 at 3:42 pm
Craig mentioned the tipping point, here is an interesting link I found on this: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx
That's about all I can find on this issue. I think you will have to use a hint to force it in your specific case. However, before choosing that route be sure to not just look at your SSMS window for execution time because that is misleading. Use the command I gave you before you run each of your tests. Also, make sure to test several times and maybe using OPTION(MAXDOP 1)
Jared
Jared
CE - Microsoft
Viewing 15 posts - 1 through 15 (of 36 total)
You must be logged in to reply to this topic. Login to reply