December 27, 2011 at 9:45 am
Tks Jeff - great question
December 27, 2011 at 3:22 pm
Thanks for a great question, Jeff!
I had it wrong, for two reasons:
1. Not knowing that the optimizer is smart enough to rewrite an ISNULL of this form to a SARGable equivalent, and
2. Being too stupid to really check out the indexes given in the question to check if "doesn't include EventID" is a bluff.
So this question learned me one thing I didn't know yet, and attempted to learn me something I should know for a long time already. 😀
EDIT: After looking at the question in yet a bit more detail, I realise that the ability to rewrite the ISNULL is completely irrelevant to the question. The index seek uses StartDate, not EndDate. I'm not saying that the optimizer will not rewrite an ISNULL in some cases, but it doesn't in this case! The seek includes only the StartDate in the Seek Predicate; the predicate on the EndDate is in the (non-seek) predicate of the operator.
December 27, 2011 at 3:39 pm
Hugo Kornelis (12/27/2011)
EDIT: After looking at the question in yet a bit more detail, I realise that the ability to rewrite the ISNULL is completely irrelevant to the question. The index seek uses StartDate, not EndDate. I'm not saying that the optimizer will not rewrite an ISNULL in some cases, but it doesn't in this case! The seek includes only the StartDate in the Seek Predicate; the predicate on the EndDate is in the (non-seek) predicate of the operator.
So I decided to dig a bit deeper, and I found out that the explanation about the ISNULL is, in fact, incorrect. The optimizer will (at least in this case) NOT rewrite the ISNULL to make it SARGable.
Here's the code I used to test it - using the CREATE TABLE Jeff provided in the question, and the data generator he posted in the explanation.
CREATE INDEX ix1 ON dbo.Event(EndDate) INCLUDE (StartDate)
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
DECLARE @DesiredStartDate DATETIME,
@DesiredEndDate DATETIME,
@BeginningOfTime CHAR(8),
@EndOfTime CHAR(8);
SELECT @DesiredStartDate = '14 Jun 2012',
@DesiredEndDate = '16 Jun 2012',
@EndOfTime = '99991231';
SELECT EventID, StartDate, EndDate
FROM dbo.Event
WHERE (@DesiredEndDate <= ISNULL(EndDate,@EndOfTime))
OPTION (RECOMPILE);
I ran the last query with the actual execution plan turned on, and the result was an index SCAN instead of an index SEEK on index dbo.ix1.
December 27, 2011 at 3:53 pm
440692 I am just a number (12/27/2011)
Could you expand on the explanation for option E
Several people asked this. The answer is that the SARGability of the ISNULL is irrelevant, because the engine will use the first column in the index (StartDate) for the seek, not the second column (EndDate).
In my previous post, I have demonstrated that -at least in this case, with a 1000-row test population-, the optimizer will in fact NOT rewrite the ISNULL to a more SARGable expression. But what I think Jeff had in mind (and he'll undoubtedly correct me if I'm wrong) is that the ISNULL expression could be rewritten as follows:
Step 1: Convert @DesiredEndDate <= ISNULL (EndDate, @EndOfTime) to
@DesiredEndDate <= EndDate
OR (EndDate IS NULL AND @DesiredEndDate <= CAST(@EndOfTime AS datetime))
Step 2: Pre-evaluate the part of the expression that contains only variables:
@DesiredEndDate <= EndDate
OR (EndDate IS NULL AND (True))
Step 3: Remove needless ballast and swap order for readability:
EndDate >= @DesiredEndDate
OR EndDate IS NULL
In an RDBMS that sorts NULL values *after* other values, this can be retrieved using a single seek (for the first qualifying row) + forward range scan (for the rest). But SQL Server orders NULL values first. That makes it a bit harder; in theory, the optimizer could use two seek / forward range scan combinations (one for the NULL values at the start; the other for the non-NULL values) and merge them. Maybe this will indeed happen if the table is large enough to warrant using all phases of the optimizer to do a full optimization. But for a thousand rows, the optimizer will simply scan the index instead.
December 27, 2011 at 3:54 pm
Love this question. I got A but missed B.
December 27, 2011 at 3:59 pm
tabinsc (12/27/2011)
I do have one question: If clustered index columns are auto-appended to nonclustered indexes, what if the nonclustered index is unique? Adding the clustered index column(s) to an unique nonclustered index would change the unique constraint on it, right?
No, it won't. The index is still unique; duplicate values will still be rejected. And these duplicate values will only be checked for the actually indexed columns.
All nonclustered indexes have the columns from the clustered index added - but only on the leaf-level pages, and not as indexed columns but as extra data. These columns are the pointer to where the rest of the data is. Compare it to an index in a book - after each indexed term is a page number (the clustered index value), that you then use to look up the complete data. If you would enforce the (for a study book) silly requirement that each term can be indexed only once, the index will immediately show violations of this requirement, but there still may be multiple index entries referencing the same page.
Technically, the way clustered index columns are appended to nonclustered indexes is very similar to how included columns work.
December 27, 2011 at 4:32 pm
Sorry... bad post withdrawn. Hugo and Paul are correct. I missed my own NOT NULL constraint on the StartDate column. I'm sending an email to SSC to correct the answers without changing the ones that got it correct. :blush:
To be sure, I made a terrible mistake. ISNULL isn't SARGable.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2011 at 11:46 pm
There's always a temptation with QotD to keep expanding it until the question is 'perfect'. This question seems to have fallen victim to that temptation; in attempting to cover many things all at once, clarity suffers, and more importantly so does correctness, in at least a couple of important respects...
Hugo is correct regarding the SARGability of the ISNULL expression. The reason using ISNULL on the StartDate column does not prevent a seek is that the column is constrained to be NOT NULL. The optimizer can, and does, remove redundant ISNULL calls in this specific case during the simplification phase of query optimization. Looking at the query posted with the predicate @DesiredStartDate >= ISNULL(StartDate,0), the execution plan shows the following details for the index seek:
Seek predicate: End: [dbo].[Event].StartDate <= Scalar Operator([@DesiredStartDate])
Residual predicate: [@DesiredEndDate]<=isnull([dbo].[Event].[EndDate],CONVERT_IMPLICIT(datetime,[@EndOfTime],0))
Notice the ISNULL has been removed from the StartDate predicate (due to the column's NOT NULL constraint) resulting in an index-seek operation. Rows returned by the storage engine as qualifying for the seek are subjected to the residual predicate test in the query processor. On one particular run on my machine, the seek qualified 459 rows. The residual, applied to these 459 rows, reduced the total further to 15 rows.
The EndDate column, on the other hand, is not constrained to be NOT NULL, so the ISNULL on the column reference remains and results, in general, in a non-SARGable predicate. Moving back to the original query in the question, notice that the index seek properties are exactly the same as for the ISNULL(StartDate,0) extension noted above:
Seek: End: [dbo].[Event].StartDate <= Scalar Operator([@DesiredStartDate])
Residual: [@DesiredEndDate]<=isnull([dbo].[Event].[EndDate],CONVERT_IMPLICIT(datetime,[@EndOfTime],0))
The second important point here (aside from noting how many eventually-unqualified rows have to go through the residual predicate) is that the seek range scan only has a End: attribute, not a Start: attribute. This means the range scan starts at the very beginning of the index, and terminates when the End: predicate first returns false. To make the point perhaps clearer, add a second index:
CREATE INDEX nc1 ON dbo.[Event] (StartDate DESC, EndDate)
With the leading column sorted descending, the optimizer chooses this index over the one in the question, with these properties:
Seek: Start: [dbo].[Event].StartDate <= Scalar Operator([@DesiredStartDate])
Residual: [@DesiredEndDate]<=isnull([dbo].[Event].[EndDate],CONVERT_IMPLICIT(datetime,[@EndOfTime],0))
The estimated cost of this plan is 0.0045265 versus 0.0074895 for the original index - an improvement of close to 40%. However, the seek still qualifies 459 rows and the residual filters this to 15 as before, the cost reduction is simply a consequence of the optimizer preferring seeks with a Start: prefix over those that scan from one end to an End: prefix. Further, having EndDate part of the key adds nothing, so a better index for this query is:
CREATE INDEX nc2 ON dbo.[Event] (StartDate DESC) INCLUDE (EndDate)
Now the optimizer chooses this new index (because it has the potential to be narrower), although the new plan still has an estimated cost of 0.0045265 in this case (no improvement over the nc1 index).
The nc2 index gives us these seek properties:
Seek: Start: [dbo].[Event].StartDate <= Scalar Operator([@DesiredStartDate])
Residual: [@DesiredEndDate]<=isnull([dbo].[Event].[EndDate],CONVERT_IMPLICIT(datetime,[@EndOfTime],0))
All the index examples quoted here perform FORWARD range scans (BACKWARD scans are only problematic in some kinds of parallel plan).
December 28, 2011 at 12:27 am
For anyone wondering about the cryptic comment in the UPDATE statement:
UPDATE dbo.Event
SET EndDate = DATEADD(dd,
CASE ABS(CHECKSUM(NEWID()))%4
WHEN 0 THEN 1
WHEN 1 THEN 1
WHEN 2 THEN 1
WHEN 3 THEN 7
ELSE 14 --Yeah, this will be another QOTD.
END,
StartDate)
WHERE EventID > 10
This illustrates the problem of using side-effecting non-deterministic functions in a 'simple' CASE expression:
SELECT
CASE ABS(CHECKSUM(NEWID()))%4
WHEN 0 THEN 'Zero'
WHEN 1 THEN 'One'
WHEN 2 THEN 'Two'
WHEN 3 THEN 'Three'
ELSE 'Um, what?'
END
This code often prints the text from the ELSE clause, because simple CASE is expanded to:
[Expr1001] = Scalar Operator(
CASE WHEN abs(checksum(newid()))%(4)=(0) THEN 'Zero' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(1) THEN 'One' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(2) THEN 'Two' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(3) THEN 'Three' ELSE 'Um, what?' END END END END)
Notice how many times abs(checksum(newid()))%(4) is re-evaluated, probably with a different result from what was intended.
December 28, 2011 at 12:52 am
Excellent! Question covers a lot.
Thanks
December 28, 2011 at 1:49 am
Hugo Kornelis (12/27/2011)
...the ISNULL expression could be rewritten [...] maybe this will indeed happen if the table is large enough to warrant using all phases of the optimizer to do a full optimization. But for a thousand rows, the optimizer will simply scan the index instead.
Hi Hugo,
That is a possible transformation, but not one the optimizer currently considers. If it existed, SQL Server would likely use a 'dynamic seek' pattern, rather than a single seek, as shown below:
ALTER TABLE dbo.Event ALTER COLUMN StartDate DATETIME NULL
SELECT
e.EventID,
e.StartDate,
e.EndDate
FROM dbo.[Event] AS e
WHERE
e.StartDate = '2012-06-14'
OR e.StartDate IS NULL
December 28, 2011 at 2:41 am
The weird thing is, I selected the first two answers because I wanted to see the *real* answer, and ended up getting the question right more or less by default... :ermm:
December 28, 2011 at 4:40 am
SQL Kiwi (12/28/2011)
For anyone wondering about the cryptic comment in the UPDATE statement:
UPDATE dbo.Event
SET EndDate = DATEADD(dd,
CASE ABS(CHECKSUM(NEWID()))%4
WHEN 0 THEN 1
WHEN 1 THEN 1
WHEN 2 THEN 1
WHEN 3 THEN 7
ELSE 14 --Yeah, this will be another QOTD.
END,
StartDate)
WHERE EventID > 10
This illustrates the problem of using side-effecting non-deterministic functions in a 'simple' CASE expression:
SELECT
CASE ABS(CHECKSUM(NEWID()))%4
WHEN 0 THEN 'Zero'
WHEN 1 THEN 'One'
WHEN 2 THEN 'Two'
WHEN 3 THEN 'Three'
ELSE 'Um, what?'
END
This code often prints the text from the ELSE clause, because simple CASE is expanded to:
[Expr1001] = Scalar Operator(
CASE WHEN abs(checksum(newid()))%(4)=(0) THEN 'Zero' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(1) THEN 'One' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(2) THEN 'Two' ELSE
CASE WHEN abs(checksum(newid()))%(4)=(3) THEN 'Three' ELSE 'Um, what?' END END END END)
Notice how many times abs(checksum(newid()))%(4) is re-evaluated, probably with a different result from what was intended.
Heh... there goes THAT question.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 28, 2011 at 4:43 am
Jeff Moden (12/28/2011)
Heh... there goes THAT question.
It's always worth re-doing, it catches a lot of people out (and no-one reads these comments anyway :-))
December 28, 2011 at 4:54 am
My most sincere apologies folks. Hugo and Paul are correct. I missed my own NOT NULL constraint on the StartDate column. I'm sending an email to SSC to modify the answers without changing the ones that got it correct. :blush:
To be sure, I made a terrible mistake. [font="Arial Black"]ISNULL isn't SARGable.[/font]
It doesn't, however, change the utility of being able to use an ISULL on the second column of a StartDate/EndDate paired lookup and still get an INDEX SEEK out of the code.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 49 total)
You must be logged in to reply to this topic. Login to reply