February 13, 2014 at 5:47 am
Hi all,
I've been given a rather meaty CASE statement which is of the form:
UPDATE Table SET x =
CASE
WHEN y LIKE '%EFG%' THEN 3
WHEN y LIKE 'ABC%' THEN 1
WHEN y LIKE 'DEF%' THEN 2
END
I know that %EFG% will perform the worst as it won't be able to use the index on Column y, so if that was at the bottom of the CASE statement, would performance improve (let's assume all three WHENs have equal significance), and the ordering doesn't affect the logic.
Thanks
Singhster
February 13, 2014 at 8:19 am
Cool question and kind of fun. I tested it out this way:
IF (SELECT OBJECT_ID('Test1')
) IS NOT NULL
DROP TABLE dbo.Test1;
GO
CREATE TABLE dbo.Test1 (C1 VARCHAR(50),C2 INT, C3 INT IDENTITY);
SELECT TOP 1500
IDENTITY( INT,1,1 ) AS n
INTO #Nums
FROM Master.dbo.SysColumns sC1,
Master.dbo.SysColumns sC2;
INSERT INTO dbo.Test1
(C1,C2)
SELECT n, n
FROM #Nums;
DROP TABLE #Nums;
CREATE NONCLUSTERED INDEX i1 ON dbo.Test1 (C1) ;
UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '%42%' THEN 3
WHEN C1 LIKE '24%' THEN 2
WHEN C1 LIKE '36%' THEN 1
END
DBCC FREEPROCCACHE()
UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '19%' THEN 3
WHEN C1 LIKE '25%' THEN 2
WHEN C1 LIKE '37%' THEN 1
END
Both queries ended up with identical execution plans, right down to the query hash and plan hash. The issue is, the CASE statement itself is a function that forces a scan on the table. You won't, ever, see a performance improvement by modifying the order of the CASE statements or even eliminating the wild card as I did because the CASE itself is a function on the column, requiring a scan.
And thanks for the blog post, writing it up now.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 13, 2014 at 9:42 am
Great work Grant, thanks for the response. I suspected as such, but didn't have the nous to test it myself the way you have. Thanks again.
February 13, 2014 at 10:20 am
Nice analysis, Grant! While this hypothetical assumes the order of the WHENs don't have any logical significance, it should be emphasized for any folks new to CASE statements that the order usually does have logical significance. The first WHEN that matches your data will be used. So if we had a string 'DEFG', this version
UPDATE Table SET x =
CASE
WHEN y LIKE '%EFG%' THEN 3
WHEN y LIKE 'ABC%' THEN 1
WHEN y LIKE 'DEF%' THEN 2
END
Will code it as 3. If we change the order of WHENs like this:
UPDATE Table SET x =
CASE
WHEN y LIKE 'ABC%' THEN 1
WHEN y LIKE 'DEF%' THEN 2
WHEN y LIKE '%EFG%' THEN 3
END
It will be coded as 2. Just a pedantic point to keep in mind. 😀
February 13, 2014 at 11:23 am
Grant,
My understanding was the short circuit capabilities of the CASE statement would improve performance, but in a limited variety.
Instead of a simple data change on the table with no where clause (so permanent scanning), something like this would definately be different depending on what the data looked like:
...
CASE WHEN fielda = 1
THEN 2
WHEN fielda = (SELECT somevalue FROM otherTable WHERE keyfield = tablea.keyfield)
THEN NULL
ELSE 3
END
...
If you don't have anything that doesn't have a fielda = 1, that subquery will never process... or so I understood. If this isn't your understanding I'll go setup a test process to see where I'm confused.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 13, 2014 at 11:30 am
Evil Kraig F (2/13/2014)
Grant,My understanding was the short circuit capabilities of the CASE statement would improve performance, but in a limited variety.
Instead of a simple data change on the table with no where clause (so permanent scanning), something like this would definately be different depending on what the data looked like:
...
CASE WHEN fielda = 1
THEN 2
WHEN fielda = (SELECT somevalue FROM otherTable WHERE keyfield = tablea.keyfield)
THEN NULL
ELSE 3
END
...
If you don't have anything that doesn't have a fielda = 1, that subquery will never process... or so I understood. If this isn't your understanding I'll go setup a test process to see where I'm confused.
What you said is correct.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2014 at 12:25 pm
Evil Kraig F (2/13/2014)
Grant,My understanding was the short circuit capabilities of the CASE statement would improve performance, but in a limited variety.
Instead of a simple data change on the table with no where clause (so permanent scanning), something like this would definately be different depending on what the data looked like:
...
CASE WHEN fielda = 1
THEN 2
WHEN fielda = (SELECT somevalue FROM otherTable WHERE keyfield = tablea.keyfield)
THEN NULL
ELSE 3
END
...
If you don't have anything that doesn't have a fielda = 1, that subquery will never process... or so I understood. If this isn't your understanding I'll go setup a test process to see where I'm confused.
I tried this:
SELECT CASE WHEN t1.C2 = 1 THEN 2
WHEN t1.C3 = (SELECT t2.C3
FROM dbo.Test1 AS t2
WHERE t2.C1 = '666'
) THEN NULL
ELSE 3
END AS SomeValue
FROM dbo.Test1 t1
WHERE t1.c1 = '1';
And for the data loaded, that returns one row and one row only, the initial CASE value. But, if you look at the execution plan, the other SELECT statement is defined. Not saying it gets executed, but it's there, directly impacting the plan created.
If I then run this:
SELECT CASE WHEN t1.C2 = 1 THEN 2
--WHEN t1.C3 = (SELECT t2.C3
-- FROM dbo.Test1 AS t2
-- WHERE t2.C1 = '666'
-- ) THEN NULL
ELSE 3
END AS SomeValue
FROM dbo.Test1 t1
WHERE t1.c1 = '1';
I get exactly the same number of reads between the two plans, but the execution plans are radically different, and the execution time of the first plan is a whole lot higher, even though, execution does short circuit. So... I don't know. Based on what I'm seeing, the short-circuit is only for execution, not plan creation, and plan creation kind of doesn't care about the short circuit, so you can end up with a performance loss, that I attribute to the extra stuff in the plan (which doesn't seem to get executed).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 13, 2014 at 12:41 pm
Great question. Great analysis. Great follow up.
February 13, 2014 at 4:04 pm
Grant,
that's a very interesting result you found there, but I guess I should have restated my question. The order does matter, but not for the reasons you checked. You've compared two different sets of query logic.
Whereas something like this would be the same, but just checking the effect of case order:
CASE field<> 2 AND (SELECT...)
WHEN 2 THEN...
ELSE...
END
vs.
CASE field = 2
WHEN (Select...)
ELSE...
END
This would be the same execution plan (basically) but with different timings, I believe. If i can ever get 30 minutes to myself today I'll build out a test structure.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply