April 29, 2013 at 4:13 pm
EDIT: About four posts down is an expanded test script; if you get the chance to run it for timing tests, I'd appreciate it. The first one doesn't have enough variation for a proper review. /EDIT
So, I find myself in need of a GREATEST function equivalent, and before I settle on this concept as a best practice in my head, I want to do some functional testing. I've created a test bed and I'm getting similar results on the systems I've tested.
It requires 2k8 to try the workarounds because they're based on CROSS APPLY.
Now, as a baseline, I've built the CASE statement from heck, and a million-row test structure. My results, before you get to the code, look like so:
TestType AvgLen MaxLen MinLen
--------------- ----------- ----------- -----------
Cascade Test 2152 2946 1713
CASE Test 1889 2356 1506
Values Test 1027 1433 833
That said, I'd like some of the folks here to take a look at the process, if you're willing, and see what you can break in it. From what I can tell, a VALUES build is running at about twice the speed of any other alternative I've come up with.
The iTVF wasn't worth discussing when I timing-tested it. It came in at about the same level as the Cascade, for similar reasons: you have to embed the function within the function to get it to work right, and it just became a mess.
UNPIVOT is right out. Because of the need in my scenario to alter which values are included, it became overly complex to maintain. If someone has a clean way of performing it, I'd be interested. This test is simplified from my business case, but it has all the moving parts.
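For reference, the UNPIVOT version I abandoned looked roughly like this. This is only a sketch against the #blah test table defined below, and it shows exactly why maintenance hurts: every column that feeds the comparison has to be listed in the IN clause, so changing which values are included means editing the UNPIVOT itself.

```sql
-- Sketch of the abandoned UNPIVOT approach against #blah.
-- Each date column becomes a row, then MAX + GROUP BY picks the greatest per RowNum.
SELECT
    RowNum,
    MAX( dt) AS MaxDt
FROM
    ( SELECT RowNum, Date1, Date2, Date3 FROM #blah) AS src
UNPIVOT
    ( dt FOR DateName IN ( Date1, Date2, Date3)) AS up
GROUP BY
    RowNum
```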
So, the test harness:
IF OBJECT_ID ('tempdb..#blah') IS NOT NULL
DROP TABLE #blah
IF OBJECT_ID ('tempdb..#dump1') IS NOT NULL
DROP TABLE #dump1
IF OBJECT_ID ('tempdb..#dump2') IS NOT NULL
DROP TABLE #dump2
IF OBJECT_ID ('tempdb..#dump3') IS NOT NULL
DROP TABLE #dump3
IF OBJECT_ID ('tempdb..#results') IS NOT NULL
DROP TABLE #results
CREATE TABLE #blah
( RowNum INT IDENTITY(1,1),
Date1 DATETIME,
Date2 DATETIME,
Date3 DATETIME,
Date4_1 DATETIME,
Date4_2 DATETIME,
Date4_3 DATETIME
)
CREATE CLUSTERED INDEX idx_c_Blah ON #Blah
( RowNum)
CREATE TABLE #dump1
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #dump2
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #dump3
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #results
(TestType VARCHAR(15),
MillisecondLength INT
)
;WITH cteTally
AS (
SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N
FROM sys.syscolumns sc1,
sys.syscolumns sc2,
sys.syscolumns sc3
)
INSERT INTO #blah
SELECT
-- N,
GETDATE(),
GETDATE() + 1,
GETDATE() - 1,
GETDATE() + 2,
GETDATE() + 3,
GETDATE() + 4
FROM
cteTally
-------------- End Structure Build ------------------
GO
DECLARE @StartTestTime DATETIME
TRUNCATE TABLE #dump1
TRUNCATE TABLE #dump2
TRUNCATE TABLE #dump3
-- Did you know that discard results in options in 2k8R2 actually dumps the statistics in messages, too?
-- Yeah, I didn't either until just now...
--SET STATISTICS TIME, IO ON
SET @StartTestTime = GETDATE()
-- Values GREATEST --
INSERT INTO #dump1
SELECT
RowNum,
drv.MaxDt
FROM
#blah
CROSS APPLY
(SELECT MAX( dt) AS MaxDt
FROM
( VALUES ( Date1),
(Date2),
(Date3),
(CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END)
) ValueList (dt)
) AS drv
INSERT INTO #results
VALUES ( 'Values Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- Cascading CrossApply GREATEST --
INSERT INTO #dump2
SELECT
RowNum,
ca3.GrtDt3
FROM
#blah AS b
CROSS APPLY
( SELECT CASE WHEN b.Date1 >= b.Date2 THEN Date1 ELSE b.Date2 END AS GrtDt1) AS ca
CROSS APPLY
( SELECT CASE WHEN ca.GrtDt1 >= b.Date3 THEN ca.GrtDt1 ELSE b.Date3 END AS GrtDt2) AS ca2
CROSS APPLY
( SELECT CASE WHEN ca2.GrtDt2 >= CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
THEN ca2.GrtDt2
ELSE CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
END AS GrtDt3
) AS ca3
INSERT INTO #results
VALUES ( 'Cascade Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- CASE method GREATEST --
INSERT INTO #dump3
SELECT
RowNum,
CASE
WHEN
Date1 >= Date2
THEN
CASE WHEN Date1 >= Date3
THEN CASE
WHEN Date1 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date1
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
ELSE
CASE WHEN Date2 >= Date3
THEN CASE
WHEN Date2 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date2
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
END
FROM
#blah
INSERT INTO #results
VALUES ( 'CASE Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
--SET STATISTICS TIME, IO OFF
GO 50
SELECT
TestType,
AVG( MillisecondLength) AS AvgLen,
MAX( MillisecondLength) AS MaxLen,
MIN( MillisecondLength) AS MinLen
FROM
#results
GROUP BY
TestType
GO
As I mentioned, I'd appreciate it if some of y'all would be willing to run this and check whether you get similar times, and see if you can find exceptions to the timings due to some wildness I haven't even contemplated yet.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 29, 2013 at 5:52 pm
Some stats for you from my PC
+--------------+--------+--------+--------+
| TestType     | AvgLen | MaxLen | MinLen |
+--------------+--------+--------+--------+
| Cascade Test |    877 |   1070 |    840 |
| CASE Test    |    758 |   1053 |    730 |
| Values Test  |    395 |    600 |    370 |
+--------------+--------+--------+--------+
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
April 29, 2013 at 6:22 pm
Having looked closer at the code, I have a couple of suggestions, but only in terms of the test harness - to make things fairer, I think.
Here is the amended code:
IF OBJECT_ID ('tempdb..#blah') IS NOT NULL
DROP TABLE #blah
IF OBJECT_ID ('tempdb..#dump1') IS NOT NULL
DROP TABLE #dump1
IF OBJECT_ID ('tempdb..#dump2') IS NOT NULL
DROP TABLE #dump2
IF OBJECT_ID ('tempdb..#dump3') IS NOT NULL
DROP TABLE #dump3
IF OBJECT_ID ('tempdb..#results') IS NOT NULL
DROP TABLE #results
CREATE TABLE #blah
( RowNum INT IDENTITY(1,1),
Date1 DATETIME,
Date2 DATETIME,
Date3 DATETIME,
Date4 DATETIME
)
CREATE CLUSTERED INDEX idx_c_Blah ON #Blah
( RowNum)
CREATE TABLE #dump1
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #dump2
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #dump3
(RowNum INT,
MaxDT DATETIME)
CREATE TABLE #results
(TestType VARCHAR(15),
MillisecondLength INT
)
;WITH cteTally
AS (
SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY @@SPID) AS N
FROM (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc1(a)
, (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc2(a)
, (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc3(a)
, (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc4(a)
, (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc5(a)
, (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) sc6(a)
)
INSERT INTO #blah
SELECT
-- N,
GETDATE(),
GETDATE() + 1,
GETDATE() - 1,
CASE N %4
WHEN 1 THEN getdate()+2
WHEN 2 THEN getdate()+3
WHEN 3 THEN getdate()+4
WHEN 0 THEN '19000101'
END
FROM
cteTally
-------------- End Structure Build ------------------
GO
DECLARE @StartTestTime DATETIME
TRUNCATE TABLE #dump1
TRUNCATE TABLE #dump2
TRUNCATE TABLE #dump3
-- Did you know that discard results in options in 2k8R2 actually dumps the statistics in messages, too?
-- Yeah, I didn't either until just now...
--SET STATISTICS TIME, IO ON
declare @di int,@dt datetime;
SET @StartTestTime = GETDATE()
-- Values GREATEST --
--INSERT INTO #dump1
SELECT
@di=RowNum,
@dt=drv.MaxDt
FROM
#blah
CROSS APPLY
(SELECT MAX( dt) AS MaxDt
FROM
( VALUES ( Date1),
(Date2),
(Date3),
(Date4)
) ValueList (dt)
) AS drv
INSERT INTO #results
VALUES ( 'Values Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- Cascading CrossApply GREATEST --
--INSERT INTO #dump2
SELECT
@di=RowNum,
@dt=ca3.GrtDt3
FROM
#blah AS b
CROSS APPLY
( SELECT CASE WHEN b.Date1 >= b.Date2 THEN Date1 ELSE b.Date2 END AS GrtDt1) AS ca
CROSS APPLY
( SELECT CASE WHEN ca.GrtDt1 >= b.Date3 THEN ca.GrtDt1 ELSE b.Date3 END AS GrtDt2) AS ca2
CROSS APPLY
( SELECT CASE WHEN ca2.GrtDt2 >= Date4
THEN ca2.GrtDt2
ELSE Date4
END AS GrtDt3
) AS ca3
INSERT INTO #results
VALUES ( 'Cascade Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- CASE method GREATEST --
--INSERT INTO #dump3
SELECT
@di=RowNum,
@dt=CASE
WHEN
Date1 >= Date2
THEN
CASE WHEN Date1 >= Date3
THEN CASE
WHEN Date1 >= Date4
THEN Date1
ELSE
Date4
END
ELSE CASE
WHEN Date3 >= Date4
THEN Date3
ELSE
Date4
END
END
ELSE
CASE WHEN Date2 >= Date3
THEN CASE
WHEN Date2 >= Date4
THEN Date2
ELSE
Date4
END
ELSE CASE
WHEN Date3 >= Date4
THEN Date3
ELSE
Date4
END
END
END
FROM
#blah
INSERT INTO #results
VALUES ( 'CASE Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
--SET STATISTICS TIME, IO OFF
GO 50
SELECT
TestType,
AVG( MillisecondLength) AS AvgLen,
MAX( MillisecondLength) AS MaxLen,
MIN( MillisecondLength) AS MinLen
FROM
#results
GROUP BY
TestType
GO
You will see I have moved the CASE N%4 ... statement into the test data generator, and swapped the syscolumns cross joins for VALUES cross joins as it is much quicker on my system... I also swapped the insert into a temp table for dumping into variables to avoid IO issues affecting the results unduly.
I moved the CASE N%4 statement because it didn't seem fair to the queries that had it in there multiple times - and despite how you would expect the engine to evaluate it once per row and be done with it, it does seem to have levelled the playing field.
Here are my new results:
+--------------+--------+--------+--------+
| TestType     | AvgLen | MaxLen | MinLen |
+--------------+--------+--------+--------+
| Cascade Test |    385 |    410 |    370 |
| CASE Test    |    310 |    333 |    300 |
| Values Test  |    317 |   1360 |    276 |
+--------------+--------+--------+--------+
I expect that high value on the Values test was just a blip.
MM
April 29, 2013 at 6:32 pm
mister.magoo (4/29/2013)
You will see I have moved the CASE N%4 ... statement into the test data generator, and swapped the syscolumns cross joins for VALUES cross joins as it is much quicker on my system... I also swapped the insert into a temp table for dumping into variables to avoid IO issues affecting the results unduly.
Two concerns here. First, the reason the CASE statement isn't in the data generator is that it's supposed to signify different conditions. An example: let's say this is tracking orders from all sources, and you have a source-of-data column. You also have columns stating 'first email', 'order from internet', etc. This switch swaps in the correct 'first contact' date based on that setting.
By moving it to the data generator, you've applied the column decision to the data before the process ever starts. One of the reasons for the test is to actually make that decision during the GREATEST component. I understand where you're going with it for 'fairness', but I also wanted to keep the complexity that drove the question in the first place.
Also, Table Variables will write or not write to disk just as often/not often as a #table. Check out Wayne's table variable myths article, it's quite thorough. 🙂 I'll see if I can find the link later if some other helpful soul doesn't drop by with it.
I moved the CASE N%4 statement because it didn't seem fair to the queries that had it in there multiple times - and despite how you would expect the engine to evaluate it once per row and be done with it, it does seem to have levelled the playing field.
Just so, but as discussed above, the decision tree is part of the optimization piece I'm trying to evaluate. Imagine if those values came from different tables instead of the test-harness presentation. (Order Date, Ship Date, CustomerEmailLog, etc)
Now, could/should they be compared side by side? Sure, absolutely. In that case, however, just strip out the offending CASE code and simply swap in Date4_1, and you'll see execution plans equivalent to pre-deciding the variable entry.
What I really need to do to level this playing field is to vary the values in the dates so that the CASE has a chance to actually short-circuit more in line with probable data. I will adjust for that in the near future.
April 29, 2013 at 6:39 pm
Evil Kraig F (4/29/2013)
Two concerns here. First, the reason the CASE statement isn't in the data generator is because it's supposed to signify different conditions. An example. Let's say this is tracking orders from all sources, and you have a source of data column. You also have columns stating 'first email', 'order from internet', etc. This switch swaps out the correct 'first contact' date with the proper setting.
Ah, ok I see (sort of) - I thought you were just switching dates to keep the engine on its toes 😛
Also, Table Variables will write or not write to disk just as often/not often as a #table. Check out Wayne's table variable myths article, it's quite thorough. 🙂 I'll see if I can find the link later if some other helpful soul doesn't drop by with it.
No table variables here, just a plain simple variable to accept the values from the select and dump them - look again :w00t: (sometimes referred to here as a "bitbucket" after the member of the same handle I believe)
MM
April 29, 2013 at 6:59 pm
mister.magoo (4/29/2013)
Evil Kraig F (4/29/2013)
No table variables here, just a plain simple variable to accept the values from the select and dump them - look again :w00t: (sometimes referred to here as a "bitbucket" after the member of the same handle I believe)
D'oh. I misread what you wrote and didn't read the stinkin' code. Not my best moment. That idea I'll definitely steal wholeheartedly and without shame. 🙂
However, in the interest of the fairness checks you've mentioned, I've wrapped the different styles into the process as you'll see below:
(waits about 10 minutes for the thing to complete without errors this time please c'mon anyday now... 7:29 later)
SET NOCOUNT ON;
IF OBJECT_ID ('tempdb..#blah') IS NOT NULL
DROP TABLE #blah
IF OBJECT_ID ('tempdb..#VarDates') IS NOT NULL
DROP TABLE #VarDates
IF OBJECT_ID ('tempdb..#results') IS NOT NULL
DROP TABLE #results
CREATE TABLE #blah
( RowNum INT IDENTITY(1,1),
Date1 DATETIME,
Date2 DATETIME,
Date3 DATETIME,
Date4_1 DATETIME,
Date4_2 DATETIME,
Date4_3 DATETIME
)
CREATE CLUSTERED INDEX idx_c_Blah ON #Blah
( RowNum)
CREATE TABLE #VarDates
( RowNum INT IDENTITY(1,1),
Date1 DATETIME,
Date2 DATETIME,
Date3 DATETIME,
Date4_1 DATETIME,
Date4_2 DATETIME,
Date4_3 DATETIME
)
CREATE CLUSTERED INDEX idx_c_VarDates ON #VarDates
( RowNum)
CREATE TABLE #results
(TestSource VARCHAR(15),
TestType VARCHAR(15),
MillisecondLength INT
)
;WITH cteTally
AS (
SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N
FROM sys.syscolumns sc1,
sys.syscolumns sc2,
sys.syscolumns sc3
)
INSERT INTO #blah
SELECT
-- N,
GETDATE(),
GETDATE() + 1,
GETDATE() - 1,
GETDATE() + 2,
GETDATE() + 3,
GETDATE() + 4
FROM
cteTally
;WITH cteTally
AS (
SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N
FROM sys.syscolumns sc1,
sys.syscolumns sc2,
sys.syscolumns sc3
)
INSERT INTO #VarDates
SELECT
--N,
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101'),
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101'),
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101'),
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101'),
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101'),
DATEADD( dd, ABS(CHECKSUM(NEWID())) % 3250 , '20000101')
FROM
cteTally
-------------- End Structure Build ------------------
GO
-- Static Dates Build
DECLARE @dt DATETIME, @rn INT
DECLARE @StartTestTime DATETIME
-- Did you know that discard results in options in 2k8R2 actually dumps the statistics in messages, too?
-- Yeah, I didn't either until just now...
--SET STATISTICS TIME, IO ON
SET @StartTestTime = GETDATE()
-- Values GREATEST --
SELECT
@rn = RowNum,
@dt = drv.MaxDt
FROM
#blah
CROSS APPLY
(SELECT MAX( dt) AS MaxDt
FROM
( VALUES ( Date1),
(Date2),
(Date3),
(CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END)
) ValueList (dt)
) AS drv
INSERT INTO #results
VALUES ( 'Static Dates', 'Values Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- Cascading CrossApply GREATEST --
SELECT
@rn = RowNum,
@dt = ca3.GrtDt3
FROM
#blah AS b
CROSS APPLY
( SELECT CASE WHEN b.Date1 >= b.Date2 THEN Date1 ELSE b.Date2 END AS GrtDt1) AS ca
CROSS APPLY
( SELECT CASE WHEN ca.GrtDt1 >= b.Date3 THEN ca.GrtDt1 ELSE b.Date3 END AS GrtDt2) AS ca2
CROSS APPLY
( SELECT CASE WHEN ca2.GrtDt2 >= CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
THEN ca2.GrtDt2
ELSE CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
END AS GrtDt3
) AS ca3
INSERT INTO #results
VALUES ( 'Static Dates', 'Cascade Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- CASE method GREATEST --
SELECT
@rn = RowNum,
@dt = CASE
WHEN
Date1 >= Date2
THEN
CASE WHEN Date1 >= Date3
THEN CASE
WHEN Date1 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date1
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
ELSE
CASE WHEN Date2 >= Date3
THEN CASE
WHEN Date2 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date2
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
END
FROM
#blah
INSERT INTO #results
VALUES ( 'Static Dates', 'CASE Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
--SET STATISTICS TIME, IO OFF
GO 50
-- No case Build
DECLARE @dt DATETIME, @rn INT
DECLARE @StartTestTime DATETIME
-- Did you know that discard results in options in 2k8R2 actually dumps the statistics in messages, too?
-- Yeah, I didn't either until just now...
--SET STATISTICS TIME, IO ON
SET @StartTestTime = GETDATE()
-- Values GREATEST --
SELECT
@rn = RowNum,
@dt = drv.MaxDt
FROM
#blah
CROSS APPLY
(SELECT MAX( dt) AS MaxDt
FROM
( VALUES ( Date1),
(Date2),
(Date3),
(Date4_1)
) ValueList (dt)
) AS drv
INSERT INTO #results
VALUES ( 'No Case Check', 'Values Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- Cascading CrossApply GREATEST --
SELECT
@rn = RowNum,
@dt = ca3.GrtDt3
FROM
#blah AS b
CROSS APPLY
( SELECT CASE WHEN b.Date1 >= b.Date2 THEN Date1 ELSE b.Date2 END AS GrtDt1) AS ca
CROSS APPLY
( SELECT CASE WHEN ca.GrtDt1 >= b.Date3 THEN ca.GrtDt1 ELSE b.Date3 END AS GrtDt2) AS ca2
CROSS APPLY
( SELECT CASE WHEN ca2.GrtDt2 >= Date4_1
THEN ca2.GrtDt2
ELSE Date4_1
END AS GrtDt3
) AS ca3
INSERT INTO #results
VALUES ( 'No Case Check', 'Cascade Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- CASE method GREATEST --
SELECT
@rn = RowNum,
@dt = CASE
WHEN
Date1 >= Date2
THEN
CASE WHEN Date1 >= Date3
THEN CASE
WHEN Date1 >= Date4_1
THEN Date1
ELSE
Date4_1
END
ELSE CASE
WHEN Date3 >= Date4_1
THEN Date3
ELSE
Date4_1
END
END
ELSE
CASE WHEN Date2 >= Date3
THEN CASE
WHEN Date2 >= Date4_1
THEN Date2
ELSE
Date4_1
END
ELSE CASE
WHEN Date3 >= Date4_1
THEN Date3
ELSE
Date4_1
END
END
END
FROM
#blah
INSERT INTO #results
VALUES ( 'No Case Check', 'CASE Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
--SET STATISTICS TIME, IO OFF
GO 50
-- High Variation Date Check
-- Static Dates Build
DECLARE @dt DATETIME, @rn INT
DECLARE @StartTestTime DATETIME
-- Did you know that discard results in options in 2k8R2 actually dumps the statistics in messages, too?
-- Yeah, I didn't either until just now...
--SET STATISTICS TIME, IO ON
SET @StartTestTime = GETDATE()
-- Values GREATEST --
SELECT
@rn = RowNum,
@dt = drv.MaxDt
FROM
#VarDates
CROSS APPLY
(SELECT MAX( dt) AS MaxDt
FROM
( VALUES ( Date1),
(Date2),
(Date3),
(CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END)
) ValueList (dt)
) AS drv
INSERT INTO #results
VALUES ( 'Variable Dates', 'Values Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- Cascading CrossApply GREATEST --
SELECT
@rn = RowNum,
@dt = ca3.GrtDt3
FROM
#VarDates AS b
CROSS APPLY
( SELECT CASE WHEN b.Date1 >= b.Date2 THEN Date1 ELSE b.Date2 END AS GrtDt1) AS ca
CROSS APPLY
( SELECT CASE WHEN ca.GrtDt1 >= b.Date3 THEN ca.GrtDt1 ELSE b.Date3 END AS GrtDt2) AS ca2
CROSS APPLY
( SELECT CASE WHEN ca2.GrtDt2 >= CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
THEN ca2.GrtDt2
ELSE CASE b.RowNum %4
WHEN 1 THEN b.Date4_1
WHEN 2 THEN b.Date4_2
WHEN 3 THEN b.Date4_3
WHEN 0 THEN '19000101'
END
END AS GrtDt3
) AS ca3
INSERT INTO #results
VALUES ( 'Variable Dates', 'Cascade Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
-- CASE method GREATEST --
SELECT
@rn = RowNum,
@dt = CASE
WHEN
Date1 >= Date2
THEN
CASE WHEN Date1 >= Date3
THEN CASE
WHEN Date1 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date1
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
ELSE
CASE WHEN Date2 >= Date3
THEN CASE
WHEN Date2 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date2
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
ELSE CASE
WHEN Date3 >= CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
THEN Date3
ELSE
CASE RowNum %4
WHEN 1 THEN Date4_1
WHEN 2 THEN Date4_2
WHEN 3 THEN Date4_3
WHEN 0 THEN '19000101'
END
END
END
END
FROM
#VarDates
INSERT INTO #results
VALUES ( 'Variable Dates', 'CASE Test', DATEDIFF( ms, @StartTestTime, GETDATE()))
SET @StartTestTime = GETDATE()
--SET STATISTICS TIME, IO OFF
GO 50
SELECT
TestSource,
TestType,
AVG( MillisecondLength) AS AvgLen,
MAX( MillisecondLength) AS MaxLen,
MIN( MillisecondLength) AS MinLen,
COUNT(*) AS CountOfEntries
FROM
#results
GROUP BY
TestSource,
TestType
ORDER BY
TestSource,
TestType
GO
SET NOCOUNT OFF;
GO
Along with my current results:
TestSource TestType AvgLen MaxLen MinLen CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check Cascade Test 671 1023 520 50
No Case Check CASE Test 591 860 463 50
No Case Check Values Test 735 1200 570 50
Static Dates Cascade Test 1348 1926 1090 50
Static Dates CASE Test 1172 1646 966 50
Static Dates Values Test 768 1010 650 50
Variable Dates Cascade Test 1308 1723 1080 50
Variable Dates CASE Test 1235 1756 953 50
Variable Dates Values Test 816 1156 670 50
Which produces some... interesting results.
Without a CASE check during the evaluation, the horrible and evil CASE statement seems to rule the roost, with Cascade coming in second and Values coming in a horrendous 3rd.
Whenever a decision tree to determine the included value is involved however, Values wins by a landslide. I find that very interesting.
April 29, 2013 at 7:08 pm
Because of Mr. Magoo's confusion about my use case, I want to present an idea of it. It's not the exact one in question (that's a different story), but it illustrates the business concern I'm optimizing for. Not that I don't want to find the exceptions, either.
Assume you have a CRS that takes orders, ships products, delivers products, and also tracks emails, phone calls, and physical mail to your customers. A sample schema would be something like this:
CREATE TABLE Customers
(CustomerID INT IDENTITY (1, 1) NOT NULL,
CustomerName VARCHAR(50) NOT NULL,
PreferredContactMethod VARCHAR(50)
)
CREATE TABLE Orders
(OrderID INT IDENTITY ( 1,1) NOT NULL,
CustomerID INT NOT NULL,
OrderDate DATETIME NOT NULL,
ShipDate DATETIME NULL,
DeliverDate DATETIME NULL
)
CREATE TABLE OrderLineItem
(-- Stuff, not important
)
CREATE TABLE EmailContact
( CustomerID INT, EmailTime DATETIME)
CREATE TABLE SnailMail
(CustomerID INT, MailSent DATETIME)
CREATE TABLE PhoneCall
(CustomerID INT, PhoneCall DATETIME)
The idea of the query is to let us know the last time we contacted the client either about product or via their preferred contact method. So, in the end, we want something that does this:
SELECT
GREATEST( OrderDate, ShipDate, DeliverDate,
CASE PreferredContactMethod
WHEN 'Email' THEN EmailTime
WHEN 'Snailmail' THEN MailSent
WHEN 'Phone' THEN PhoneCall
ELSE GREATEST( EmailTime, MailSent, PhoneCall)
END
)
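Since 2k8 has no GREATEST, the VALUES workaround for that pseudocode would look something like the sketch below. The join conditions are simplified for illustration (assume one contact row per customer per source), and the ELSE GREATEST branch is handled implicitly: when PreferredContactMethod matches nothing, the CASE yields NULL and MAX simply ignores it, so you'd add the three contact dates as extra VALUES rows if you wanted the full ELSE behavior.

```sql
-- Sketch only: joins assume one row per customer in each contact table.
SELECT
    c.CustomerID,
    drv.LastContact
FROM
    Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    LEFT JOIN EmailContact AS e ON e.CustomerID = c.CustomerID
    LEFT JOIN SnailMail AS s ON s.CustomerID = c.CustomerID
    LEFT JOIN PhoneCall AS p ON p.CustomerID = c.CustomerID
CROSS APPLY
    ( SELECT MAX( dt) AS LastContact
      FROM ( VALUES ( o.OrderDate),
                    ( o.ShipDate),
                    ( o.DeliverDate),
                    ( CASE c.PreferredContactMethod
                          WHEN 'Email' THEN e.EmailTime
                          WHEN 'Snailmail' THEN s.MailSent
                          WHEN 'Phone' THEN p.PhoneCall
                      END)
           ) AS ValueList (dt)
    ) AS drv
```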
I hope a clarification of the use case helps to explain what drove the idea behind researching this optimization in the first place.
April 29, 2013 at 7:08 pm
Evil Kraig F (4/29/2013)
D'oh. I misread what you wrote and didn't read the stinkin' code. Not my best moment. That idea I'll definately steal wholeheartedly and without shame. 🙂
No worries, and steal away - that's what I did 😛
(waits about 10 minutes for the thing to complete without errors this time please c'mon anyday now... 7:29 later)
I remember waiting 40 hours once for a Trillion row test...I did get coffee though 😎
MM
April 30, 2013 at 2:22 am
Results from 2012 SP1
TestType        AvgLen  MaxLen  MinLen
Cascade Test    1318    2460    1220
CASE Test       1238    2350    1150
Values Test     601     1016    503
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537
April 30, 2013 at 4:18 am
Evil Kraig F (4/29/2013)
Because of Mr. Magoo's confusion as to my use case, I want to present an idea of the use case. It's not the one in question but that's a different story, it illustrates my business concern that I'm optimizing for. Not that I don't want to find the exceptions either....
I hope a clarification of the use case helps to explain what drove the idea behind researching this optimization in the first place.
Much clearer 🙂
I like the idea of trying to pin down a "best practice" for this, and will watch with interest to see if you achieve that goal.
April 30, 2013 at 6:17 am
mister.magoo (4/30/2013)
Evil Kraig F (4/29/2013)
Because of Mr. Magoo's confusion about my use case, I want to present an illustration of it. It's not the exact one in question, but it shows the business concern I'm optimizing for. Not that I don't want to find the exceptions either....
I hope a clarification of the use case helps to explain what drove the idea behind researching this optimization in the first place.
Much clearer 🙂
I like the idea of trying to pin down a "best practice" for this, and will watch with interest to see if you achieve that goal.
+1
Results from local instance, version 10.50.1600.1
Run 1 with actual plan selected
TestSource      TestType        AvgLen      MaxLen      MinLen      CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check   Cascade Test    716         870         600         50
No Case Check   CASE Test 1     726         993         586         50
No Case Check   CASE Test 2     708         850         590         50
No Case Check   Values Test     505         666         483         50
Static Dates    Cascade Test 1  785         950         630         50
Static Dates    Cascade Test 2  782         983         633         50
Static Dates    Cascade Test 3  787         973         633         50
Static Dates    CASE Test 1     665         816         523         50
Static Dates    CASE Test 2     683         910         526         50
Static Dates    Values Test     594         766         550         50
Variable Dates  Cascade Test    1121        1256        1026        50
Variable Dates  CASE Test       999         1240        890         50
Variable Dates  Values Test     617         2786        553         50
Run 2
TestSource      TestType        AvgLen      MaxLen      MinLen      CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check   Cascade Test    311         340         306         50
No Case Check   CASE Test 1     299         320         296         50
No Case Check   CASE Test 2     301         313         296         50
No Case Check   Values Test     446         580         420         50
Static Dates    Cascade Test 1  620         660         610         50
Static Dates    Cascade Test 2  614         640         606         50
Static Dates    Cascade Test 3  613         643         606         50
Static Dates    CASE Test 1     490         590         483         50
Static Dates    CASE Test 2     490         536         486         50
Static Dates    Values Test     575         2610        496         50
Variable Dates  Cascade Test    602         630         596         50
Variable Dates  CASE Test       474         506         470         50
Variable Dates  Values Test     574         2916        496         50
Run 3
TestSource      TestType        AvgLen      MaxLen      MinLen      CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check   Cascade Test    312         330         306         50
No Case Check   CASE Test 1     303         323         296         50
No Case Check   CASE Test 2     302         316         296         50
No Case Check   Values Test     467         576         430         50
Static Dates    Cascade Test 1  625         836         610         50
Static Dates    Cascade Test 2  626         903         613         50
Static Dates    Cascade Test 3  622         1000        606         50
Static Dates    CASE Test 1     503         606         490         50
Static Dates    CASE Test 2     513         990         493         50
Static Dates    Values Test     629         3653        506         50
Variable Dates  Cascade Test    601         646         593         50
Variable Dates  CASE Test       474         510         470         50
Variable Dates  Values Test     646         4210        513         50
Run 4
TestSource      TestType        AvgLen      MaxLen      MinLen      CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check   Cascade Test    319         390         310         50
No Case Check   CASE Test 1     304         323         300         50
No Case Check   CASE Test 2     307         336         300         50
No Case Check   Values Test     502         803         433         50
Static Dates    Cascade Test 1  624         656         616         50
Static Dates    Cascade Test 2  625         730         616         50
Static Dates    Cascade Test 3  618         646         610         50
Static Dates    CASE Test 1     513         570         506         50
Static Dates    CASE Test 2     513         573         503         50
Static Dates    Values Test     637         2956        536         50
Variable Dates  Cascade Test    621         710         613         50
Variable Dates  CASE Test       474         520         466         50
Variable Dates  Values Test     628         2923        503         50
Run 5
TestSource      TestType        AvgLen      MaxLen      MinLen      CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check   Cascade Test    309         326         303         50
No Case Check   CASE Test 1     301         376         296         50
No Case Check   CASE Test 2     300         380         293         50
No Case Check   Values Test     450         530         420         50
Static Dates    Cascade Test 1  618         693         610         50
Static Dates    Cascade Test 2  615         700         606         50
Static Dates    Cascade Test 3  609         723         600         50
Static Dates    CASE Test 1     503         603         493         50
Static Dates    CASE Test 2     498         533         490         50
Static Dates    Values Test     575         2833        500         50
Variable Dates  Cascade Test    602         680         590         50
Variable Dates  CASE Test       474         533         470         50
Variable Dates  Values Test     585         2916        500         50
Test 1, Test 2 and Test 3 are slight variations of the base code; since the performance difference isn't particularly significant, I'll not post them unless someone requests.
Notice how much the results are bent by running with actual plan on.
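Since requesting the actual plan skews the numbers, timings are better captured in code. A minimal sketch of one way to do that, assuming a `#results` table with `TestType` and `Len` columns (the real harness may differ):

```sql
-- Hypothetical timing wrapper: record elapsed milliseconds per run
-- instead of relying on "Include Actual Execution Plan" sessions,
-- which inflate the measured durations.
DECLARE @StartTime DATETIME2(7) = SYSDATETIME();

-- ... run the Cascade / CASE / Values variant under test here ...

INSERT INTO #results (TestType, Len)
SELECT 'Values Test',
       DATEDIFF(MILLISECOND, @StartTime, SYSDATETIME());
```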
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 30, 2013 at 11:44 am
Mark-101232 (4/30/2013)
Results from 2012 SP1

TestType        AvgLen      MaxLen      MinLen
--------------- ----------- ----------- -----------
Cascade Test    1318        2460        1220
CASE Test       1238        2350        1150
Values Test     601         1016        503
Appreciate the time Mark. Would you be willing to rerun with the harness in the fourth post? After Mr. Magoo brought up a few things I expanded the harness.
April 30, 2013 at 11:46 am
ChrisM@Work (4/30/2013)
Test 1, Test 2 and Test 3 are slight variations of the base code, since the performance difference isn't particularly significant I'll not post unless someone requests.
Notice how much the results are bent by running with actual plan on.
I didn't run most of my tests with execution plans, and apparently we have some weaker hardware. Now what's interesting is that in your tests CASE is the clear winner almost across the board, whereas I'm definitely finding differently.
I assume the first run was the execution plan run for you?
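For anyone reading along, the two shapes being compared can be sketched like this against the `#blah` harness table (the `Date2`/`Date3` columns are assumed from the test setup, which the thread truncates):

```sql
-- CASE form: nested comparisons; grows combinatorially as columns are added,
-- and needs extra branches if any of the dates can be NULL.
SELECT b.RowNum,
       CASE WHEN b.Date1 >= b.Date2 AND b.Date1 >= b.Date3 THEN b.Date1
            WHEN b.Date2 >= b.Date3                        THEN b.Date2
            ELSE b.Date3
       END AS GreatestDate
FROM #blah AS b;

-- VALUES form (2008+): unpivot the candidates with a table value
-- constructor inside CROSS APPLY, then take MAX() per row.
SELECT b.RowNum,
       ca.GreatestDate
FROM #blah AS b
CROSS APPLY ( SELECT MAX(d.DateVal)
              FROM (VALUES (b.Date1), (b.Date2), (b.Date3)) AS d (DateVal)
            ) AS ca (GreatestDate);
```

One behavioral difference worth noting: `MAX()` ignores NULLs, while a naive CASE comparison against a NULL date falls through to a later branch, so the two forms only agree when the columns are NOT NULL or the CASE is written defensively.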
April 30, 2013 at 12:41 pm
Hi Craig.......run on my laptop
hope this helps
kind regards
Microsoft SQL Server 2012 - 11.0.2218.0 (Intel X86)
Enterprise Evaluation Edition on Windows NT 6.1 <X86> (Build 7601: Service Pack 1)
TestSource TestType AvgLen MaxLen MinLen CountOfEntries
--------------- --------------- ----------- ----------- ----------- --------------
No Case Check Cascade Test 392 430 380 50
No Case Check CASE Test 340 363 323 50
No Case Check Values Test 423 460 406 50
Static Dates Cascade Test 700 866 663 50
Static Dates CASE Test 620 776 590 50
Static Dates Values Test 532 1973 470 50
Variable Dates Cascade Test 645 670 636 50
Variable Dates CASE Test 543 586 526 50
Variable Dates Values Test 519 2336 466 50
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 30, 2013 at 3:14 pm
J Livingston SQL (4/30/2013)
Hi Craig.......run on my laptop
hope this helps
kind regards
It does J, thanks. It shows that even though you got huge outliers just like above, the ratios stayed similar to what the rest of us are finding.
Those outliers in the major runs done above seem to be rather significant. I'm going to have to test with MAXDOP 1 and see if it's parallelism that's causing the outliers. Can't hurt to know what a straight line computation would do, either.
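Testing whether parallelism drives the outliers is just a query hint on each variant; a sketch, reusing the same assumed harness columns:

```sql
-- Hypothetical serial-plan run: force MAXDOP 1 for this statement only,
-- then compare MaxLen against the unhinted runs to see whether the
-- outliers come from parallel plan startup/skew.
SELECT b.RowNum,
       ca.GreatestDate
FROM #blah AS b
CROSS APPLY ( SELECT MAX(d.DateVal)
              FROM (VALUES (b.Date1), (b.Date2), (b.Date3)) AS d (DateVal)
            ) AS ca (GreatestDate)
OPTION (MAXDOP 1);
```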