January 20, 2016 at 9:21 am
below86 (1/20/2016)
Eirikur Eiriksson (1/20/2016)
below86 (1/20/2016)
Just a simple way I would try and get this:
CREATE TABLE #datetest
(
id varchar(8),
L1 date,
L2 date,
L3 date
);
INSERT INTO #datetest (id, L1, L2, L3)
VALUES ('1005', '1-1-16', '1-17-16', '1-10-16');
-- I did the max on the date here in case there is more than one row per ID
SELECT ID, MAX(L1) AS Max_Date, 'L1' AS Column_Name
INTO #datetest2
FROM #datetest
GROUP BY ID
UNION
SELECT ID, MAX(L2) AS Max_Date, 'L2' AS Column_Name
FROM #datetest
GROUP BY ID
UNION
SELECT ID, MAX(L3) AS Max_Date, 'L3' AS Column_Name
FROM #datetest
GROUP BY ID
;
SELECT L.ID, L.Max_Date, L.Column_Name
FROM #datetest2 L
INNER JOIN (SELECT ID, MAX(Max_Date) AS Max_Date
FROM #datetest2
GROUP BY ID) R
ON L.id = R.id
AND L.Max_Date = R.Max_Date
;
An equivalent but slightly better method using UNION ALL is covered in the test harness, mind you that using UNION forces the server to deduplicate the result set which can be quite expensive.
😎
Your right, I should have caught that it should have been a Union ALL. I didn't read through all the responses before posting by thoughts, next time I will. 🙂
No worries, just wanted to safe you the trouble of plugging your code into the test harness;-)
😎
January 20, 2016 at 9:39 am
Jacob Wilkins (1/20/2016)
Eirikur Eiriksson (1/19/2016)
Jacob Wilkins (1/19/2016)
Throwing in the CASE using Eirikur's wonderful setup yielded this on one million rows on my machine (tangentially, I'm embarrassed that Eirikur's laptop is faster than my test server :blush:):Thanks for this Jacob, plugged your snip into the harness and so far it's leading the race, good job indeed!
😎
What is left is to verify that all the code brings back the correct result set although I don't think that is an issue.
The old laptop I'm using is a Dell 6220 but I've made few "adjustments", quite a nifty little thingy;-)
Alas, as I was drifting off to sleep last night I had a terrible realization (made more terrible by the fact that I was thinking about this before sleep :crazy:). The current CASE formulation will fall on its face if one of the columns is NULL. It needs an extra check for whether the value being compared is NULL. That brings its time up pretty noticeably, although it's still a bit faster than the other solutions on my machine.
This precaution can be simplified as the comparison is correct when the NULL is on the left hand side of the gt operator
😎
Consider this
/* First incorrectly returns 2 and the second is correct*/
SELECT
CASE
WHEN 1 > NULL THEN 1
ELSE 2
END
,CASE
WHEN NULL > 1 THEN 1
ELSE 2
END
-- Use this to simplify the statement
SELECT
CASE
WHEN 1 > ISNULL(NULL,0) THEN 1
ELSE 2
END
Edit: typo
January 20, 2016 at 10:07 am
Eirikur Eiriksson (1/20/2016)
Jacob Wilkins (1/20/2016)
Eirikur Eiriksson (1/19/2016)
Jacob Wilkins (1/19/2016)
Throwing in the CASE using Eirikur's wonderful setup yielded this on one million rows on my machine (tangentially, I'm embarrassed that Eirikur's laptop is faster than my test server :blush:):Thanks for this Jacob, plugged your snip into the harness and so far it's leading the race, good job indeed!
😎
What is left is to verify that all the code brings back the correct result set although I don't think that is an issue.
The old laptop I'm using is a Dell 6220 but I've made few "adjustments", quite a nifty little thingy;-)
Alas, as I was drifting off to sleep last night I had a terrible realization (made more terrible by the fact that I was thinking about this before sleep :crazy:). The current CASE formulation will fall on its face if one of the columns is NULL. It needs an extra check for whether the value being compared is NULL. That brings its time up pretty noticeably, although it's still a bit faster than the other solutions on my machine.
This precaution can be simplified as the comparison is correct when the NULL is on the left hand side of the gt operator
😎
Consider this
/* First incorrectly returns 2 and the second is correct*/
SELECT
CASE
WHEN 1 > NULL THEN 1
ELSE 2
END
,CASE
WHEN NULL > 1 THEN 1
ELSE 2
END
-- Use this to simplify the statement
SELECT
CASE
WHEN 1 > ISNULL(NULL,0) THEN 1
ELSE 2
END
Edit: typo
I agree that using ISNULL would make it read a bit better. My main concern with that was general applicability, particularly, what do you replace NULLs with that won't affect the comparisons? Of course you could use the minimum value of whatever datatype is being used, but then you have to have a separate implementation depending on datatype.
That's not the end of the world, of course, but I figured just adding the OR IS NULL to the condition was a bit clearer about the intent and I didn't have to worry about choosing a replacement value appropriate to the datatype.
Cheers!
January 20, 2016 at 10:11 am
Jacob Wilkins (1/20/2016)
I agree that using ISNULL would make it read a bit better. My main concern with that was general applicability, particularly, what do you replace NULLs with that won't affect the comparisons? Of course you could use the minimum value of whatever datatype is being used, but then you have to have a separate implementation depending on datatype.
That's not the end of the world, of course, but I figured just adding the OR IS NULL to the condition was a bit clearer about the intent and I didn't have to worry about choosing a replacement value appropriate to the datatype.
Cheers!
Absolutely correct, the CASE chase could turn out to be almost endless.
😎
January 20, 2016 at 10:17 am
Eirikur Eiriksson (1/20/2016)
Jacob Wilkins (1/20/2016)
I agree that using ISNULL would make it read a bit better. My main concern with that was general applicability, particularly, what do you replace NULLs with that won't affect the comparisons? Of course you could use the minimum value of whatever datatype is being used, but then you have to have a separate implementation depending on datatype.
That's not the end of the world, of course, but I figured just adding the OR IS NULL to the condition was a bit clearer about the intent and I didn't have to worry about choosing a replacement value appropriate to the datatype.
Cheers!
Absolutely correct, the CASE chase could turn out to be almost endless.
😎
Having said that, it looks like using ISNULL with an appropriate minimum value instead of the OR IS NULL brings the performance of the CASE back in line with its initial results, so if we're trying to squeeze all the performance out of it we can, your suggestion is the way to go 🙂
January 20, 2016 at 10:42 am
Jacob Wilkins (1/20/2016)
Having said that, it looks like using ISNULL with an appropriate minimum value instead of the OR IS NULL brings the performance of the CASE back in line with its initial results, so if we're trying to squeeze all the performance out of it we can, your suggestion is the way to go 🙂
The execution plan gets quite complicated very quickly when using a combination of binary and logical operators, a single OR can double the complexity of the operation.;-)
😎
If we leave out the "requirement" of identifying the columnar origin of the maximum value then we can introduce a method that will use stream aggregation operator, doubt there is a quicker method.
SELECT
TC.TAMC_ID
,XC.MXD
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT
MAX(MXCOL.XDATE)
FROM
(
SELECT TC.COL01 UNION ALL
SELECT TC.COL02 UNION ALL
SELECT TC.COL03 UNION ALL
SELECT TC.COL04
) AS MXCOL(XDATE)
)AS XC(MXD);
Updated test harness with all the correctly working solutions
USE tempdb;
GO
SET NOCOUNT ON;
--/*
IF OBJECT_ID(N'dbo.TBL_TEST_ACROSS_MULTI_COLUMN') IS NOT NULL DROP TABLE dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
CREATE TABLE dbo.TBL_TEST_ACROSS_MULTI_COLUMN
(
TAMC_ID INT IDENTITY NOT NULL CONSTRAINT PK_DBO_TBL_TEST_ACROSS_MULTI_COLUMN_TAMC_ID PRIMARY KEY CLUSTERED
,COL01 DATE NOT NULL
,COL02 DATE NOT NULL
,COL03 DATE NOT NULL
,COL04 DATE NOT NULL
);
DECLARE @SAMPLE_SIZE INT = 1000000;
;WITH T(N) AS (SELECT N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) 0 AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
INSERT INTO dbo.TBL_TEST_ACROSS_MULTI_COLUMN(COL01,COL02,COL03,COL04)
SELECT
DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 43000,0)
,DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 43000,0)
,DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 43000,0)
,DATEADD(DAY,ABS(CHECKSUM(NEWID())) % 43000,0)
FROM NUMS NM;
-- */
DECLARE @timer TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));
DECLARE @OS_INFO TABLE (S_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
STEP_NAME VARCHAR(100) NOT NULL, ms_ticks BIGINT NOT NULL, process_kernel_time_ms BIGINT NOT NULL, process_user_time_ms BIGINT NOT NULL);
DECLARE @INT_BUCKET INT = 0;
DECLARE @DATE_BUCKET DATE = NULL;
DECLARE @CHAR_BUCKET VARCHAR(10) = '';
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'START' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
SELECT
@INT_BUCKET = MT.TAMC_ID
,@DATE_BUCKET = MT.COL01
,@DATE_BUCKET = MT.COL02
,@DATE_BUCKET = MT.COL03
,@DATE_BUCKET = MT.COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN AS MT;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'DRY RUN' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN2');
SELECT
@INT_BUCKET = MT.TAMC_ID
,@DATE_BUCKET = MT.COL01
,@DATE_BUCKET = MT.COL02
,@DATE_BUCKET = MT.COL03
,@DATE_BUCKET = MT.COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN AS MT;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'DRY RUN2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('TOP VALUES');
SELECT
@DATE_BUCKET = MA.MaxDate
,@CHAR_BUCKET = MA.ColName
,@INT_BUCKET = MT.TAMC_ID
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN AS MT
CROSS APPLY
(
SELECT TOP(1) VA.LName, VA.LDate
FROM (VALUES ('COL01',MT.COL01)
,('COL02',MT.COL02)
,('COL03',MT.COL03)
,('COL04',MT.COL04)
) VA(LName, LDate)
ORDER BY VA.LDate DESC
) AS MA(ColName, MaxDate)
--OPTION (MAXDOP 1)
;
INSERT INTO @timer(T_TEXT) VALUES('TOP VALUES');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'TOP VALUES' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('TOP VALUES2');
SELECT
@DATE_BUCKET = MA.MaxDate
,@CHAR_BUCKET = MA.ColName
,@INT_BUCKET = MT.TAMC_ID
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN AS MT
CROSS APPLY
(
SELECT TOP(1) VA.LName, VA.LDate
FROM (VALUES ('COL01',MT.COL01)
,('COL02',MT.COL02)
,('COL03',MT.COL03)
,('COL04',MT.COL04)
) VA(LName, LDate)
ORDER BY VA.LDate DESC
) AS MA(ColName, MaxDate)
--OPTION (MAXDOP 1)
;
INSERT INTO @timer(T_TEXT) VALUES('TOP VALUES2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'TOP VALUES2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('MAX GROUP');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@DATE_BUCKET = MAX(MXC.COLVAL) --AS MXVAL
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT 'COL01', TC.COL01 UNION ALL
SELECT 'COL02', TC.COL02 UNION ALL
SELECT 'COL03', TC.COL03 UNION ALL
SELECT 'COL04', TC.COL04
) AS MXC(COLNAME,COLVAL)
GROUP BY TC.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('MAX GROUP');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'MAX GROUP' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('MAX GROUP2');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@DATE_BUCKET = MAX(MXC.COLVAL) --AS MXVAL
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT 'COL01', TC.COL01 UNION ALL
SELECT 'COL02', TC.COL02 UNION ALL
SELECT 'COL03', TC.COL03 UNION ALL
SELECT 'COL04', TC.COL04
) AS MXC(COLNAME,COLVAL)
GROUP BY TC.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('MAX GROUP2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'MAX GROUP2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('DOUBLE CROSS');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@CHAR_BUCKET = TMC.COLNAME
,@DATE_BUCKET =TMC.COLVAL
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT 'COL01', TC.COL01 UNION ALL
SELECT 'COL02', TC.COL02 UNION ALL
SELECT 'COL03', TC.COL03 UNION ALL
SELECT 'COL04', TC.COL04
) AS MXC(COLNAME,COLVAL)
CROSS APPLY
(
SELECT TOP(1) MXC.COLNAME
,MXC.COLVAL
ORDER BY MXC.COLVAL DESC
)AS TMC(COLNAME,COLVAL);
INSERT INTO @timer(T_TEXT) VALUES('DOUBLE CROSS');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'DOUBLE CROSS' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('DOUBLE CROSS2');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@CHAR_BUCKET = TMC.COLNAME
,@DATE_BUCKET =TMC.COLVAL
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT 'COL01', TC.COL01 UNION ALL
SELECT 'COL02', TC.COL02 UNION ALL
SELECT 'COL03', TC.COL03 UNION ALL
SELECT 'COL04', TC.COL04
) AS MXC(COLNAME,COLVAL)
CROSS APPLY
(
SELECT TOP(1) MXC.COLNAME
,MXC.COLVAL
ORDER BY MXC.COLVAL DESC
)AS TMC(COLNAME,COLVAL);
INSERT INTO @timer(T_TEXT) VALUES('DOUBLE CROSS2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'DOUBLE CROSS2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('UNION ALL');
;WITH BASE_DATA(TAMC_ID,COLVAL) AS
(
SELECT
TC.TAMC_ID
,TC.COL01
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL02
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL03
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
)
SELECT
@INT_BUCKET = BD.TAMC_ID
,@DATE_BUCKET =MAX(BD.COLVAL)
FROM BASE_DATA BD
GROUP BY BD.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('UNION ALL');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'UNION ALL' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('UNION ALL2');
;WITH BASE_DATA(TAMC_ID,COLVAL) AS
(
SELECT
TC.TAMC_ID
,TC.COL01
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL02
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL03
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
UNION ALL
SELECT
TC.TAMC_ID
,TC.COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
)
SELECT
@INT_BUCKET = BD.TAMC_ID
,@DATE_BUCKET =MAX(BD.COLVAL)
FROM BASE_DATA BD
GROUP BY BD.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('UNION ALL2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'UNION ALL2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('BINARY CONCAT');
SELECT
@INT_BUCKET = t.TAMC_ID
,@CHAR_BUCKET = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2))
,@DATE_BUCKET = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)
FROM
dbo.TBL_TEST_ACROSS_MULTI_COLUMN t
CROSS APPLY (VALUES
(CAST(t.COL01 AS BINARY(4)) + CAST('c1' AS BINARY(4))),
(CAST(t.COL02 AS BINARY(4)) + CAST('c2' AS BINARY(4))),
(CAST(t.COL03 AS BINARY(4)) + CAST('c3' AS BINARY(4))),
(CAST(t.COL04 AS BINARY(4)) + CAST('c4' AS BINARY(4))))
x (DateValue)
GROUP BY
t.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('BINARY CONCAT');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'BINARY CONCAT' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('BINARY CONCAT2');
SELECT
@INT_BUCKET = t.TAMC_ID
,@CHAR_BUCKET = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2))
,@DATE_BUCKET = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)
FROM
dbo.TBL_TEST_ACROSS_MULTI_COLUMN t
CROSS APPLY (VALUES
(CAST(t.COL01 AS BINARY(4)) + CAST('c1' AS BINARY(4))),
(CAST(t.COL02 AS BINARY(4)) + CAST('c2' AS BINARY(4))),
(CAST(t.COL03 AS BINARY(4)) + CAST('c3' AS BINARY(4))),
(CAST(t.COL04 AS BINARY(4)) + CAST('c4' AS BINARY(4))))
x (DateValue)
GROUP BY
t.TAMC_ID;
INSERT INTO @timer(T_TEXT) VALUES('BINARY CONCAT2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'BINARY CONCAT2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('UNPIVOT');
SELECT
@INT_BUCKET = X.TAMC_ID
,@CHAR_BUCKET = X.L
,@DATE_BUCKET = X.Val
FROM
(
SELECT
dates.TAMC_ID,
dates.L,
dates.Val,
ROW_NUMBER() OVER (PARTITION BY dates.TAMC_ID ORDER BY dates.Val DESC) AS RID
FROM (
SELECT TAMC_ID,
COL01,
COL02,
COL03,
COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN
) dt UNPIVOT ( Val FOR L IN (COL01,COL02,COL03,COL04) ) AS dates
) AS X
WHERE X.RID = 1;
INSERT INTO @timer(T_TEXT) VALUES('UNPIVOT');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'UNPIVOT' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('UNPIVOT2');
SELECT
@INT_BUCKET = X.TAMC_ID
,@CHAR_BUCKET = X.L
,@DATE_BUCKET = X.Val
FROM
(
SELECT
dates.TAMC_ID,
dates.L,
dates.Val,
ROW_NUMBER() OVER (PARTITION BY dates.TAMC_ID ORDER BY dates.Val DESC) AS RID
FROM (
SELECT TAMC_ID,
COL01,
COL02,
COL03,
COL04
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN
) dt UNPIVOT ( Val FOR L IN (COL01,COL02,COL03,COL04) ) AS dates
) AS X
WHERE X.RID = 1;
INSERT INTO @timer(T_TEXT) VALUES('UNPIVOT2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'UNPIVOT2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('NULL-PROOFED CASE');
SELECT @INT_BUCKET =TAMC_ID,
@DATE_BUCKET=CASE WHEN (COL01>=COL02 OR COL02 IS NULL) AND (COL01>=COL03 OR COL03 IS NULL) AND (COL01>=COL04 OR COL04 IS NULL) THEN COL01
WHEN (COL02>=COL03 OR COL03 IS NULL) AND (COL02>=COL04 OR COL04 IS NULL) THEN COL02
WHEN (COL03>=COL04 OR COL04 IS NULL) THEN COL03
ELSE COL04
END,
@CHAR_BUCKET=CASE WHEN (COL01>=COL02 OR COL02 IS NULL) AND (COL01>=COL03 OR COL03 IS NULL) AND (COL01>=COL04 OR COL04 IS NULL) THEN 'COL01'
WHEN (COL02>=COL03 OR COL03 IS NULL) AND (COL02>=COL04 OR COL04 IS NULL) THEN 'COL02'
WHEN (COL03>=COL04 OR COL04 IS NULL) THEN 'COL03'
ELSE 'COL04'
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('NULL-PROOFED CASE');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'NULL-PROOFED CASE' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
INSERT INTO @timer(T_TEXT) VALUES('NULL-PROOFED CASE2');
SELECT @INT_BUCKET =TAMC_ID,
@DATE_BUCKET=CASE WHEN (COL01>=COL02 OR COL02 IS NULL) AND (COL01>=COL03 OR COL03 IS NULL) AND (COL01>=COL04 OR COL04 IS NULL) THEN COL01
WHEN (COL02>=COL03 OR COL03 IS NULL) AND (COL02>=COL04 OR COL04 IS NULL) THEN COL02
WHEN (COL03>=COL04 OR COL04 IS NULL) THEN COL03
ELSE COL04
END,
@CHAR_BUCKET=CASE WHEN (COL01>=COL02 OR COL02 IS NULL) AND (COL01>=COL03 OR COL03 IS NULL) AND (COL01>=COL04 OR COL04 IS NULL) THEN 'COL01'
WHEN (COL02>=COL03 OR COL03 IS NULL) AND (COL02>=COL04 OR COL04 IS NULL) THEN 'COL02'
WHEN (COL03>=COL04 OR COL04 IS NULL) THEN 'COL03'
ELSE 'COL04'
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('NULL-PROOFED CASE2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'NULL-PROOFED CASE2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
INSERT INTO @timer(T_TEXT) VALUES('MAX CROSS');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@DATE_BUCKET = XC.MXD
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT
MAX(MXCOL.XDATE)
FROM
(
SELECT TC.COL01 UNION ALL
SELECT TC.COL02 UNION ALL
SELECT TC.COL03 UNION ALL
SELECT TC.COL04
) AS MXCOL(XDATE)
)AS XC(MXD);
INSERT INTO @timer(T_TEXT) VALUES('MAX CROSS');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'MAX CROSS' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
INSERT INTO @timer(T_TEXT) VALUES('MAX CROSS2');
SELECT
@INT_BUCKET = TC.TAMC_ID
,@DATE_BUCKET = XC.MXD
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT
MAX(MXCOL.XDATE)
FROM
(
SELECT TC.COL01 UNION ALL
SELECT TC.COL02 UNION ALL
SELECT TC.COL03 UNION ALL
SELECT TC.COL04
) AS MXCOL(XDATE)
)AS XC(MXD);
INSERT INTO @timer(T_TEXT) VALUES('MAX CROSS2');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'MAX CROSS2' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
;WITH BASE_RESULT AS
(
SELECT
T.T_TEXT
,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM @timer T
GROUP BY T.T_TEXT
)
,OS_INFO AS
(
SELECT
OT.STEP_NAME
,OT.ms_ticks - OI.ms_ticks AS MS_TICKS
,OT.process_kernel_time_ms - OI.process_kernel_time_ms AS PROC_KERNEL_MS
,OT.process_user_time_ms - OI.process_user_time_ms AS USER_MS
FROM @OS_INFO OI
INNER JOIN @OS_INFO OT
ON OI.S_ID = OT.S_ID - 1
)
SELECT
BR.T_TEXT
,BR.DURATION
,OI.MS_TICKS
,OI.USER_MS
,OI.PROC_KERNEL_MS
FROM BASE_RESULT BR
INNER JOIN OS_INFO OI
ON BR.T_TEXT = OI.STEP_NAME
ORDER BY BR.DURATION;
Results
T_TEXT DURATION MS_TICKS USER_MS PROC_KERNEL_MS
-------------------- ----------- -------------------- -------------------- ----------------
DRY RUN2 216013 216 203 0
DRY RUN 219012 220 218 0
MAX CROSS 577033 577 577 0
MAX CROSS2 577033 576 577 0
TOP VALUES2 664038 664 2106 0
NULL-PROOFED CASE 728042 728 733 0
NULL-PROOFED CASE2 736042 737 733 0
MAX GROUP2 863050 862 858 0
MAX GROUP 867049 868 873 0
DOUBLE CROSS 1132064 1132 1139 0
DOUBLE CROSS2 1152066 1153 1154 0
BINARY CONCAT2 1440082 1441 1435 0
BINARY CONCAT 1447083 1448 1451 0
UNION ALL2 1459083 1459 1466 0
UNPIVOT2 1465084 1465 4790 16
UNPIVOT 1485085 1485 4758 0
UNION ALL 1634094 1634 1623 0
TOP VALUES 2501143 2501 3807 46
January 20, 2016 at 1:58 pm
If all we care about is the max date for each ID, it should be close between those and the CASE, if the CASE version is also modified to only care about the max date.
Just for fun I made the following changes:
1) Changed the basic CASE to use ISNULL for NULL-proofing, since that seemed to perform better.
2) Added a copy of the query from #1, but without the CASE that determines column source as CASE DATE-ONLY
3) Made the modification I mentioned earlier so that the CASE only has to do n-1 comparisons instead of (n2-n)/2 in the worst case. For that I included both a date-only and a date+column source version.
#3 above makes the code hard to follow, especially if we ever need more than 4 columns to be compared, but I figured I'd include it for completeness' sake.
Here were the results on my machine:
T_TEXT DURATION MS_TICKS USER_MS PROC_KERNEL_MS
-------------------- ----------- --------- --------- --------------
DRY RUN 301027 302 313 0
N-1 CASE DATE-ONLY 504044 505 500 0
MAX CROSS 561017 561 1110 0
CASE DATE-ONLY 574986 575 579 0
N-1 CASE 719096 719 718 0
CASE 895144 895 890 0
DOUBLE CROSS 1293120 1293 2422 0
MAX GROUP 1678134 1678 1672 0
BINARY CONCAT 2142189 2142 2156 0
UNION ALL 2316205 2316 2297 0
TOP VALUES 3445320 3445 4390 16
UNPIVOT 5376484 5377 8563 172
Here are the CASE versions used:
INSERT INTO @timer(T_TEXT) VALUES('CASE');
SELECT @INT_BUCKET =TAMC_ID,
@DATE_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101') AND COL01>=ISNULL(COL03,'00010101') AND COL01>=ISNULL(COL04,'00010101') THEN COL01
WHEN COL02>=ISNULL(COL03,'00010101') AND COL02>=ISNULL(COL04,'00010101') THEN COL02
WHEN COL03>=ISNULL(COL04,'00010101') THEN COL03
ELSE COL04
END,
@CHAR_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101') AND COL01>=ISNULL(COL03,'00010101') AND COL01>=ISNULL(COL04,'00010101') THEN 'COL01'
WHEN COL02>=ISNULL(COL03,'00010101') AND COL02>=ISNULL(COL04,'00010101') THEN 'COL02'
WHEN COL03>=ISNULL(COL04,'00010101') THEN 'COL03'
ELSE 'COL04'
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('CASE');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'CASE' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI
INSERT INTO @timer(T_TEXT) VALUES('CASE DATE-ONLY');
SELECT @INT_BUCKET =TAMC_ID,
@DATE_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101') AND COL01>=ISNULL(COL03,'00010101') AND COL01>=ISNULL(COL04,'00010101') THEN COL01
WHEN COL02>=ISNULL(COL03,'00010101') AND COL02>=ISNULL(COL04,'00010101') THEN COL02
WHEN COL03>=ISNULL(COL04,'00010101') THEN COL03
ELSE COL04
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('CASE DATE-ONLY');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'CASE DATE-ONLY' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
INSERT INTO @timer(T_TEXT) VALUES('N-1 CASE DATE-ONLY');
SELECT @INT_BUCKET=TAMC_ID,
@DATE_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL04,'00010101')
THEN COL01
ELSE COL04
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN COL03
ELSE COL04
END
END
ELSE CASE WHEN COL02>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL02>=ISNULL(COL04,'00010101')
THEN COL02
ELSE COL04
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN COL03
ELSE COL04
END
END
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('N-1 CASE DATE-ONLY');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'N-1 CASE DATE-ONLY' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
INSERT INTO @timer(T_TEXT) VALUES('N-1 CASE');
SELECT @INT_BUCKET=TAMC_ID,
@DATE_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL04,'00010101')
THEN COL01
ELSE COL04
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN COL03
ELSE COL04
END
END
ELSE CASE WHEN COL02>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL02>=ISNULL(COL04,'00010101')
THEN COL02
ELSE COL04
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN COL03
ELSE COL04
END
END
END,
@CHAR_BUCKET=CASE WHEN COL01>=ISNULL(COL02,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL01>=ISNULL(COL04,'00010101')
THEN 'COL01'
ELSE 'COL04'
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN 'COL03'
ELSE 'COL04'
END
END
ELSE CASE WHEN COL02>=ISNULL(COL03,'00010101')
THEN CASE WHEN COL02>=ISNULL(COL04,'00010101')
THEN 'COL02'
ELSE 'COL04'
END
ELSE CASE WHEN COL03>=ISNULL(COL04,'00010101')
THEN 'COL03'
ELSE 'COL04'
END
END
END
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN;
INSERT INTO @timer(T_TEXT) VALUES('N-1 CASE');
INSERT INTO @OS_INFO (STEP_NAME,ms_ticks,process_kernel_time_ms,process_user_time_ms)
SELECT
'N-1 CASE' AS STEP_NAME
,OSI.ms_ticks
,OSI.process_kernel_time_ms
,OSI.process_user_time_ms
FROM sys.dm_os_sys_info OSI;
One interesting thing that points back to some of this being very machine-specific is that in my runs, the MAX CROSS version was staying competitive with the DATE-ONLY CASEs for duration, but it was exploiting parallelism and using both cores (yeah, my test server has a whopping 2 cores :hehe:). With MAXDOP 1 set, it actually got edged out by all of the CASE versions, even the ones that also returned the column source.
Here are the MAXDOP 1 results (including the old NULL-PROOFED CASE that used OR IS NULL instead of ISNULL):
T_TEXT DURATION MS_TICKS USER_MS PROC_KERNEL_MS
-------------------- ----------- --------- -------- --------------
DRY RUN 318027 318 328 0
N-1 CASE DATE-ONLY 474046 474 453 0
CASE DATE-ONLY 550050 551 547 0
N-1 CASE 731075 731 719 0
CASE 854072 854 860 0
MAX CROSS 899074 900 906 0
NULL-PROOFED CASE 1102096 1102 1125 0
DOUBLE CROSS 1586143 1586 1594 0
MAX GROUP 1638144 1638 1640 0
BINARY CONCAT 2132189 2133 2125 0
UNION ALL 2361208 2361 2359 0
TOP VALUES 4470401 4472 4407 16
UNPIVOT 7019626 7020 6453 94
As evidenced by the difference between the old NULL-PROOFED CASE and the new CASE that uses ISNULL, change #1 above seems to be a big part of the reason all the CASEs did better this run. As you pointed out, throwing in all those ORs really complicated things.
The worst-case optimization from (n2-n)/2 comparison to n-1 also had a noticeable effect, but not quite as large an effect (in absolute terms, at least; percentage-wise the improvement is close) as switching from OR IS NULL to ISNULL.
Cheers!
January 20, 2016 at 2:27 pm
Jacob Wilkins (1/20/2016)
If all we care about is the max date for each ID, it should be close between those and the CASE, if the CASE version is also modified to only care about the max date.Just for fun I made the following changes:
1) Changed the basic CASE to use ISNULL for NULL-proofing, since that seemed to perform better.
2) Added a copy of the query from #1, but without the CASE that determines column source as CASE DATE-ONLY
3) Made the modification I mentioned earlier so that the CASE only has to do n-1 comparisons instead of (n2-n)/2 in the worst case. For that I included both a date-only and a date+column source version.
#3 above makes the code hard to follow, especially if we ever need more than 4 columns to be compared, but I figured I'd include it for completeness' sake.
Good job again Jacob, will test on different hardware when I have the time. You are absolutely right on the point that the results are hardware specific, but not only that, also the build and the setup of the SQL Server. There is no OSFA, that's the core lesson taken from this exercise.
😎
January 20, 2016 at 2:33 pm
You could use something like this to just find the max date of those columns, it won't get you the column name though. I found this on another site, stackoverflow.com. I think this has to be 2008 and greater.
http://stackoverflow.com/questions/71022/sql-max-of-multiple-columns
SELECT ID,
(SELECT MAX(v)
FROM (VALUES (L1), (L2), (L3)) AS VALUE(v)) AS Max_Date
FROM #datetest
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
January 20, 2016 at 2:48 pm
below86 (1/20/2016)
You could use something like this to just find the max date of those columns, it won't get you the column name though. I found this on another site, stackoverflow.com. I think this has to be 2008 and greater.http://stackoverflow.com/questions/71022/sql-max-of-multiple-columns
SELECT ID,
(SELECT MAX(v)
FROM (VALUES (L1), (L2), (L3)) AS VALUE(v)) AS Max_Date
FROM #datetest
Thanks! You just proved that parallel universes do exist, this thread is almost a copy of that thread (which means that we guys are somewhat 7-8 years behind)
😎
This query (using the posted harness)
SELECT TC.TAMC_ID,
(SELECT MAX(v)
FROM (VALUES (TC.COL01), (TC.COL02), (TC.COL03),(TC.COL04)) AS VALUE(v)) AS Max_Date
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
is on my system getting exactly the same execution plan and runs for exactly the same time as
SELECT
TC.TAMC_ID
,XC.MXD
FROM dbo.TBL_TEST_ACROSS_MULTI_COLUMN TC
CROSS APPLY
(
SELECT
MAX(MXCOL.XDATE)
FROM
(
SELECT TC.COL01 UNION ALL
SELECT TC.COL02 UNION ALL
SELECT TC.COL03 UNION ALL
SELECT TC.COL04
) AS MXCOL(XDATE)
)AS XC(MXD);
January 20, 2016 at 2:48 pm
Here you go. It's entirely possibly for multiple rows, or multiple columns within the same row, to tie for having highest date, so it will potentially return multiple rows in the result. If a tie breaker is needed to return a single row, then you can modify the ranking like so: (order by MaxDate desc, ID desc).
select id, MaxDate, Col from
(
select id, MaxDate, Col
, dense_rank() over (order by MaxDate desc) rank_id
from
(
select id, L1 as MaxDate, 'L1' as Col from datetest
union all
select id, L2 as MaxDate, 'L2' as Col from datetest
union all
select id, L3 as MaxDate, 'L3' as Col from datetest
) X
) Y
where rank_id = 1;
id MaxDate Col
-------- ---------- ----
1005 2016-01-17 L2
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 20, 2016 at 2:57 pm
Eric M Russell (1/20/2016)
Here you go. It's entirely possibly for multiple rows, or multiple columns within the same row, to tie for having highest date, so it will potentially return multiple rows in the result. If a tie breaker is needed to return a single row, then you can modify the ranking like so: (order by MaxDate desc, ID desc).
select id, MaxDate, Col from
(
select id, MaxDate, Col
, dense_rank() over (order by MaxDate desc) rank_id
from
(
select id, L1 as MaxDate, 'L1' as Col from datetest
union all
select id, L2 as MaxDate, 'L2' as Col from datetest
union all
select id, L3 as MaxDate, 'L3' as Col from datetest
) X
) Y
where rank_id = 1;
id MaxDate Col
-------- ---------- ----
1005 2016-01-17 L2
Quite impressive, 28 operators in the execution plan, parallel execution, four full table scans and I'm now putting out the fire ignited by the CPU when running this query:-P
😎
January 21, 2016 at 7:14 am
Eirikur Eiriksson (1/20/2016)
......
Quite impressive, 28 operators in the execution plan, parallel execution, four full table scans and I'm now putting out the fire ignited by the CPU when running this query:-P
😎
Oh, yeah, well yours isn't returning the name of the column containing the max date value. :satisfied:
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 22, 2016 at 1:26 am
Eric M Russell (1/21/2016)
Eirikur Eiriksson (1/20/2016)
......
Quite impressive, 28 operators in the execution plan, parallel execution, four full table scans and I'm now putting out the fire ignited by the CPU when running this query:-P
😎
Oh, yeah, well yours isn't returning the name of the column containing the max date value. :satisfied:
It does and 100 times faster than yours:-D
😎
Viewing 14 posts - 31 through 43 (of 43 total)
You must be logged in to reply to this topic. Login to reply