January 18, 2016 at 1:55 pm
I have ten user defined date fields, but for simplicity let's focus on three. These are only dates, not datetime. They're labeled L1, L2, and L3 and contain a date. They are manual entry so the date can be manually changed by the user. I need to find the latest date and the column header it belongs to.
CREATE TABLE datetest (id varchar(8), L1 date, L2 date, L3 date)
INSERT INTO datetest (id, L1, L2, L3)
VALUES ('1005', '1-1-16', '1-17-16', '1-10-16')
What I want to see is three columns: id, MAX date, Column it comes from. In the case above, I'd hope for:
1005, 1-17-16, L2
January 18, 2016 at 2:00 pm
You can use a CASE statement. It'll be messy, but SQL doesn't have a 'max across columns' function.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 18, 2016 at 2:25 pm
I assume you mean like this? I appreciate the guidance.
SELECT id
, CASE
WHEN L1 > L2 AND L1 > L3 THEN L1
WHEN L2 > L1 AND L2 > L3 THEN L2
ELSE L3
END AS 'Date'
, CASE
WHEN L1 > L2 AND L1 > L3 THEN 'L1'
WHEN L2 > L1 AND L2 > L3 THEN 'L2'
ELSE 'L3'
END AS 'L'
FROM datetest
January 18, 2016 at 4:59 pm
You can also use APPLY
SELECT MA.MaxDate
FROM <mytable> AS MT
CROSS APPLY (
SELECT MAX(VA.LDate)
FROM (VALUES(MT.L1),(MT.L2),(MT.L3)....) VA(LDate)
) AS MA(MaxDate)
Excuse any slight syntax errors, I have typed this away from SQL.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 18, 2016 at 5:01 pm
Just noticed "and the column header it belongs to", so:
SELECT MA.MaxDate, MA.ColName
FROM <mytable> AS MT
CROSS APPLY (
SELECT TOP(1) VA.LName, VA.LDate
FROM (VALUES('L1',MT.L1),('L2',MT.L2),('L3',MT.L3)....) VA(LName, LDate)
ORDER BY VA.LDate DESC
) AS MA(ColName, MaxDate)
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 18, 2016 at 9:03 pm
@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.
Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:
SELECT TOP 1
dates.id,
dates.L,
dates.Val
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
ORDER BY dates.Val DESC;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 18, 2016 at 9:17 pm
Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...
Test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
id INT PRIMARY KEY,
c1 DATE,
c2 DATE,
c3 DATE,
c4 DATE,
c5 DATE
);
INSERT #temp (id,c1,c2,c3,c4,c5)VALUES
(1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),
(2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),
(3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),
(4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),
(5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');
And the solution...
SELECT
t.id,
ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),
DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)
FROM
#temp t
CROSS APPLY (VALUES
(CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),
(CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),
(CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),
(CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),
(CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))
x (DateValue)
GROUP BY
t.id;
January 18, 2016 at 10:28 pm
Orlando Colamatteo (1/18/2016)
@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:
SELECT TOP 1
dates.id,
dates.L,
dates.Val
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
ORDER BY dates.Val DESC;
This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.
😎
January 18, 2016 at 10:49 pm
Jason A. Long (1/18/2016)
Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...Test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
id INT PRIMARY KEY,
c1 DATE,
c2 DATE,
c3 DATE,
c4 DATE,
c5 DATE
);
INSERT #temp (id,c1,c2,c3,c4,c5)VALUES
(1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),
(2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),
(3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),
(4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),
(5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');
And the solution...
SELECT
t.id,
ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),
DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)
FROM
#temp t
CROSS APPLY (VALUES
(CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),
(CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),
(CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),
(CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),
(CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))
x (DateValue)
GROUP BY
t.id;
Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.
😎
Edit: Correction
January 18, 2016 at 11:59 pm
Eirikur Eiriksson (1/18/2016)
Orlando Colamatteo (1/18/2016)
@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:
SELECT TOP 1
dates.id,
dates.L,
dates.Val
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
ORDER BY dates.Val DESC;
This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.
😎
My query works for a single row, i.e. on the OP's test data. It wants a WHERE-clause to limit what goes in to a single row.
If you needed something to work on a set you could partition and order the rows:
SELECT up.id,
up.L,
up.Val
FROM (
SELECT dates.id,
dates.L,
dates.Val,
ROW_NUMBER() OVER (PARTITION BY dates.id ORDER BY dates.Val DESC) AS uprank
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
) up
WHERE up.uprank = 1;
I tried both of these on 1MM rows and as I suspected the CROSS APPLY method was quite a bit faster and more efficient on CPU time too.
CROSS APPLY:
Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1891 ms, elapsed time = 1343 ms.
(1000000 row(s) affected)
UNPIVOT+ROW_NUMBER:
Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 10154 ms, elapsed time = 2970 ms.
(1000000 row(s) affected)
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 19, 2016 at 7:01 am
Eirikur Eiriksson (1/18/2016)
Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.😎
Edit: Correction
I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.
My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.
January 19, 2016 at 7:51 am
Jason A. Long (1/19/2016)
Eirikur Eiriksson (1/18/2016)
Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all, the difference is that the former most likely goes parallel and the latter never goes parallel.😎
Edit: Correction
I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.
My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.
The binary method run my 1MM row test table without a CI is a bit slower than the CROSS APPLY:
Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 2850, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5328 ms, elapsed time = 1832 ms.
(1000000 row(s) affected)
With a CI, the binary method still ends up a bit slower on elapsed time than the CROSS APPLY with or without the same CI but does better on CPU time:
Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1515 ms, elapsed time = 1532 ms.
The CROSS APPLY results again:
Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1891 ms, elapsed time = 1343 ms.
My lowly entry using UNPIVOT+ROW_NUMBER still lags behind the others but it benefited the most from the CI:
Table 'Worktable'. Scan count 1, logical reads 2000005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2562 ms, elapsed time = 2565 ms.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 19, 2016 at 8:03 am
Thanks for testing Orlando.
I wonder how the performance of each method will depend on how many "date columns" are present as well, whether having 20 columns vs 10 columns is an issue?
Not suggesting anyone bother to test that 🙂
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 19, 2016 at 8:29 am
The timings I presented were for 3 column 1MM rows. I wondered the same and did 10 to see how it did and the placing came out the same:
cross apply:
Table 'datetest'. Scan count 5, logical reads 6271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3842 ms, elapsed time = 2719 ms.
binary:
Table 'datetest'. Scan count 5, logical reads 6255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5296 ms, elapsed time = 3524 ms.
unpivot:
Table 'Worktable'. Scan count 1, logical reads 2000019, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'datetest'. Scan count 1, logical reads 6197, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5500 ms, elapsed time = 5515 ms.
Maybe tonight I will work up a CASE-expression for this just to see.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 19, 2016 at 8:50 am
Just a cautionary note on the CASE version: be aware of how your implementation handles ties. If you're not careful with the CASE, then you can get incorrect results, as in this example.
SELECT ID=1,
L1='20151201',
L2='20151201',
L3='19000101'
INTO #temp;
SELECT id
, CASE
WHEN L1 > L2 AND L1 > L3 THEN L1
WHEN L2 > L1 AND L2 > L3 THEN L2
ELSE L3
END AS 'Date'
, CASE
WHEN L1 > L2 AND L1 > L3 THEN 'L1'
WHEN L2 > L1 AND L2 > L3 THEN 'L2'
ELSE 'L3'
END AS 'L'
FROM #temp;
DROP TABLE #temp;
A quick fix would be to change > to >= in the checks. Just be aware that if multiple columns are tied, then you'll always get the first of the ties in the order you evaluate them. That might not matter, or you might want to make sure you return a particular one of the ties, in which case you would need additional modifications.
The CASE statement's branches get evaluated in order, so you can also simplify a bit. If the second branch is being evaluated (after switching to >=), you know L1 isn't even a tie for the highest value, so you don't need to compare L2 to it. If the branch with L3's comparisons is being evaluated, you know that neither L1 nor L2 were even ties for the highest date, so L3 doesn't have to be compared to them, and so on down the branches (so your CASE statements end up being triangular looking affairs).
Here's what it would look like for Jason's sample data:
SELECT id,
CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN c1
WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN c2
WHEN c3>=c4 AND c3>=c5 THEN c3
WHEN c4>=c5 THEN c4
ELSE c5
END,
CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN 'c1'
WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN 'c2'
WHEN c3>=c4 AND c3>=c5 THEN 'c3'
WHEN c4>=c5 THEN 'c4'
ELSE 'c5'
END
FROM #temp
Cheers!
Viewing 15 posts - 1 through 15 (of 43 total)
You must be logged in to reply to this topic. Login to reply