Find Latest Date from Multiple Columns

Question

Post reply

Find Latest Date from Multiple Columns

jcobb 20350

Default port

Points: 1438
More actions
January 18, 2016 at 1:55 pm

#315117

I have ten user defined date fields, but for simplicity let's focus on three. These are only dates, not datetime. They're labeled L1, L2, and L3 and contain a date. They are manual entry so the date can be manually changed by the user. I need to find the latest date and the column header it belongs to.
CREATE TABLE datetest (id varchar(8), L1 date, L2 date, L3 date)
INSERT INTO datetest (id, L1, L2, L3)
VALUES ('1005', '1-1-16', '1-17-16', '1-10-16')
What I want to see is three columns: id, MAX date, Column it comes from. In the case above, I'd hope for:
1005, 1-17-16, L2
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

You can use a CASE statement. It'll be messy, but SQL doesn't have a 'max across columns' function.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

jcobb 20350 Default port Points: 1438 More actions · Answer 2

I assume you mean like this? I appreciate the guidance.

SELECT id

, CASE

WHEN L1 > L2 AND L1 > L3 THEN L1

WHEN L2 > L1 AND L2 > L3 THEN L2

ELSE L3

END AS 'Date'

, CASE

WHEN L1 > L2 AND L1 > L3 THEN 'L1'

WHEN L2 > L1 AND L2 > L3 THEN 'L2'

ELSE 'L3'

END AS 'L'

FROM datetest

mister.magoo SSC-Forever Points: 47174 More actions · Answer 3

You can also use APPLY

SELECT MA.MaxDate

FROM <mytable> AS MT

CROSS APPLY (

SELECT MAX(VA.LDate)

FROM (VALUES(MT.L1),(MT.L2),(MT.L3)....) VA(LDate)

) AS MA(MaxDate)

Excuse any slight syntax errors, I have typed this away from SQL.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47174 More actions · Answer 4

Just noticed "and the column header it belongs to", so:

SELECT MA.MaxDate, MA.ColName

FROM <mytable> AS MT

CROSS APPLY (

SELECT TOP(1) VA.LName, VA.LDate

FROM (VALUES('L1',MT.L1),('L2',MT.L2),('L3',MT.L3)....) VA(LName, LDate)

ORDER BY VA.LDate DESC

) AS MA(ColName, MaxDate)

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 5

@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.

Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:

SELECT TOP 1

dates.id,

dates.L,

dates.Val

FROM (

SELECT id,

L1,

L2,

L3

FROM dbo.datetest

) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

ORDER BY dates.Val DESC;

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 6

Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...

Test data...

IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

DROP TABLE #temp;

CREATE TABLE #temp (

id INT PRIMARY KEY,

c1 DATE,

c2 DATE,

c3 DATE,

c4 DATE,

c5 DATE

);

INSERT #temp (id,c1,c2,c3,c4,c5)VALUES

(1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),

(2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),

(3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),

(4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),

(5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');

And the solution...

SELECT

t.id,

ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),

DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)

FROM

#temp t

CROSS APPLY (VALUES

(CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),

(CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),

(CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),

(CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),

(CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))

x (DateValue)

GROUP BY

t.id;

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 7

Orlando Colamatteo (1/18/2016)
@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.
Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:
SELECT TOP 1
dates.id,
dates.L,
dates.Val
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
ORDER BY dates.Val DESC;

This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.

😎

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 8

Jason A. Long (1/18/2016)
Just for the fun of it... Here's a solution that uses a cast to binary & contaminate method...
Test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
id INT PRIMARY KEY,
c1 DATE,
c2 DATE,
c3 DATE,
c4 DATE,
c5 DATE
);
INSERT #temp (id,c1,c2,c3,c4,c5)VALUES
(1, '2014-01-05','2014-06-03','2015-12-31','2015-06-01','2015-01-08'),
(2, '2015-01-05','2014-07-08','2014-12-31','2016-06-07','2015-02-02'),
(3, '2016-01-15','2014-04-03','2014-12-21','2015-07-01','2014-01-05'),
(4, '2014-01-05','2014-06-13','2045-12-30','2014-06-11','2015-11-28'),
(5, '2015-11-05','2015-06-03','2014-12-22','2014-11-01','2015-02-18');
And the solution...
SELECT
t.id,
ColumnName = CAST(SUBSTRING(MAX(x.DateValue), 5, 4) AS VARCHAR(2)),
DateValue = CAST(SUBSTRING(MAX(x.DateValue), 1, 4) AS DATE)
FROM
#temp t
CROSS APPLY (VALUES
(CAST(t.c1 AS BINARY(4)) + CAST('c1' AS BINARY(4))),
(CAST(t.c2 AS BINARY(4)) + CAST('c2' AS BINARY(4))),
(CAST(t.c3 AS BINARY(4)) + CAST('c3' AS BINARY(4))),
(CAST(t.c4 AS BINARY(4)) + CAST('c4' AS BINARY(4))),
(CAST(t.c5 AS BINARY(4)) + CAST('c5' AS BINARY(4))))
x (DateValue)
GROUP BY
t.id;

Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all~~, the difference is that the former most likely goes parallel and the latter never goes parallel~~.

😎

Edit: Correction

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 9

Eirikur Eiriksson (1/18/2016)
Orlando Colamatteo (1/18/2016)
@mister.magoo, never seen APPLY used with a row-ctor to unpivot like that. Very cool.
Just for the sake of providing an alternative, we can do the same using a traditional UNPIVOT:
SELECT TOP 1
dates.id,
dates.L,
dates.Val
FROM (
SELECT id,
L1,
L2,
L3
FROM dbo.datetest
) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates
ORDER BY dates.Val DESC;
This doesn't work as the top 1 limits the output to a single row regardless of how many rows are in the table.
😎

My query works for a single row, i.e. on the OP's test data. It wants a WHERE-clause to limit what goes in to a single row.

If you needed something to work on a set you could partition and order the rows:

SELECT up.id,

up.L,

up.Val

FROM (

SELECT dates.id,

dates.L,

dates.Val,

ROW_NUMBER() OVER (PARTITION BY dates.id ORDER BY dates.Val DESC) AS uprank

FROM (

SELECT id,

L1,

L2,

L3

FROM dbo.datetest

) dt UNPIVOT ( Val FOR L IN (L1, L2, L3) ) AS dates

) up

WHERE up.uprank = 1;

I tried both of these on 1MM rows and as I suspected the CROSS APPLY method was quite a bit faster and more efficient on CPU time too.

CROSS APPLY:

Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1891 ms, elapsed time = 1343 ms.
(1000000 row(s) affected)

UNPIVOT+ROW_NUMBER:

Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 10154 ms, elapsed time = 2970 ms.
(1000000 row(s) affected)

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 10

Eirikur Eiriksson (1/18/2016)
Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all~~, the difference is that the former most likely goes parallel and the latter never goes parallel~~.
😎
Edit: Correction

I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.

My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 11

Jason A. Long (1/19/2016)
Eirikur Eiriksson (1/18/2016)
Nice idea but I see two potential problems, firstly the costly concatenation and substringing with the data type conversion is going to impact the performance probably more than what is gained by the potential stream aggregation in the execution plan. Secondly because of concatenation it is very unlikely to get a parallel plan which probably makes it perform similar to a normal union all for all columns. There are two flavours of unpivoting using cross apply, values and union all~~, the difference is that the former most likely goes parallel and the latter never goes parallel~~.
😎
Edit: Correction
I can't comment on the above without testing on a larger data set. That said, the "top 1 / order by" method has a sort operator on an expression and the "binary concatenation" method puts the sort on the ID column... Which, in turn, means the sort operator can be eliminated by putting a clustered index on the ID column.
My thought (not proven w/o testing) is that the elimination sort operation should give the "binary concatenation" method the edge.

The binary method run my 1MM row test table without a CI is a bit slower than the CROSS APPLY:

Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 2850, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5328 ms, elapsed time = 1832 ms.
(1000000 row(s) affected)

With a CI, the binary method still ends up a bit slower on elapsed time than the CROSS APPLY with or without the same CI but does better on CPU time:

Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 17, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1515 ms, elapsed time = 1532 ms.

The CROSS APPLY results again:

Table 'datetest'. Scan count 5, logical reads 3448, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1891 ms, elapsed time = 1343 ms.

My lowly entry using UNPIVOT+ROW_NUMBER still lags behind the others but it benefited the most from the CI:

Table 'Worktable'. Scan count 1, logical reads 2000005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'datetest'. Scan count 1, logical reads 3464, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2562 ms, elapsed time = 2565 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

mister.magoo SSC-Forever Points: 47174 More actions · Answer 12

Thanks for testing Orlando.

I wonder how the performance of each method will depend on how many "date columns" are present as well, whether having 20 columns vs 10 columns is an issue?

Not suggesting anyone bother to test that 🙂

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 13

The timings I presented were for 3 column 1MM rows. I wondered the same and did 10 to see how it did and the placing came out the same:

cross apply:

Table 'datetest'. Scan count 5, logical reads 6271, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3842 ms, elapsed time = 2719 ms.

binary:

Table 'datetest'. Scan count 5, logical reads 6255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5296 ms, elapsed time = 3524 ms.

unpivot:

Table 'Worktable'. Scan count 1, logical reads 2000019, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'datetest'. Scan count 1, logical reads 6197, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5500 ms, elapsed time = 5515 ms.

Maybe tonight I will work up a CASE-expression for this just to see.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 14

Just a cautionary note on the CASE version: be aware of how your implementation handles ties. If you're not careful with the CASE, then you can get incorrect results, as in this example.

SELECT ID=1,

L1='20151201',

L2='20151201',

L3='19000101'

INTO #temp;

SELECT id

, CASE

WHEN L1 > L2 AND L1 > L3 THEN L1

WHEN L2 > L1 AND L2 > L3 THEN L2

ELSE L3

END AS 'Date'

, CASE

WHEN L1 > L2 AND L1 > L3 THEN 'L1'

WHEN L2 > L1 AND L2 > L3 THEN 'L2'

ELSE 'L3'

END AS 'L'

FROM #temp;

DROP TABLE #temp;

A quick fix would be to change > to >= in the checks. Just be aware that if multiple columns are tied, then you'll always get the first of the ties in the order you evaluate them. That might not matter, or you might want to make sure you return a particular one of the ties, in which case you would need additional modifications.

The CASE statement's branches get evaluated in order, so you can also simplify a bit. If the second branch is being evaluated (after switching to >=), you know L1 isn't even a tie for the highest value, so you don't need to compare L2 to it. If the branch with L3's comparisons is being evaluated, you know that neither L1 nor L2 were even ties for the highest date, so L3 doesn't have to be compared to them, and so on down the branches (so your CASE statements end up being triangular looking affairs).

Here's what it would look like for Jason's sample data:

SELECT id,

CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN c1

WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN c2

WHEN c3>=c4 AND c3>=c5 THEN c3

WHEN c4>=c5 THEN c4

ELSE c5

END,

CASE WHEN c1>=c2 AND c1>=c3 AND c1>=c4 and c1>=c5 THEN 'c1'

WHEN c2>=c3 AND c2>=c4 AND c2>=c5 THEN 'c2'

WHEN c3>=c4 AND c3>=c5 THEN 'c3'

WHEN c4>=c5 THEN 'c4'

ELSE 'c5'

END

FROM #temp

Cheers!