Finding the column containing the minimum value

Question

Post reply

Finding the column containing the minimum value

Phil Parkin

SSC Guru

Points: 246993
More actions
October 14, 2011 at 6:02 am

#245330

Given a table like this
declare @Tab1 table
(
id int identity(1,1),
v1 int,
v2 int,
v3 int,
v4 int,
v5 int,
v6 int)
insert @Tab1
(v1,v2,v3,v4,v5,v6)
select 1,2,3,4,5,6
union
select 2,3,1,4,5,6
union
select 3,1,2,4,5,6
union
select 4,5,6,2,3,1
select * from @Tab1 t
I want a decent-performing way of returning for each row ID, the column number containing the minimum of v1 to v6, thus:
1,1
2,3
3,2
4,6
Any ideas?
Thanks.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323518 More actions · Answer 1

Phil the only way i could see it was to use a case statement that evaluates each column;

this works, but i don't know how it will perform on large data sets:

also i think we need to use <= for all the value comparisons in case two columns are equal, you take the first column witht he lowest value?

--based on the setup you provided:

SELECT Id,

CASE

WHEN (v1 <= v2) AND (v1 <= v3) AND (v1 <= v4) AND (v1 <= v5) AND (v1 <= v6)

THEN 1

WHEN (v2 <= v1) AND (v2 <= v3) AND (v2 <= v4) AND (v2 <= v5) AND (v2 <= v6)

THEN 2

WHEN (v3 <= v1) AND (v3 <= v2) AND (v3 <= v4) AND (v3 <= v5) AND (v3 <= v6)

THEN 3

WHEN (v4 <= v1) AND (v4 <= v2) AND (v4 <= v3) AND (v4 <= v5) AND (v4 <= v6)

THEN 4

WHEN (v5 <= v1) AND (v5 <= v2) AND (v5 <= v3) AND (v5 <= v4) AND (v5 <= v6)

THEN 5

ELSE 6

END AS ColIndex

FROM @Tab1 t

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Something similar to this outperforms the obvious pivot approach for removing the blanks from address elements:

select t.*, x.ColIndex

from @Tab1 t

CROSS APPLY (SELECT TOP 1 ColIndex from (

SELECT ColIndex = 1, ColName = v1 UNION ALL SELECT 2, v2 UNION ALL SELECT 3, v3 UNION ALL SELECT 4, v4 UNION ALL SELECT 5, v5 UNION ALL SELECT 6, v6

) d ORDER BY ColName

) x (ColIndex)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lowell SSC Guru Points: 323518 More actions · Answer 3

maybe it's just me, but both my solution and Chris's seem to produce the exact same execution plan?, but Chris's is 48% relative to the batch where mine was 17%? so the big case is better?

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Lowell (10/14/2011)
maybe it's just me, but both my solution and Chris's seem to produce the exact same execution plan?, but Chris's is 48% relative to the batch where mine was 17%? so the big case is better?

Interesting - here, the plans are different. Your query is a straightup table scan and compute scalar, mine features a sort (and NLIJ) and is 4x "more expensive".

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lowell SSC Guru Points: 323518 More actions · Answer 5

i'm an ID-TEN-T;

i looked at jsu tthe execution plan for the setup table variable...of course they are the same...once i scrolled down, i saw differences.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Phil Parkin SSC Guru Points: 246993 More actions · Answer 6

Lowell (10/14/2011)
i'm an ID-TEN-T;
i looked at jsu tthe execution plan for the setup table variable...of course they are the same...once i scrolled down, i saw differences.

Haha. Thanks for the excellent input guys.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Anytime Phil. Test them both, I reckon Lowell's MegaCASE will win hands down.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Cadavre SSC-Forever Points: 41690 More actions · Answer 8

ChrisM@Work (10/14/2011)
Anytime Phil. Test them both, I reckon Lowell's MegaCASE will win hands down.

Now that you mention it, that was what I was looking at 🙂

I like Chris' solution, because it's something I wouldn't have come up with, so I set up a test with 1,000,000 rows.

IF object_id('tempdb..#Tab1') IS NOT NULL

BEGIN

DROP TABLE #Tab1

END

--1,000,000 Random rows of data

SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,

v1, v2, v3, v4, v5, v6

INTO #Tab1

FROM (SELECT (ABS(CHECKSUM(NewId())) % 10) + 1 AS v1,

(ABS(CHECKSUM(NewId())) % 10) + 1 AS v2,

(ABS(CHECKSUM(NewId())) % 10) + 1 AS v3,

(ABS(CHECKSUM(NewId())) % 10) + 1 AS v4,

(ABS(CHECKSUM(NewId())) % 10) + 1 AS v5,

(ABS(CHECKSUM(NewId())) % 10) + 1 AS v6

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3) a

WHERE (v1 <> v2 AND v1 <> v3 AND v1 <> v4 AND v1 <> v5 AND v1 <> v6) AND

(v2 <> v3 AND v2 <> v4 AND v2 <> v5 AND v2 <> v6) AND (v3 <> v4 AND v3 <> v5 AND v3 <> v6) AND

(v4 <> v5 AND v4 <> v6) AND (v5 <> v6)

--Add a Primary Key

ALTER TABLE #Tab1

ADD CONSTRAINT Tab1_PK_ID

PRIMARY KEY CLUSTERED (ID) WITH FILLFACTOR = 100

DECLARE @VAR INT

PRINT '========== BASELINE =========='

SET STATISTICS TIME ON

SELECT @VAR = ID FROM #Tab1

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== CROSS JOIN =========='

SET STATISTICS TIME ON

SELECT @VAR = x.ColIndex

FROM #Tab1 t

CROSS APPLY (

SELECT TOP 1 ColIndex

FROM (SELECT ColIndex = 1, ColName = v1 UNION ALL SELECT 2, v2

UNION ALL SELECT 3, v3 UNION ALL SELECT 4, v4

UNION ALL SELECT 5, v5 UNION ALL SELECT 6, v6) d

ORDER BY ColName) x(ColIndex)

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

PRINT '========== CASE =========='

SET STATISTICS TIME ON

SELECT @VAR = CASE WHEN (v1 <= v2) AND (v1 <= v3) AND (v1 <= v4) AND (v1 <= v5) AND (v1 <= v6)

THEN 1

WHEN (v2 <= v1) AND (v2 <= v3) AND (v2 <= v4) AND (v2 <= v5) AND (v2 <= v6)

THEN 2

WHEN (v3 <= v1) AND (v3 <= v2) AND (v3 <= v4) AND (v3 <= v5) AND (v3 <= v6)

THEN 3

WHEN (v4 <= v1) AND (v4 <= v2) AND (v4 <= v3) AND (v4 <= v5) AND (v4 <= v6)

THEN 4

WHEN (v5 <= v1) AND (v5 <= v2) AND (v5 <= v3) AND (v5 <= v4) AND (v5 <= v6)

THEN 5

ELSE 6

END

FROM #Tab1 t

SET STATISTICS TIME OFF

PRINT REPLICATE('=',80)

========== BASELINE ==========

SQL Server Execution Times:

CPU time = 140 ms, elapsed time = 153 ms.

================================================================================

========== CROSS JOIN ==========

SQL Server Execution Times:

CPU time = 9937 ms, elapsed time = 9950 ms.

================================================================================

========== CASE ==========

SQL Server Execution Times:

CPU time = 516 ms, elapsed time = 509 ms.

================================================================================

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

20x faster! I'm going home...:blush:

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Phil Parkin SSC Guru Points: 246993 More actions · Answer 10

Applying Lowell's suggested method to my real-world version of this problem worked splendidly. >300k records updated in less than a second.

Thanks again.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

spaghettidba SSC Guru Points: 105732 More actions · Answer 11

I'm perfectly aware that Lowell provided a nice lightning-fast solution, but...

I couldn't resist!

SELECT id, col = (

SELECT col

FROM (

SELECT id, v1, v2, v3, v4, v5, v6

) AS v

UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u

WHERE value = (

SELECT MIN(value)

FROM (

SELECT *

FROM (

SELECT id, v1, v2, v3, v4, v5, v6

) AS v1

UNPIVOT ( value FOR col IN (v1, v2, v3, v4, v5, v6) ) AS u

) AS src

)

FROM #Tab1 AS T

-- Gianluca Sartori

Cadavre SSC-Forever Points: 41690 More actions · Answer 12

Gianluca Sartori (10/14/2011)
I couldn't resist!

Me either 😀

========== BASELINE ==========

SQL Server Execution Times:

CPU time = 203 ms, elapsed time = 205 ms.

================================================================================

========== CROSS JOIN ==========

SQL Server Execution Times:

CPU time = 12558 ms, elapsed time = 12561 ms.

================================================================================

========== CASE ==========

SQL Server Execution Times:

CPU time = 733 ms, elapsed time = 731 ms.

================================================================================

========== UNPIVOT ==========

SQL Server Execution Times:

CPU time = 3088 ms, elapsed time = 829 ms.

================================================================================

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 13

Cadavre (10/17/2011)
Gianluca Sartori (10/14/2011)
I couldn't resist!
Me either 😀
========== BASELINE ==========
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 205 ms.
================================================================================
========== CROSS JOIN ==========
SQL Server Execution Times:
CPU time = 12558 ms, elapsed time = 12561 ms.
================================================================================
========== CASE ==========
SQL Server Execution Times:
CPU time = 733 ms, elapsed time = 731 ms.
================================================================================
========== UNPIVOT ==========
SQL Server Execution Times:
CPU time = 3088 ms, elapsed time = 829 ms.
================================================================================

Et tu, Brute!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

spaghettidba SSC Guru Points: 105732 More actions · Answer 14

Cadavre (10/17/2011)
Gianluca Sartori (10/14/2011)
I couldn't resist!
Me either 😀

I already knew it was slow, but it was such a fun problem!

It's something you don't stumble upon very often.

-- Gianluca Sartori