Highest value supported by 2 of 3 columns

Question

Highest value supported by 2 of 3 columns

ArcticEd32

Say Hey Kid

Points: 694
More actions
May 23, 2017 at 8:44 am

#402088

Consider a table consisting of an ID number and 3 columns of scores (Score1, Score2, Score3). The final "score" for each record is the highest value supported by 2 of the 3 scores. Scores range from 1 through 5.
Examples, showing the three scores and final score:
1, 3, 5 -> 3
3, 4, 5 -> 4
4, 4, 4 -> 4
1, 1, 2 -> 1
Trying to find a way to calculate the final score for each record in a single pass through the table. Not opposed to a function, but there could be 5 records in the table or there could be 100,000 and it needs to be efficient. The actual data has two sets of score "clusters" that result in two independent final scores; don't know if that matters or changes the pathway.
My searching came up with how to use the VALUES constructor to get the max/min of the three columns, but can't wrap my head around the best way to get highest 2 of 3. I started down a path of using UNPIVOT to count the number of scores that were greater than each possible value but have gotten lost.
Any assistance would be greatly appreciated!
What I've worked on so far:
--Create table & insert some records IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp; CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int); INSERT INTO #tmp VALUES ((1),(1),(3),(5)), ((2),(3),(4),(5)), ((3),(4),(4),(4)), ((4),(1),(1),(2));
--This gets the highest value, but not the highest value --supported by 2 of 3 scores SELECT ID, Score1, Score2, Score3, ( SELECT Max(s) FROM ( VALUES (Score1), (Score2), (Score3) ) AS Scores(s) ) as MaxScore FROM #tmp;
--Tried to count number of scores >= each possible score but don't know if this is a workable path WITH Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u) SELECT ID ,sum(case when Score >=1 then 1 else 0 end) as L1 ,sum(case when Score >=2 then 1 else 0 end) as L2 ,sum(case when Score >=3 then 1 else 0 end) as L3 ,sum(case when Score >=4 then 1 else 0 end) as L4 ,sum(case when Score >=5 then 1 else 0 end) as L5 from Unpvt group by ID;

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

ravdep SSC Journeyman Points: 99 More actions · Answer 1

It seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.

WITH A AS (SELECT ID, ScoreNumber,Score
FROM
(SELECT ID, Score1, Score2, Score3
FROM #tmp) p
UNPIVOT
(Score FOR ScoreNumber IN
  (Score1, Score2, Score3)
) AS unpvt)
,
B AS (
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
     ID,
     Score,
     ScoreNumber
FROM A)
SELECT ID,Score AS FinalScore,ScoreNumber FROM B
WHERE RowNum=2

ArcticEd32 Say Hey Kid Points: 694 More actions · Answer 2

ravdep - Tuesday, May 23, 2017 9:50 AM
It seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.
WITH A AS (SELECT ID, ScoreNumber,Score
FROM
(SELECT ID, Score1, Score2, Score3
FROM #tmp) p
UNPIVOT
(Score FOR ScoreNumber IN
  (Score1, Score2, Score3)
) AS unpvt)
,
B AS (
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
     ID,
     Score,
     ScoreNumber
FROM A)
SELECT ID,Score AS FinalScore,ScoreNumber FROM B
WHERE RowNum=2

Didn't think about this as a Median value, but I guess that's what it is. The final score could be defined as "a number between 1 and 5 that is met or exceeded by at least 2 of the 3 scores."

The order of the scores isn't meaningful and the column that contains the middle score isn't important either; just the final calculated score.

I got the code below to work and it uses a larger sample, but what you provided is more compact & slightly faster.

Curious to know if there are any more compact ways to do it.

Thank you for the answer

--Create table IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp; CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int, FinalScore int);

--Populate with random data INSERT INTO #tmp (ID, Score1, Score2, Score3) SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY o1.[Object_id]) AS ID ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score1 ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score2 ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score3 from sys.all_objects o1 cross join sys.all_objects o2;

--Works, but it's ugly WITH Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u) ,Counts as ( SELECT ID ,sum(case when Score >=1 then 1 else 0 end) as L1 ,sum(case when Score >=2 then 1 else 0 end) as L2 ,sum(case when Score >=3 then 1 else 0 end) as L3 ,sum(case when Score >=4 then 1 else 0 end) as L4 ,sum(case when Score >=5 then 1 else 0 end) as L5 FROM Unpvt GROUP BY ID ) SELECT t.ID, t.Score1, t.Score2, t.Score3, CASE WHEN c.L5 >= 2 THEN 5 WHEN c.L4 >= 2 THEN 4 WHEN c.L3 >= 2 THEN 3 WHEN c.L2 >= 2 THEN 2 WHEN c.L1 >= 2 THEN 1 else 0 END as FinalScore FROM #tmp t INNER JOIN Counts c on t.id = c.id;

ravdep SSC Journeyman Points: 99 More actions · Answer 3

Not sure if I will call the solution below more compact but it is definitely faster. Basically, it is taking the average of 3 scores and determining the final score as the closest score to the average value.

;WITH A AS (SELECT *,(Score1+Score2+Score3)/3.0 AS Average FROM #Tmp),
B AS (SELECT *,
    ABS(Score1-Average) AS Score1b,
    ABS(Score2-Average) AS Score2b,
    ABS(Score3-Average) AS Score3b
FROM A)
SELECT ID,Score1,Score2,Score3, CASE WHEN Score1b <= Score2b And Score1b <= Score3b THEN Score1
WHEN Score2b <= Score1b And Score2b <= Score3b THEN Score2
ELSE Score3
END As FinalScore
FROM B
ORDER BY [ID]

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 4

If the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:

SELECT ID, desired_score=SUM(x)-(MIN(x)+MAX(x)) FROM #tmp t CROSS APPLY (VALUES(Score1),(Score2),(Score3))x(x) GROUP BY ID;

ScottPletcher SSC Guru Points: 101134 More actions · Answer 5

The second-highest score should always give you the desired value:

SELECT t.ID, t.Score1, t.Score2, t.Score3, ca1.FinalScore FROM #tmp t CROSS APPLY ( SELECT TOP (1) Score AS FinalScore FROM ( SELECT TOP (2) Score FROM ( VALUES(Score1),(Score2),(Score3) ) AS derived(score) ORDER BY Score DESC ) AS derived2 ORDER BY Score ) AS ca1

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ArcticEd32 Say Hey Kid Points: 694 More actions · Answer 6

Jacob Wilkins - Tuesday, May 23, 2017 11:53 AM
If the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:
SELECT ID, desired_score=SUM(x)-(MIN(x)+MAX(x)) FROM #tmp t CROSS APPLY (VALUES(Score1),(Score2),(Score3))x(x) GROUP BY ID;

That's the math I couldn't think of this morning (sum - Max + Min), thank you. Ravdep's version is about 30% faster than mine, and yours is another 6% faster than that using 10 runs on 1,000,000 rows. Nulls aren't allowed so this works.
Thanks again to both of you!