May 23, 2017 at 8:44 am
Consider a table consisting of an ID number and 3 columns of scores (Score1, Score2, Score3). The final "score" for each record is the highest value supported by 2 of the 3 scores. Scores range from 1 through 5.
Examples, showing the three scores and final score:
1, 3, 5 -> 3
3, 4, 5 -> 4
4, 4, 4 -> 4
1, 1, 2 -> 1
Trying to find a way to calculate the final score for each record in a single pass through the table. Not opposed to a function, but there could be 5 records in the table or there could be 100,000 and it needs to be efficient. The actual data has two sets of score "clusters" that result in two independent final scores; don't know if that matters or changes the pathway.
My searching came up with how to use the VALUES constructor to get the max/min of the three columns, but can't wrap my head around the best way to get highest 2 of 3. I started down a path of using UNPIVOT to count the number of scores that were greater than each possible value but have gotten lost.
Any assistance would be greatly appreciated!
What I've worked on so far:
--Create table & insert some records
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int);
INSERT INTO #tmp
VALUES
((1),(1),(3),(5)),
((2),(3),(4),(5)),
((3),(4),(4),(4)),
((4),(1),(1),(2));
--This gets the highest value, but not the highest value
--supported by 2 of 3 scores
SELECT
ID, Score1, Score2, Score3,
(
SELECT
Max(s)
FROM
(
VALUES (Score1), (Score2), (Score3)
) AS Scores(s)
) as MaxScore
FROM
#tmp;
--Tried to count number of scores >= each possible score but don't know if this is a workable path
WITH Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u)
SELECT
ID
,sum(case when Score >=1 then 1 else 0 end) as L1
,sum(case when Score >=2 then 1 else 0 end) as L2
,sum(case when Score >=3 then 1 else 0 end) as L3
,sum(case when Score >=4 then 1 else 0 end) as L4
,sum(case when Score >=5 then 1 else 0 end) as L5
from
Unpvt
group by
ID;
May 23, 2017 at 9:50 am
It seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.
WITH A AS (SELECT ID, ScoreNumber,Score
FROM
(SELECT ID, Score1, Score2, Score3
FROM #tmp) p
UNPIVOT
(Score FOR ScoreNumber IN
(Score1, Score2, Score3)
) AS unpvt)
,
B AS (
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
ID,
Score,
ScoreNumber
FROM A)
SELECT ID,Score AS FinalScore,ScoreNumber FROM B
WHERE RowNum=2
May 23, 2017 at 10:31 am
ravdep - Tuesday, May 23, 2017 9:50 AMIt seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.WITH A AS (SELECT ID, ScoreNumber,Score
FROM
(SELECT ID, Score1, Score2, Score3
FROM #tmp) p
UNPIVOT
(Score FOR ScoreNumber IN
(Score1, Score2, Score3)
) AS unpvt)
,
B AS (
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
ID,
Score,
ScoreNumber
FROM A)
SELECT ID,Score AS FinalScore,ScoreNumber FROM B
WHERE RowNum=2
Didn't think about this as a Median value, but I guess that's what it is. The final score could be defined as "a number between 1 and 5 that is met or exceeded by at least 2 of the 3 scores."
The order of the scores isn't meaningful and the column that contains the middle score isn't important either; just the final calculated score.
I got the code below to work and it uses a larger sample, but what you provided is more compact & slightly faster.
Curious to know if there are any more compact ways to do it.
Thank you for the answer
--Create table
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int, FinalScore int);
--Populate with random data
INSERT INTO #tmp (ID, Score1, Score2, Score3)
SELECT TOP 100000
ROW_NUMBER() OVER(ORDER BY o1.[Object_id]) AS ID
,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score1
,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score2
,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score3
from
sys.all_objects o1
cross join sys.all_objects o2;
--Works, but it's ugly
WITH
Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u)
,Counts as
(
SELECT
ID
,sum(case when Score >=1 then 1 else 0 end) as L1
,sum(case when Score >=2 then 1 else 0 end) as L2
,sum(case when Score >=3 then 1 else 0 end) as L3
,sum(case when Score >=4 then 1 else 0 end) as L4
,sum(case when Score >=5 then 1 else 0 end) as L5
FROM
Unpvt
GROUP BY
ID
)
SELECT
t.ID, t.Score1, t.Score2, t.Score3,
CASE
WHEN c.L5 >= 2 THEN 5
WHEN c.L4 >= 2 THEN 4
WHEN c.L3 >= 2 THEN 3
WHEN c.L2 >= 2 THEN 2
WHEN c.L1 >= 2 THEN 1
else 0
END as FinalScore
FROM
#tmp t
INNER JOIN Counts c on
t.id = c.id;
May 23, 2017 at 11:39 am
Not sure if I will call the solution below more compact but it is definitely faster. Basically, it is taking the average of 3 scores and determining the final score as the closest score to the average value.
;WITH A AS (SELECT *,(Score1+Score2+Score3)/3.0 AS Average FROM #Tmp),
B AS (SELECT *,
ABS(Score1-Average) AS Score1b,
ABS(Score2-Average) AS Score2b,
ABS(Score3-Average) AS Score3b
FROM A)
SELECT ID,Score1,Score2,Score3, CASE WHEN Score1b <= Score2b And Score1b <= Score3b THEN Score1
WHEN Score2b <= Score1b And Score2b <= Score3b THEN Score2
ELSE Score3
END As FinalScore
FROM B
ORDER BY [ID]
May 23, 2017 at 11:53 am
If the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:
SELECT ID, desired_score=SUM(x)-(MIN(x)+MAX(x))
FROM #tmp t
CROSS APPLY
(VALUES(Score1),(Score2),(Score3))x(x)
GROUP BY ID;
May 23, 2017 at 12:12 pm
The second-highest score should always give you the desired value:
SELECT t.ID, t.Score1, t.Score2, t.Score3, ca1.FinalScore
FROM #tmp t
CROSS APPLY (
SELECT TOP (1) Score AS FinalScore
FROM (
SELECT TOP (2) Score
FROM ( VALUES(Score1),(Score2),(Score3) ) AS derived(score)
ORDER BY Score DESC
) AS derived2
ORDER BY Score
) AS ca1
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 23, 2017 at 12:13 pm
Jacob Wilkins - Tuesday, May 23, 2017 11:53 AMIf the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:
SELECT ID, desired_score=SUM(x)-(MIN(x)+MAX(x))
FROM #tmp t
CROSS APPLY
(VALUES(Score1),(Score2),(Score3))x(x)
GROUP BY ID;
That's the math I couldn't think of this morning (sum - Max + Min), thank you. Ravdep's version is about 30% faster than mine, and yours is another 6% faster than that using 10 runs on 1,000,000 rows. Nulls aren't allowed so this works.
Thanks again to both of you!
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply