Highest value supported by 2 of 3 columns

  • Consider a table consisting of an ID number and 3 columns of scores (Score1, Score2, Score3).  The final "score" for each record is the highest value supported by 2 of the 3 scores.  Scores range from 1 through 5.  

    Examples, showing the three scores and final score:
    1, 3, 5 -> 3
    3, 4, 5 -> 4
    4, 4, 4 -> 4
    1, 1, 2 -> 1

    Trying to find a way to calculate the final score for each record in a single pass through the table.  Not opposed to a function, but there could be 5 records in the table or there could be 100,000 and it needs to be efficient.  The actual data has two sets of score "clusters" that result in two independent final scores; don't know if that matters or changes the pathway.

    My searching came up with how to use the VALUES constructor to get the max/min of the three columns, but can't wrap my head around the best way to get highest 2 of 3.  I started down a path of using UNPIVOT to count the number of scores that were greater than each possible value but have gotten lost.  

    Any assistance would be greatly appreciated!

    What I've worked on so far:

    --Create table & insert some records
    IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
    CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int);
    INSERT INTO #tmp
    VALUES
    ((1),(1),(3),(5)),
    ((2),(3),(4),(5)),
    ((3),(4),(4),(4)),
    ((4),(1),(1),(2));

    --This gets the highest value, but not the highest value
    --supported by 2 of 3 scores
    SELECT
        ID, Score1, Score2, Score3,
        (
        SELECT
            Max(s)
        FROM
            (
            VALUES (Score1), (Score2), (Score3)
            ) AS Scores(s)
        ) as MaxScore
    FROM
        #tmp;

    --Tried to count number of scores >= each possible score but don't know if this is a workable path
    WITH Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u)
    SELECT
        ID
        ,sum(case when Score >=1 then 1 else 0 end) as L1
        ,sum(case when Score >=2 then 1 else 0 end) as L2
        ,sum(case when Score >=3 then 1 else 0 end) as L3
        ,sum(case when Score >=4 then 1 else 0 end) as L4
        ,sum(case when Score >=5 then 1 else 0 end) as L5
    from
        Unpvt
    group by
        ID;

  • It seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.

    WITH A AS (SELECT ID, ScoreNumber,Score
    FROM
     (SELECT ID, Score1, Score2, Score3
     FROM #tmp) p
    UNPIVOT
     (Score FOR ScoreNumber IN
      (Score1, Score2, Score3)
    ) AS unpvt)
    ,
    B AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
         ID,
         Score,
         ScoreNumber
    FROM A)
    SELECT ID,Score AS FinalScore,ScoreNumber FROM B
    WHERE RowNum=2

  • ravdep - Tuesday, May 23, 2017 9:50 AM

    It seems like you are trying to find the median of the 3 scores. If yes, then a query like below should work. If that is not what you need then please give better examples. Your current example shows score2 as the final score in all scenarios.

    WITH A AS (SELECT ID, ScoreNumber,Score
    FROM
     (SELECT ID, Score1, Score2, Score3
     FROM #tmp) p
    UNPIVOT
     (Score FOR ScoreNumber IN
      (Score1, Score2, Score3)
    ) AS unpvt)
    ,
    B AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Score) AS RowNum,
         ID,
         Score,
         ScoreNumber
    FROM A)
    SELECT ID,Score AS FinalScore,ScoreNumber FROM B
    WHERE RowNum=2

    Didn't think about this as a Median value, but I guess that's what it is.  The final score could be defined as "a number between 1 and 5 that is met or exceeded by at least 2 of the 3 scores."

    The order of the scores isn't meaningful and the column that contains the middle score isn't important either; just the final calculated score.

    I got the code below to work and it uses a larger sample, but what you provided is more compact & slightly faster.  

    Curious to know if there are any more compact ways to do it.

    Thank you for the answer


    --Create table
    IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
    CREATE TABLE #tmp (ID int, Score1 int, Score2 int, Score3 int, FinalScore int);

    --Populate with random data
    INSERT INTO #tmp (ID, Score1, Score2, Score3)
    SELECT TOP 100000
        ROW_NUMBER() OVER(ORDER BY o1.[Object_id]) AS ID
        ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score1
        ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score2
        ,CAST(RAND(CHECKSUM(NEWID())) * 5 as INT) + 1 as Score3
    from
        sys.all_objects o1
        cross join sys.all_objects o2;

    --Works, but it's ugly
    WITH
        Unpvt as (SELECT ID, Score FROM #tmp UNPIVOT ( Score for Col in(Score1, Score2, Score3)) as u)
        ,Counts as
            (
            SELECT
                ID
                ,sum(case when Score >=1 then 1 else 0 end) as L1
                ,sum(case when Score >=2 then 1 else 0 end) as L2
                ,sum(case when Score >=3 then 1 else 0 end) as L3
                ,sum(case when Score >=4 then 1 else 0 end) as L4
                ,sum(case when Score >=5 then 1 else 0 end) as L5
            FROM
                Unpvt
            GROUP BY
                ID
            )
    SELECT
        t.ID, t.Score1, t.Score2, t.Score3,
        CASE
            WHEN c.L5 >= 2 THEN 5
            WHEN c.L4 >= 2 THEN 4
            WHEN c.L3 >= 2 THEN 3
            WHEN c.L2 >= 2 THEN 2
            WHEN c.L1 >= 2 THEN 1
            else 0
        END as FinalScore
    FROM
        #tmp t
        INNER JOIN Counts c on
            t.id = c.id;

  • Not sure if I will call the solution below more compact but it is definitely faster. Basically, it is taking the average of 3 scores and determining the final score as the closest score to the average value. 

    ;WITH A AS (SELECT *,(Score1+Score2+Score3)/3.0 AS Average FROM #Tmp),
    B AS (SELECT *,
        ABS(Score1-Average) AS Score1b,
        ABS(Score2-Average) AS Score2b,
        ABS(Score3-Average) AS Score3b
      FROM A)
    SELECT ID,Score1,Score2,Score3, CASE WHEN Score1b <= Score2b And Score1b <= Score3b THEN Score1
      WHEN Score2b <= Score1b And Score2b <= Score3b THEN Score2
      ELSE Score3
      END As FinalScore
    FROM B
    ORDER BY [ID]

  • If the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:

    SELECT   ID, desired_score=SUM(x)-(MIN(x)+MAX(x))
    FROM     #tmp t
             CROSS APPLY
             (VALUES(Score1),(Score2),(Score3))x(x)
    GROUP BY ID;

  • The second-highest score should always give you the desired value:


    SELECT t.ID, t.Score1, t.Score2, t.Score3, ca1.FinalScore
    FROM #tmp t
    CROSS APPLY (
        SELECT TOP (1) Score AS FinalScore
        FROM (
            SELECT TOP (2) Score
            FROM ( VALUES(Score1),(Score2),(Score3) ) AS derived(score)
            ORDER BY Score DESC
        ) AS derived2
        ORDER BY Score
    ) AS ca1

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jacob Wilkins - Tuesday, May 23, 2017 11:53 AM

    If the only things we care about are getting the ID and the desired score and being concise, and don't have any NULLs (fairly artificial, but hey :)), here's one fairly concise attempt:

    SELECT   ID, desired_score=SUM(x)-(MIN(x)+MAX(x))
    FROM     #tmp t
             CROSS APPLY
             (VALUES(Score1),(Score2),(Score3))x(x)
    GROUP BY ID;

    That's the math I couldn't think of this morning (sum - Max + Min), thank you.  Ravdep's version is about 30% faster than mine, and yours is another 6% faster than that using 10 runs on 1,000,000 rows.  Nulls aren't allowed so this works.
    Thanks again to both of you!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply