Issue with assigning candidates to a review list

  • OK, this gets rid of the Cursor and still assigns candidates to board members pretty evenly (+/- 3 out of 720):

    /*

    RULES:

    1. A selection board will convene, consisting of 3 to N # of board members.

    2. This selection board will review eligible candidates for scholarships.

    3. Each candidate will be assigned to be reviewed by 3, and only 3, of the board members.

    4. Each board member's "list" of candidates to review should be evenly distributed

    (doesn't have to be exact, but should be close to equal).

    and the kicker...

    5. The candidate should not be assigned to a board members list if that board member

    is one of the top 5 schools that the candidate wants a scholarship to. However, enforcement

    of rule #3 takes precedence over this rule... meaning that if there are 4 board members,

    and 2 of them are from the top 5 school choices of a candidate,

    then the candidate still needs to be assigned to three board member's list.

    */

    declare @BrdDt datetime; set @BrdDt = '07/29/2009';

    --====== get the board members. Minimum of 3, no maximum (reality says < 10) ======

    if OBJECT_ID('tempdb..#BrdMbrs') IS NOT NULL DROP TABLE #BrdMbrs

    CREATE TABLE #BrdMbrs (VoterID int PRIMARY KEY CLUSTERED, SSN_ID char(9), Board_DT datetime, SchoolCd char(6))

    INSERT INTO #BrdMbrs (VoterID) select top 5 VoterID = ROW_NUMBER() OVER (ORDER BY object_id) from sys.objects

    --====== get the candidates that are ready for the board selection process ======

    if OBJECT_ID('tempdb..#Candidates') IS NOT NULL DROP TABLE #Candidates

    CREATE TABLE #Candidates (CandidateID int PRIMARY KEY CLUSTERED)

    INSERT INTO #Candidates select top 1200 CandidateID = ROW_NUMBER() OVER (ORDER BY s1.object_id) from sys.objects s1, sys.objects s2

    --====== get the school choices that the candidates want to go to ======

    if OBJECT_ID('tempdb..#CandidateSchoolChoices') IS NOT NULL DROP TABLE #CandidateSchoolChoices

    CREATE TABLE #CandidateSchoolChoices (CandidateID int, ChoiceNbr tinyint, SchoolChoice char(6), PRIMARY KEY (CandidateID, ChoiceNbr))

    insert into #CandidateSchoolChoices

    select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)

    from #Candidates

    CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt

    declare @VoterCount tinyint, @Vote tinyint

    select @VoterCount = COUNT(*) from #BrdMbrs

    --====== get the three most popular school choices, and "assign" a board member to that school ======

    -- in reality, the board member is from a school, and this code won't be performed.

    ;with CTE AS

    (

    select top (@VoterCount) SchoolChoice, COUNT(*) qty

    from #CandidateSchoolChoices

    group by SchoolChoice

    order by qty desc

    )

    , CTE2 AS

    (

    select SchoolChoice, VoterID = ROW_NUMBER() OVER (ORDER BY (select 1))

    from CTE

    )

    update b

    set b.SchoolCd = CTE2.SchoolChoice

    from #BrdMbrs b INNER JOIN CTE2 ON b.VoterID = CTE2.VoterID

    ----====== each candidate will be assigned to 3, and only three lists. ======

    -- temp table to see how many candidates are assigned to each voter

    if OBJECT_ID('tempdb..#VoterList') IS NOT NULL DROP TABLE #VoterList

    CREATE TABLE #VoterList (

    CandidateID INT NOT NULL,

    BrdMbrNbr INT NOT NULL,

    SSN_ID char(9),

    Board_DT datetime,

    BrdMbrVote tinyint,

    PRIMARY KEY CLUSTERED (CandidateID, BrdMbrNbr))

    -- (No more cursors!) --

    --====== set based method to (almost) evenly assign candidates to voters.

    ;WITH CTE AS

    (

    SELECT c.CandidateID,

    b.VoterID,

    Same_School_Flag = (SELECT COUNT(*)

    FROM #CandidateSchoolChoices cs

    where cs.CandidateID = c.CandidateID

    and cs.SchoolChoice = b.SchoolCd),

    --== This does not work because there's nothing in it yet ==

    --VoterQty = (SELECT COUNT(*)

    -- FROM #VoterList

    -- where BrdMbrNbr = b.VoterID)

    -- =================================================

    VoterQty = (c.CandidateID + b.VoterID) % @VoterCount

    --+

    from #Candidates c

    CROSS APPLY #BrdMbrs b

    )

    , CTE2 AS

    (

    SELECT CandidateID,

    VoterID,

    Same_School_Flag,

    VoterQty,

    RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY Same_School_Flag, VoterQty)

    FROM CTE

    )

    INSERT INTO #VoterList (CandidateID, BrdMbrNbr, Board_DT)

    SELECT CTE2.CandidateID, CTE2.VoterID, @BrdDt

    FROM CTE2

    WHERE CTE2.RowNbr between 1 and 3

    select BrdMbrNbr, COUNT(*) from #VoterList group by BrdMbrNbr order by BrdMbrNbr

    select * from #VoterList

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry,

    Nice... real nice. I like the approximate number calc, which was what I was missing.

    I did run into one issue in implementing it... and it's my fault. The data I provided for the candidates gave you a sequentially incrementing number for the CandidateID, which was not valid. This resulted in getting lists with a range of > 60 when running with the real data, which wasn't acceptable.

    I corrected this by adding an Identity column to the #Candidates table, and using that column in the VoterQty calc instead of the CandidateID column.

    For performance ...

    with 652 candidates, and 6 board members:

    cursor: 18 secs

    set - < 1 sec

    Barry, thank you VERY MUCH for looking into this.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Good job on the workaround, Wayne.

    Heh... Barry... nice job. You deserve the title Mr. "There must be 15 ways to..."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Wayne, Jeff.

    I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.

    If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • EDIT: RBarry beat me to the post while I was playing with the code and writing this up. Hope it's still useful.

    -----------------------

    I'm not sure I fully understand how the approximate number calc is intended to work. It seems that it depends on an even distribution of school choices. If there's one or two highly desirable schools represented by a board member, it looks as though the distribution gets skewed, with the board members from the less highly sought-after schools getting more than their share of review packets and the Harvard and Stanford reps getting fewer. I tried seeding the candidate choice table with such a case by having every tenth candidate's first choice be one school.-- Existing test data build....

    insert into #CandidateSchoolChoices

    select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)

    from #Candidates

    CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt

    -- Then override every 10th candidate's first choice to be a choice for 123456, our "Harvard"

    update #CandidateSchoolChoices

    set Schoolchoice = 123456

    where ChoiceNbr = 1

    and CandidateID % 10 = 0

    ----------

    My final output then had these assignment counts (I added a join to #BrdMbrs to get the school number in that query):

    select v.BrdMbrNbr,b.SchoolCd, COUNT(*)as Qty

    from #VoterList v

    join #BrdMbrs b on b.VoterID = v.BrdMbrNbr

    group by BrdMbrNbr,b.SchoolCd

    order by BrdMbrNbr

    . . . .

    BrdMbrNbr SchoolCd Qty

    ----------- -------- -----------

    1 123456 601

    2 623968 721

    3 125090 839

    4 486669 720

    5 361024 719 Have I missed something here?

  • Yes, that's similar to what I saw.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (8/17/2009)


    Thanks Wayne, Jeff.

    I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.

    If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.

    Well, for the data that we have right now, it's giving us a spread of 10 across 8 board members. Which is well within the acceptable range. What is unknown is how this will change... it's kinda hard to predict what schools scholarship applicants will want to go to... so I can see this popping up again in the future as being a problem that needs to be fixed.

    That being said, this is also something that would be nice to wrap up properly the first time around. If something needs changing, around here it could take 2 months for it to move through all of the testing environments to get to production.

    If you're willing to help, I'm attaching a script that has about 650 real CandidateID numbers. I also like what John did to give a heavy preference towards one school.

    One thing I will say about the cursor based way... the counts per board member were very close together... total spread on the same data is only 3.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS: I've been trying to improve on this but it's going to take a while. It's a tough problem to improve on where it is, plus I've been pretty doped up on pain meds the last 4 days (back problems) which is making everything fuzzy....

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry, I appreciate you looking into this. Take care of yourself first, and come back to this when you're up to it.

    That "tough problem" part intrigues (and scares) me. If there's any area that you can point me in, I'll see what I can do with it.

    Thanks!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply