Issue with assigning candidates to a review list

Question

Post reply

Issue with assigning candidates to a review list

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

RBarryYoung SSC Guru Points: 143329 More actions · Answer 1

OK, this gets rid of the Cursor and still assigns candidates to board members pretty evenly (+/- 3 out of 720):

/*

RULES:

1. A selection board will convene, consisting of 3 to N # of board members.

2. This selection board will review eligible candidates for scholarships.

3. Each candidate will be assigned to be reviewed by 3, and only 3, of the board members.

4. Each board member's "list" of candidates to review should be evenly distributed

(doesn't have to be exact, but should be close to equal).

and the kicker...

5. The candidate should not be assigned to a board members list if that board member

is one of the top 5 schools that the candidate wants a scholarship to. However, enforcement

of rule #3 takes precedence over this rule... meaning that if there are 4 board members,

and 2 of them are from the top 5 school choices of a candidate,

then the candidate still needs to be assigned to three board member's list.

*/

declare @BrdDt datetime; set @BrdDt = '07/29/2009';

--====== get the board members. Minimum of 3, no maximum (reality says < 10) ======

if OBJECT_ID('tempdb..#BrdMbrs') IS NOT NULL DROP TABLE #BrdMbrs

CREATE TABLE #BrdMbrs (VoterID int PRIMARY KEY CLUSTERED, SSN_ID char(9), Board_DT datetime, SchoolCd char(6))

INSERT INTO #BrdMbrs (VoterID) select top 5 VoterID = ROW_NUMBER() OVER (ORDER BY object_id) from sys.objects

--====== get the candidates that are ready for the board selection process ======

if OBJECT_ID('tempdb..#Candidates') IS NOT NULL DROP TABLE #Candidates

CREATE TABLE #Candidates (CandidateID int PRIMARY KEY CLUSTERED)

INSERT INTO #Candidates select top 1200 CandidateID = ROW_NUMBER() OVER (ORDER BY s1.object_id) from sys.objects s1, sys.objects s2

--====== get the school choices that the candidates want to go to ======

if OBJECT_ID('tempdb..#CandidateSchoolChoices') IS NOT NULL DROP TABLE #CandidateSchoolChoices

CREATE TABLE #CandidateSchoolChoices (CandidateID int, ChoiceNbr tinyint, SchoolChoice char(6), PRIMARY KEY (CandidateID, ChoiceNbr))

insert into #CandidateSchoolChoices

select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)

from #Candidates

CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt

declare @VoterCount tinyint, @Vote tinyint

select @VoterCount = COUNT(*) from #BrdMbrs

--====== get the three most popular school choices, and "assign" a board member to that school ======

-- in reality, the board member is from a school, and this code won't be performed.

;with CTE AS

(

select top (@VoterCount) SchoolChoice, COUNT(*) qty

from #CandidateSchoolChoices

group by SchoolChoice

order by qty desc

)

, CTE2 AS

(

select SchoolChoice, VoterID = ROW_NUMBER() OVER (ORDER BY (select 1))

from CTE

)

update b

set b.SchoolCd = CTE2.SchoolChoice

from #BrdMbrs b INNER JOIN CTE2 ON b.VoterID = CTE2.VoterID

----====== each candidate will be assigned to 3, and only three lists. ======

-- temp table to see how many candidates are assigned to each voter

if OBJECT_ID('tempdb..#VoterList') IS NOT NULL DROP TABLE #VoterList

CREATE TABLE #VoterList (

CandidateID INT NOT NULL,

BrdMbrNbr INT NOT NULL,

SSN_ID char(9),

Board_DT datetime,

BrdMbrVote tinyint,

PRIMARY KEY CLUSTERED (CandidateID, BrdMbrNbr))

-- (No more cursors!) --

--====== set based method to (almost) evenly assign candidates to voters.

;WITH CTE AS

(

SELECT c.CandidateID,

b.VoterID,

Same_School_Flag = (SELECT COUNT(*)

FROM #CandidateSchoolChoices cs

where cs.CandidateID = c.CandidateID

and cs.SchoolChoice = b.SchoolCd),

--== This does not work because there's nothing in it yet ==

--VoterQty = (SELECT COUNT(*)

-- FROM #VoterList

-- where BrdMbrNbr = b.VoterID)

-- =================================================

VoterQty = (c.CandidateID + b.VoterID) % @VoterCount

--+

from #Candidates c

CROSS APPLY #BrdMbrs b

)

, CTE2 AS

(

SELECT CandidateID,

VoterID,

Same_School_Flag,

VoterQty,

RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY Same_School_Flag, VoterQty)

FROM CTE

)

INSERT INTO #VoterList (CandidateID, BrdMbrNbr, Board_DT)

SELECT CTE2.CandidateID, CTE2.VoterID, @BrdDt

FROM CTE2

WHERE CTE2.RowNbr between 1 and 3

select BrdMbrNbr, COUNT(*) from #VoterList group by BrdMbrNbr order by BrdMbrNbr

select * from #VoterList

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

WayneS SSC Guru Points: 95461 More actions · Answer 2

Barry,

Nice... real nice. I like the approximate number calc, which was what I was missing.

I did run into one issue in implementing it... and it's my fault. The data I provided for the candidates gave you a sequentially incrementing number for the CandidateID, which was not valid. This resulted in getting lists with a range of > 60 when running with the real data, which wasn't acceptable.

I corrected this by adding an Identity column to the #Candidates table, and using that column in the VoterQty calc instead of the CandidateID column.

For performance ...

with 652 candidates, and 6 board members:

cursor: 18 secs

set - < 1 sec

Barry, thank you VERY MUCH for looking into this.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 3

Good job on the workaround, Wayne.

Heh... Barry... nice job. You deserve the title Mr. "There must be 15 ways to..."

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 4

Thanks Wayne, Jeff.

I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.

If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

john.arnott SSChampion Points: 11882 More actions · Answer 5

EDIT: RBarry beat me to the post while I was playing with the code and writing this up. Hope it's still useful.

-----------------------

I'm not sure I fully understand how the approximate number calc is intended to work. It seems that it depends on an even distribution of school choices. If there's one or two highly desirable schools represented by a board member, it looks as though the distribution gets skewed, with the board members from the less highly sought-after schools getting more than their share of review packets and the Harvard and Stanford reps getting fewer. I tried seeding the candidate choice table with such a case by having every tenth candidate's first choice be one school.-- Existing test data build....

insert into #CandidateSchoolChoices

select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)

from #Candidates

CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt

-- Then override every 10th candidate's first choice to be a choice for 123456, our "Harvard"

update #CandidateSchoolChoices

set Schoolchoice = 123456

where ChoiceNbr = 1

and CandidateID % 10 = 0

----------

My final output then had these assignment counts (I added a join to #BrdMbrs to get the school number in that query):

select v.BrdMbrNbr,b.SchoolCd, COUNT(*)as Qty

from #VoterList v

join #BrdMbrs b on b.VoterID = v.BrdMbrNbr

group by BrdMbrNbr,b.SchoolCd

order by BrdMbrNbr

. . . .

BrdMbrNbr SchoolCd Qty

----------- -------- -----------

1 123456 601

2 623968 721

3 125090 839

4 486669 720

5 361024 719 Have I missed something here?

RBarryYoung SSC Guru Points: 143329 More actions · Answer 6

Yes, that's similar to what I saw.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

WayneS SSC Guru Points: 95461 More actions · Answer 7

RBarryYoung (8/17/2009)
Thanks Wayne, Jeff.
I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.
If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.

Well, for the data that we have right now, it's giving us a spread of 10 across 8 board members. Which is well within the acceptable range. What is unknown is how this will change... it's kinda hard to predict what schools scholarship applicants will want to go to... so I can see this popping up again in the future as being a problem that needs to be fixed.

That being said, this is also something that would be nice to wrap up properly the first time around. If something needs changing, around here it could take 2 months for it to move through all of the testing environments to get to production.

If you're willing to help, I'm attaching a script that has about 650 real CandidateID numbers. I also like what John did to give a heavy preference towards one school.

One thing I will say about the cursor based way... the counts per board member were very close together... total spread on the same data is only 3.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

WayneS: I've been trying to improve on this but it's going to take a while. It's a tough problem to improve on where it is, plus I've been pretty doped up on pain meds the last 4 days (back problems) which is making everything fuzzy....

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

WayneS SSC Guru Points: 95461 More actions · Answer 9

Barry, I appreciate you looking into this. Take care of yourself first, and come back to this when you're up to it.

That "tough problem" part intrigues (and scares) me. If there's any area that you can point me in, I'll see what I can do with it.

Thanks!

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2