August 16, 2009 at 2:34 pm
OK, this gets rid of the Cursor and still assigns candidates to board members pretty evenly (+/- 3 out of 720):
/*
RULES:
1. A selection board will convene, consisting of 3 to N # of board members.
2. This selection board will review eligible candidates for scholarships.
3. Each candidate will be assigned to be reviewed by 3, and only 3, of the board members.
4. Each board member's "list" of candidates to review should be evenly distributed
(doesn't have to be exact, but should be close to equal).
and the kicker...
5. The candidate should not be assigned to a board members list if that board member
is one of the top 5 schools that the candidate wants a scholarship to. However, enforcement
of rule #3 takes precedence over this rule... meaning that if there are 4 board members,
and 2 of them are from the top 5 school choices of a candidate,
then the candidate still needs to be assigned to three board member's list.
*/
declare @BrdDt datetime; set @BrdDt = '07/29/2009';
--====== get the board members. Minimum of 3, no maximum (reality says < 10) ======
if OBJECT_ID('tempdb..#BrdMbrs') IS NOT NULL DROP TABLE #BrdMbrs
CREATE TABLE #BrdMbrs (VoterID int PRIMARY KEY CLUSTERED, SSN_ID char(9), Board_DT datetime, SchoolCd char(6))
INSERT INTO #BrdMbrs (VoterID) select top 5 VoterID = ROW_NUMBER() OVER (ORDER BY object_id) from sys.objects
--====== get the candidates that are ready for the board selection process ======
if OBJECT_ID('tempdb..#Candidates') IS NOT NULL DROP TABLE #Candidates
CREATE TABLE #Candidates (CandidateID int PRIMARY KEY CLUSTERED)
INSERT INTO #Candidates select top 1200 CandidateID = ROW_NUMBER() OVER (ORDER BY s1.object_id) from sys.objects s1, sys.objects s2
--====== get the school choices that the candidates want to go to ======
if OBJECT_ID('tempdb..#CandidateSchoolChoices') IS NOT NULL DROP TABLE #CandidateSchoolChoices
CREATE TABLE #CandidateSchoolChoices (CandidateID int, ChoiceNbr tinyint, SchoolChoice char(6), PRIMARY KEY (CandidateID, ChoiceNbr))
insert into #CandidateSchoolChoices
select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)
from #Candidates
CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt
declare @VoterCount tinyint, @Vote tinyint
select @VoterCount = COUNT(*) from #BrdMbrs
--====== get the three most popular school choices, and "assign" a board member to that school ======
-- in reality, the board member is from a school, and this code won't be performed.
;with CTE AS
(
select top (@VoterCount) SchoolChoice, COUNT(*) qty
from #CandidateSchoolChoices
group by SchoolChoice
order by qty desc
)
, CTE2 AS
(
select SchoolChoice, VoterID = ROW_NUMBER() OVER (ORDER BY (select 1))
from CTE
)
update b
set b.SchoolCd = CTE2.SchoolChoice
from #BrdMbrs b INNER JOIN CTE2 ON b.VoterID = CTE2.VoterID
----====== each candidate will be assigned to 3, and only three lists. ======
-- temp table to see how many candidates are assigned to each voter
if OBJECT_ID('tempdb..#VoterList') IS NOT NULL DROP TABLE #VoterList
CREATE TABLE #VoterList (
CandidateID INT NOT NULL,
BrdMbrNbr INT NOT NULL,
SSN_ID char(9),
Board_DT datetime,
BrdMbrVote tinyint,
PRIMARY KEY CLUSTERED (CandidateID, BrdMbrNbr))
-- (No more cursors!) --
--====== set based method to (almost) evenly assign candidates to voters.
;WITH CTE AS
(
SELECT c.CandidateID,
b.VoterID,
Same_School_Flag = (SELECT COUNT(*)
FROM #CandidateSchoolChoices cs
where cs.CandidateID = c.CandidateID
and cs.SchoolChoice = b.SchoolCd),
--== This does not work because there's nothing in it yet ==
--VoterQty = (SELECT COUNT(*)
-- FROM #VoterList
-- where BrdMbrNbr = b.VoterID)
-- =================================================
VoterQty = (c.CandidateID + b.VoterID) % @VoterCount
--+
from #Candidates c
CROSS APPLY #BrdMbrs b
)
, CTE2 AS
(
SELECT CandidateID,
VoterID,
Same_School_Flag,
VoterQty,
RowNbr = ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY Same_School_Flag, VoterQty)
FROM CTE
)
INSERT INTO #VoterList (CandidateID, BrdMbrNbr, Board_DT)
SELECT CTE2.CandidateID, CTE2.VoterID, @BrdDt
FROM CTE2
WHERE CTE2.RowNbr between 1 and 3
select BrdMbrNbr, COUNT(*) from #VoterList group by BrdMbrNbr order by BrdMbrNbr
select * from #VoterList
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 17, 2009 at 9:32 am
Barry,
Nice... real nice. I like the approximate number calc, which was what I was missing.
I did run into one issue in implementing it... and it's my fault. The data I provided for the candidates gave you a sequentially incrementing number for the CandidateID, which was not valid. This resulted in getting lists with a range of > 60 when running with the real data, which wasn't acceptable.
I corrected this by adding an Identity column to the #Candidates table, and using that column in the VoterQty calc instead of the CandidateID column.
For performance ...
with 652 candidates, and 6 board members:
cursor: 18 secs
set - < 1 sec
Barry, thank you VERY MUCH for looking into this.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
August 17, 2009 at 9:58 am
Good job on the workaround, Wayne.
Heh... Barry... nice job. You deserve the title Mr. "There must be 15 ways to..."
--Jeff Moden
Change is inevitable... Change for the better is not.
August 17, 2009 at 10:13 am
Thanks Wayne, Jeff.
I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.
If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 17, 2009 at 10:49 am
EDIT: RBarry beat me to the post while I was playing with the code and writing this up. Hope it's still useful.
-----------------------
I'm not sure I fully understand how the approximate number calc is intended to work. It seems that it depends on an even distribution of school choices. If there's one or two highly desirable schools represented by a board member, it looks as though the distribution gets skewed, with the board members from the less highly sought-after schools getting more than their share of review packets and the Harvard and Stanford reps getting fewer. I tried seeding the candidate choice table with such a case by having every tenth candidate's first choice be one school.-- Existing test data build....
insert into #CandidateSchoolChoices
select CandidateID, dt.ChoiceNbr, right('000000' + convert(varchar(6), ABS(CHECKSUM(NEWID())) %1000000+ 1 ), 6)
from #Candidates
CROSS APPLY (select top 5 ChoiceNbr = ROW_NUMBER() OVER (ORDER BY OBJECT_ID) from sys.objects) dt
-- Then override every 10th candidate's first choice to be a choice for 123456, our "Harvard"
update #CandidateSchoolChoices
set Schoolchoice = 123456
where ChoiceNbr = 1
and CandidateID % 10 = 0
----------
My final output then had these assignment counts (I added a join to #BrdMbrs to get the school number in that query):
select v.BrdMbrNbr,b.SchoolCd, COUNT(*)as Qty
from #VoterList v
join #BrdMbrs b on b.VoterID = v.BrdMbrNbr
group by BrdMbrNbr,b.SchoolCd
order by BrdMbrNbr
. . . .
BrdMbrNbr SchoolCd Qty
----------- -------- -----------
1 123456 601
2 623968 721
3 125090 839
4 486669 720
5 361024 719 Have I missed something here?
August 17, 2009 at 3:16 pm
Yes, that's similar to what I saw.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 17, 2009 at 3:20 pm
RBarryYoung (8/17/2009)
Thanks Wayne, Jeff.I played around with it a little using what I felt was more naturally distributed data (making some schools much more popular than others) and it seems that the variation of differences in how many candidates each reviewer is assigned to with mine is very close to how popular that reviewers school is. In other words, reviewers from very popular schools may get significantly less assignments with my algorithm.
If this turns out to be too large with the real data let us know, there may be slightly more complicated set-based solutions that give a more even result.
Well, for the data that we have right now, it's giving us a spread of 10 across 8 board members. Which is well within the acceptable range. What is unknown is how this will change... it's kinda hard to predict what schools scholarship applicants will want to go to... so I can see this popping up again in the future as being a problem that needs to be fixed.
That being said, this is also something that would be nice to wrap up properly the first time around. If something needs changing, around here it could take 2 months for it to move through all of the testing environments to get to production.
If you're willing to help, I'm attaching a script that has about 650 real CandidateID numbers. I also like what John did to give a heavy preference towards one school.
One thing I will say about the cursor based way... the counts per board member were very close together... total spread on the same data is only 3.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
August 18, 2009 at 1:59 pm
WayneS: I've been trying to improve on this but it's going to take a while. It's a tough problem to improve on where it is, plus I've been pretty doped up on pain meds the last 4 days (back problems) which is making everything fuzzy....
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 18, 2009 at 3:31 pm
Barry, I appreciate you looking into this. Take care of yourself first, and come back to this when you're up to it.
That "tough problem" part intrigues (and scares) me. If there's any area that you can point me in, I'll see what I can do with it.
Thanks!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply