August 11, 2014 at 2:38 pm
Hi, i've been looking at moving one of our processed from excel (+vba) into t-sql to make life easier but am stuck.
We have lots of tasks that are assigned to work groups which we want to distribute evenly across the work groups. This is a simple task for ntile.. However when these tasks are no longer required they are removed which leaves the groups uneven. When new tasks are added we want to still try to keep these groups balanced.
EG Existing groups :
GroupName - Task Count
Group1 - 1000
Group2 - 999
Group3 - 998
If we were to add 6 new tasks they would have more assigned to Group 2 & 3 as they have less than group 1.
Task 1 - Group3
Task 2 - Group3
Task 3 -Group2
Task 4 - Group1
Task 5 - Group2
Task 6 - Group3
Sample tables
Create table GroupTable
(GroupID int, Name varchar(200) )
Insert into GroupTable values (1,'Group1')
Insert into GroupTable values (2,'Group2')
Insert into GroupTable values (3,'Group3')
Create table Jobs(jobid int identity(1,1), name varchar(100),Groupid int)
--Existing tasks
Insert into Jobs(name,Groupid) values ('Task1',1)
Insert into Jobs(name,Groupid) values ('Task2',1)
Insert into Jobs(name,Groupid) values ('Task3',1)
Insert into Jobs(name,Groupid) values ('Task4',1)
Insert into Jobs(name,Groupid) values ('Task5',2)
Insert into Jobs(name,Groupid) values ('Task6',2)
Insert into Jobs(name,Groupid) values ('Task6',2)
Insert into Jobs(name,Groupid) values ('Task7',3)
-- New tasks
Insert into Jobs(name) values ('TaskA')
Insert into Jobs(name) values ('TaskB')
Insert into Jobs(name) values ('TaskC')
Insert into Jobs(name) values ('TaskD')
Insert into Jobs(name) values ('TaskE')
Insert into Jobs(name) values ('TaskF')
This gives us 6 unassigned tasks and a uneven group assignment
GROUPNAME TASK_COUNT
<none> 6
Group1 4
Group2 3
Group3 2
This means the new tasks will be assigned like this
TaskA - Group3
TaskB - Group3
TaskC - Group2
TaskD - Group1
TaskE - Group2
TaskF - Group3
Can anyone help?
Thanks
Dan
August 11, 2014 at 2:45 pm
Are tasks added one at a time, or will you get batches of a lot of tasks in one insert?
Is it a crisis if the groups aren't perfectly balanced?
Just so you know, doing this properly is Hard (in the computer science definition of the word), so it may be necessary to do a 'good enough' solution.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 11, 2014 at 3:05 pm
Hi, The jobs are added in bulk so we may have 10 or 101 added at once.
They don't need to be perfectly allocated across all the groups tbh its probably not possible because were likely to have a uneven number of tasks.
'Good enough' is all we need, we don't need anything perfect and we can look at redistributing regularly as well, though we'd like to avoid this if we can.
Thanks
Dan
August 11, 2014 at 3:26 pm
The only way I've thought about doing is to work out the number of difference between the groups and use a cursor to allocate to these groups before using ntile to split the rest over the remaining groups.
Dan
August 11, 2014 at 6:56 pm
First let's create a function called topn...
CREATE FUNCTION topn(@n int, @x int)
RETURNS TABLE AS
RETURN
(
WITH
E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)),--10
E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),--100
iTally(n) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT $))
FROM E2 a CROSS JOIN E2 b
)--10,000
SELECT TOP(@n) x = @x
FROM iTally
);
GO
-- example: this will give me three 5's.
SELECT * FROM topn(3,5)
Now lets build a temp table to retrieve missing values. Note that a temp table is not not required, I am doing it this way so that you understand my solution.
DECLARE @rows int = (SELECT count(*) FROM jobs),
@tiles int = (SELECT count(*) FROM groupTable);
WITH
tinyTally AS
(
SELECT TOP (100) n = row_number() over (order by (select null))
FROM (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(n)
),
calculate_tiles AS
(
SELECT n,
tile = (@rows/@tiles) +
case
when n <= (@rows%@tiles) then 1 else 0
end
FROM tinyTally
WHERE n <= @tiles
),
gm AS
(
SELECT Groupid, members = count(*)
FROM Jobs
GROUP by Groupid
)
SELECTGroupid = gm.Groupid, -- these are the guys that don't have a groupid
required_tiles = tile,
actual_tiles = members,
diff = tile-members
INTO #gm
FROM gm
LEFT JOIN calculate_tiles ct ON Groupid=n;
-- review the results
SELECT * FROM #gm;
GO
Now that we have our missing groupid's in #gm lets feed the missing values to my topn function and re-insert them...
WITH
Required_tiles AS
(
SELECT row_num = ROW_NUMBER() OVER (ORDER BY (SELECT null)), gx = Groupid
FROM #gm
CROSS APPLY topn(Groupid, diff)
WHERE Groupid IS NOT NULL
),
nullGroups AS
(
SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT null)), *
FROM Jobs
WHERE Groupid IS NULL
)
UPDATE nullGroups
SET Groupid = gx
FROM Required_tiles
WHERE rn = row_num;
No cursors, no loops. 😎
Edit: code formatting was messed.
-- Itzik Ben-Gan 2001
August 12, 2014 at 2:18 pm
Great that looks v good.
Still trying to understand it all 🙂
Is the CTE Required_tiles wrong?
Should
CROSS APPLY topn(Groupid, diff)
be
CROSS APPLY topn(diff, Groupid)
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply