Distribute data into groups based on existing numbers

Question

Post reply

Distribute data into groups based on existing numbers

daniel gibbs

SSC Journeyman

Points: 82
More actions
August 11, 2014 at 2:38 pm

#291888

Hi, i've been looking at moving one of our processed from excel (+vba) into t-sql to make life easier but am stuck.
We have lots of tasks that are assigned to work groups which we want to distribute evenly across the work groups. This is a simple task for ntile.. However when these tasks are no longer required they are removed which leaves the groups uneven. When new tasks are added we want to still try to keep these groups balanced.
EG Existing groups :
GroupName - Task Count
Group1 - 1000
Group2 - 999
Group3 - 998
If we were to add 6 new tasks they would have more assigned to Group 2 & 3 as they have less than group 1.
Task 1 - Group3
Task 2 - Group3
Task 3 -Group2
Task 4 - Group1
Task 5 - Group2
Task 6 - Group3
Sample tables
Create table GroupTable
(GroupID int, Name varchar(200) )
Insert into GroupTable values (1,'Group1')
Insert into GroupTable values (2,'Group2')
Insert into GroupTable values (3,'Group3')
Create table Jobs(jobid int identity(1,1), name varchar(100),Groupid int)
--Existing tasks
Insert into Jobs(name,Groupid) values ('Task1',1)
Insert into Jobs(name,Groupid) values ('Task2',1)
Insert into Jobs(name,Groupid) values ('Task3',1)
Insert into Jobs(name,Groupid) values ('Task4',1)
Insert into Jobs(name,Groupid) values ('Task5',2)
Insert into Jobs(name,Groupid) values ('Task6',2)
Insert into Jobs(name,Groupid) values ('Task6',2)
Insert into Jobs(name,Groupid) values ('Task7',3)
-- New tasks
Insert into Jobs(name) values ('TaskA')
Insert into Jobs(name) values ('TaskB')
Insert into Jobs(name) values ('TaskC')
Insert into Jobs(name) values ('TaskD')
Insert into Jobs(name) values ('TaskE')
Insert into Jobs(name) values ('TaskF')
This gives us 6 unassigned tasks and a uneven group assignment
GROUPNAME TASK_COUNT
<none> 6
Group1 4
Group2 3
Group3 2
This means the new tasks will be assigned like this
TaskA - Group3
TaskB - Group3
TaskC - Group2
TaskD - Group1
TaskE - Group2
TaskF - Group3
Can anyone help?
Thanks
Dan

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Are tasks added one at a time, or will you get batches of a lot of tasks in one insert?

Is it a crisis if the groups aren't perfectly balanced?

Just so you know, doing this properly is Hard (in the computer science definition of the word), so it may be necessary to do a 'good enough' solution.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

daniel gibbs SSC Journeyman Points: 82 More actions · Answer 2

Hi, The jobs are added in bulk so we may have 10 or 101 added at once.

They don't need to be perfectly allocated across all the groups tbh its probably not possible because were likely to have a uneven number of tasks.

'Good enough' is all we need, we don't need anything perfect and we can look at redistributing regularly as well, though we'd like to avoid this if we can.

Thanks

Dan

daniel gibbs SSC Journeyman Points: 82 More actions · Answer 3

The only way I've thought about doing is to work out the number of difference between the groups and use a cursor to allocate to these groups before using ntile to split the rest over the remaining groups.

Dan

Alan Burstein SSC Guru Points: 61141 More actions · Answer 4

First let's create a function called topn...

CREATE FUNCTION topn(@n int, @x int)

RETURNS TABLE AS

RETURN

(

WITH

E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)),--10

E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),--100

iTally(n) AS

(

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT $))

FROM E2 a CROSS JOIN E2 b

)--10,000

SELECT TOP(@n) x = @x

FROM iTally

);

GO

-- example: this will give me three 5's.

SELECT * FROM topn(3,5)

Now lets build a temp table to retrieve missing values. Note that a temp table is not not required, I am doing it this way so that you understand my solution.

DECLARE @rows int = (SELECT count(*) FROM jobs),

@tiles int = (SELECT count(*) FROM groupTable);

WITH

tinyTally AS

(

SELECT TOP (100) n = row_number() over (order by (select null))

FROM (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(n)

),

calculate_tiles AS

(

SELECT n,

tile = (@rows/@tiles) +

case

when n <= (@rows%@tiles) then 1 else 0

end

FROM tinyTally

WHERE n <= @tiles

),

gm AS

(

SELECT Groupid, members = count(*)

FROM Jobs

GROUP by Groupid

)

SELECTGroupid = gm.Groupid, -- these are the guys that don't have a groupid

required_tiles = tile,

actual_tiles = members,

diff = tile-members

INTO #gm

FROM gm

LEFT JOIN calculate_tiles ct ON Groupid=n;

-- review the results

SELECT * FROM #gm;

GO

Now that we have our missing groupid's in #gm lets feed the missing values to my topn function and re-insert them...

WITH

Required_tiles AS

(

SELECT row_num = ROW_NUMBER() OVER (ORDER BY (SELECT null)), gx = Groupid

FROM #gm

CROSS APPLY topn(Groupid, diff)

WHERE Groupid IS NOT NULL

),

nullGroups AS

(

SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT null)), *

FROM Jobs

WHERE Groupid IS NULL

)

UPDATE nullGroups

SET Groupid = gx

FROM Required_tiles

WHERE rn = row_num;

No cursors, no loops. 😎

Edit: code formatting was messed.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

daniel gibbs SSC Journeyman Points: 82 More actions · Answer 5

Great that looks v good.

Still trying to understand it all 🙂

Is the CTE Required_tiles wrong?

Should

CROSS APPLY topn(Groupid, diff)

be

CROSS APPLY topn(diff, Groupid)