Distribute data into groups based on existing numbers

  • Hi, i've been looking at moving one of our processed from excel (+vba) into t-sql to make life easier but am stuck.

    We have lots of tasks that are assigned to work groups which we want to distribute evenly across the work groups. This is a simple task for ntile.. However when these tasks are no longer required they are removed which leaves the groups uneven. When new tasks are added we want to still try to keep these groups balanced.

    EG Existing groups :

    GroupName - Task Count

    Group1 - 1000

    Group2 - 999

    Group3 - 998

    If we were to add 6 new tasks they would have more assigned to Group 2 & 3 as they have less than group 1.

    Task 1 - Group3

    Task 2 - Group3

    Task 3 -Group2

    Task 4 - Group1

    Task 5 - Group2

    Task 6 - Group3

    Sample tables

    Create table GroupTable

    (GroupID int, Name varchar(200) )

    Insert into GroupTable values (1,'Group1')

    Insert into GroupTable values (2,'Group2')

    Insert into GroupTable values (3,'Group3')

    Create table Jobs(jobid int identity(1,1), name varchar(100),Groupid int)

    --Existing tasks

    Insert into Jobs(name,Groupid) values ('Task1',1)

    Insert into Jobs(name,Groupid) values ('Task2',1)

    Insert into Jobs(name,Groupid) values ('Task3',1)

    Insert into Jobs(name,Groupid) values ('Task4',1)

    Insert into Jobs(name,Groupid) values ('Task5',2)

    Insert into Jobs(name,Groupid) values ('Task6',2)

    Insert into Jobs(name,Groupid) values ('Task6',2)

    Insert into Jobs(name,Groupid) values ('Task7',3)

    -- New tasks

    Insert into Jobs(name) values ('TaskA')

    Insert into Jobs(name) values ('TaskB')

    Insert into Jobs(name) values ('TaskC')

    Insert into Jobs(name) values ('TaskD')

    Insert into Jobs(name) values ('TaskE')

    Insert into Jobs(name) values ('TaskF')

    This gives us 6 unassigned tasks and a uneven group assignment

    GROUPNAME TASK_COUNT

    <none> 6

    Group1 4

    Group2 3

    Group3 2

    This means the new tasks will be assigned like this

    TaskA - Group3

    TaskB - Group3

    TaskC - Group2

    TaskD - Group1

    TaskE - Group2

    TaskF - Group3

    Can anyone help?

    Thanks

    Dan

  • Are tasks added one at a time, or will you get batches of a lot of tasks in one insert?

    Is it a crisis if the groups aren't perfectly balanced?

    Just so you know, doing this properly is Hard (in the computer science definition of the word), so it may be necessary to do a 'good enough' solution.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi, The jobs are added in bulk so we may have 10 or 101 added at once.

    They don't need to be perfectly allocated across all the groups tbh its probably not possible because were likely to have a uneven number of tasks.

    'Good enough' is all we need, we don't need anything perfect and we can look at redistributing regularly as well, though we'd like to avoid this if we can.

    Thanks

    Dan

  • The only way I've thought about doing is to work out the number of difference between the groups and use a cursor to allocate to these groups before using ntile to split the rest over the remaining groups.

    Dan

  • First let's create a function called topn...

    CREATE FUNCTION topn(@n int, @x int)

    RETURNS TABLE AS

    RETURN

    (

    WITH

    E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)),--10

    E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),--100

    iTally(n) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT $))

    FROM E2 a CROSS JOIN E2 b

    )--10,000

    SELECT TOP(@n) x = @x

    FROM iTally

    );

    GO

    -- example: this will give me three 5's.

    SELECT * FROM topn(3,5)

    Now lets build a temp table to retrieve missing values. Note that a temp table is not not required, I am doing it this way so that you understand my solution.

    DECLARE @rows int = (SELECT count(*) FROM jobs),

    @tiles int = (SELECT count(*) FROM groupTable);

    WITH

    tinyTally AS

    (

    SELECT TOP (100) n = row_number() over (order by (select null))

    FROM (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(n)

    ),

    calculate_tiles AS

    (

    SELECT n,

    tile = (@rows/@tiles) +

    case

    when n <= (@rows%@tiles) then 1 else 0

    end

    FROM tinyTally

    WHERE n <= @tiles

    ),

    gm AS

    (

    SELECT Groupid, members = count(*)

    FROM Jobs

    GROUP by Groupid

    )

    SELECTGroupid = gm.Groupid, -- these are the guys that don't have a groupid

    required_tiles = tile,

    actual_tiles = members,

    diff = tile-members

    INTO #gm

    FROM gm

    LEFT JOIN calculate_tiles ct ON Groupid=n;

    -- review the results

    SELECT * FROM #gm;

    GO

    Now that we have our missing groupid's in #gm lets feed the missing values to my topn function and re-insert them...

    WITH

    Required_tiles AS

    (

    SELECT row_num = ROW_NUMBER() OVER (ORDER BY (SELECT null)), gx = Groupid

    FROM #gm

    CROSS APPLY topn(Groupid, diff)

    WHERE Groupid IS NOT NULL

    ),

    nullGroups AS

    (

    SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT null)), *

    FROM Jobs

    WHERE Groupid IS NULL

    )

    UPDATE nullGroups

    SET Groupid = gx

    FROM Required_tiles

    WHERE rn = row_num;

    No cursors, no loops. 😎

    Edit: code formatting was messed.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Great that looks v good.

    Still trying to understand it all 🙂

    Is the CTE Required_tiles wrong?

    Should

    CROSS APPLY topn(Groupid, diff)

    be

    CROSS APPLY topn(diff, Groupid)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply