pick random (shuffle by dept) employees from employees table and update main table with employee name

  • Hi,
    I have a task which I have been working on but am struggling finding the end result.
    Task Description
    Check each row in Main table
    If column a = ‘x’ then assign one of the 4 ‘x’ employees from employee table picked randomly to that row
    If column a = ‘y’ then get one of the 4 ‘y’ Standards officers from employee table picked randomly to that row

    Employee Table
    Id   Dept               Employee
    1              x              x1
    2              x              x2
    3              x              x3
    4              x              x4
    5              y              y1
    6              y              y2
    7              y              y3
    8              y              y4
    Main table (DATA) contains many columns and I need end column (Employee) to be populated by a Employee based on Dept and shauffled in employee table to randomise also.
    Hope you can help
    Thank you

  • This should get you started. Since you're new here, I created your table and added some data. If you want a tested answer, you should provide this.
    CREATE TABLE Employee (
        ID INT,
        Dept CHAR,
        EmpID CHAR(2)
    );
    GO
    INSERT INTO Employee
    VALUES (1, 'x', 'x1'),
    (2, 'x', 'x2'),
    (3, 'x', 'x3'),
    (4, 'x', 'x4'),
    (5, 'y', 'y1'),
    (6, 'y', 'y2'),
    (7, 'y', 'y3'),
    (8, 'y', 'y4');

    SELECT TOP (4) *
    FROM Employee
    ORDER BY NEWID();

  • At first glance, you could do this with a correlated subquery.  This would pick a random X or Y employee for each DataTable row.

    UPDATE dbo.DataTable SET Employee = (
      SELECT TOP (1) Employee
      FROM dbo.EmployeeTable e
      WHERE e.Dept = DataTable.a
      ORDER BY NEWID() );

    The problem here is, if the Employee table is large, the subquery could get expensive.  You also don't say whether there are more than 4 employees in the X and Y departments to choose from.

    If you want to first select a set of 4 random X employees and a set of 4 random Y employees, then randomly assign from those groups to each DataTable row, it might look like this.  The column "PK" would be the primary key of the Data table.

    WITH xemp AS (
      -- Pick 4 X employees at random
      SELECT TOP (4) Employee
      FROM dbo.EmployeeTable
      WHERE Dept = 'x'
      ORDER BY NEWID() ),
    yemp AS (
      -- Pick 4 Y employees at random
      SELECT TOP (4) Employee
      FROM dbo.EmployeeTable
      WHERE Dept = 'y'
      ORDER BY NEWID() )
    UPDATE d SET Employee = e.Employee
    FROM dbo.DataTable d
    INNER JOIN (
      -- Generate 4 matches for each DataTable row, order them randomly to pick one
      SELECT dt.PK, Employee = ISNULL(x.Employee, y.Employee),
        PickOne = ROW_NUMBER() OVER (PARTITION BY dt.PK ORDER BY (SELECT NEWID()))
      FROM dbo.DataTable dt
      LEFT JOIN xemp ON dt.a = 'x'
      LEFT JOIN yemp ON dt.a = 'y'
    ) e ON d.PK = e.PK AND PickOne = 1;

  • I'm pretty sure that no matter what happens, there's going to be a table or index scan involved and that there will also be a SORT involved.  Borrowing heavily on the test harness that pietlinden provided, we can do it all for any number of departments in a single scan of the table and without having to make helper tables.

    naveed.tuf - Thursday, January 24, 2019 8:54 AM

    Hi,
    I have a task which I have been working on but am struggling finding the end result.
    Task Description
    Check each row in Main table
    If column a = ‘x’ then assign one of the 4 ‘x’ employees from employee table picked randomly to that row
    If column a = ‘y’ then get one of the 4 ‘y’ Standards officers from employee table picked randomly to that row

    Employee Table
    Id   Dept               Employee
    1              x              x1
    2              x              x2
    3              x              x3
    4              x              x4
    5              y              y1
    6              y              y2
    7              y              y3
    8              y              y4
    Main table (DATA) contains many columns and I need end column (Employee) to be populated by a Employee based on Dept and shauffled in employee table to randomise also.
    Hope you can help
    Thank you

    How many times is Dept repeated in the  "Main Table"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It occurred to me that instead of randomly reordering the Employees over and over again for each DataTable row, the NTILE function could be used to randomly divide DataTable into four groups in one pass.

    UPDATE d SET Employee = e.Employee
    FROM dbo.DataTable d
    INNER JOIN (
        -- Assign each DataTable row to one of four groups randomly
        SELECT dt.PK, Dept = dt.a,
                n = NTILE(4) OVER (PARTITION BY a ORDER BY (SELECT NEWID()))
        FROM dbo.DataTable dt
        WHERE dt.a IN ('x', 'y') 
    ) dn ON d.PK = dn.PK
    INNER JOIN (
        -- Pick X employees and Y employees at random, only the top 4 of each will match an NTILE value
        SELECT Dept, Employee,
                n = ROW_NUMBER() OVER (PARTITION BY Dept ORDER BY (SELECT NEWID()))
        FROM dbo.EmployeeTable
        WHERE Dept IN ('x', 'y')
    ) e ON dn.Dept = e.Dept AND dn.n = e.n

  • naveed.tuf - Thursday, January 24, 2019 8:54 AM

    Hi,
    I have a task which I have been working on but am struggling finding the end result.
    Task Description
    Check each row in Main table
    If column a = ‘x’ then assign one of the 4 ‘x’ employees from employee table picked randomly to that row
    If column a = ‘y’ then get one of the 4 ‘y’ Standards officers from employee table picked randomly to that row

    Employee Table
    Id   Dept               Employee
    1              x              x1
    2              x              x2
    3              x              x3
    4              x              x4
    5              y              y1
    6              y              y2
    7              y              y3
    8              y              y4
    Main table (DATA) contains many columns and I need end column (Employee) to be populated by a Employee based on Dept and shauffled in employee table to randomise also.
    Hope you can help
    Thank you

    We're operating blind here.  Please answer my previous question.  Is the Dept column in the "Main table (DATA)" unique or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello all,

    Thank you for your responses.

    I have been able to resolve this by adding a random column (random values generated between 1 to 4) to my main table and then concatenated the dept column and random column .

    For anyone that may be looking for a similar solution the TSQL is:

    "update [DATA]
    set rnd = CAST(SUBSTRING(CAST(NEWID() AS binary(16)),1, 3) AS int) % 5
    "

    Appreciate all your responses, i am sure I will be using these at some later stage.

    Thank you again!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply