Duplicate Employee Rows

  • I am trying to structure a stored procedure to do the following:

    I have an employee table and an employment table. Employees are linked to employments by employeeId from employee table. Employees can have more than one employment.

    The problem is that we have duplicate employee records within Employee table.

    I need to write a script that will determine which employee record has the highest Employee Number. For each of the other duplicate records change the employments to reflect the employeeId of the record with the highest Employee Number. Then remove the employee record(s) with the lower Employee Number(s).

    Can anyone suggest a good way of doing this.

  • Can you show some example in each table?

    Employee table has duplicate records, what is the difference between employeeid and Employee Number?

    Employee and Employment is linked by employeeid. Does one employeeid have multiple Employment records?

  • inside a cursor (I must note this is the general idea a cursor is not always the best way to go) Call something like ---->

    --OUTSIDE CURSOR

    CREATE TABLE #Temp1

    (MaxID INT

    ,MinID INT

    ,LastName VARCHAR (50)

    ,FirstName VARCHAR (50))

    DECLARE @a VARCHAR (50) --(OR Whatever Type)

    ,@B VARCHAR (50) --(OR Whatever Type)

    --INSDIE CURSOR

    DECLARE CURSOR...

    OPEN...

    INTO @a, @b-2

    IF ...

    BEGIN

    INSERT INTO#Temp1...

    SELECT MAX(EMPLOYEEID), MIN(EMPLOYEEID), @a, @b-2

    FROM EMPLOYEE

    WHERE LastName = @a

    AND FirstName = @b-2

    INTO @a, @B...

    ...

    END

    CLOSE...

    This will help you compile a list.

    I should note that the name field can be replaced with any identifier that should be the same between the two.

    I should also note that many people have the same name so any unique identifer two all the records in the result set should be used.

    Also using the same concept as laid out above you should be able to change it to suit your needs. If you have more then 2 rows per employee then this will have to be altered or at least ran again after you remove the rows until all row have been found <--- That may take some time all depending.

    then when done with the clean up create a unique constraint on the unique identifier field --> not the table identity column <-- that would not work. That should eliminate the need for a stored procedure.

    A great resource for this is Books online <--- Comes with SQL Server 2000

  • Non _ cursor method....(If I made the wrong assumption on which is the duplicate employee_id (highest or lowest), change max to min and > to <)

    select min(e1.employee_id) 'employee_id', e1.employee_fname 'employee_fname',

    e1.employee_lname 'employee_lname', e2.employee_id 'Duplicate_employee_id',

    e2.employee_fname 'Duplicate_employee_fName', e2.employee_lname 'Duplicate employee_lname'

    from employee e1

    join employee e2 on e1.employee_id < e2.employee_id and e1.employee_fname = e2.employee_fname

    and e1.employee_lname = e2.employee_lname

    group by e1.employee_fname, e1. employee_lname, e2.employee_id, e2.employee_fname,

    e2.employee_lname

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply