How to remove dulpicates caused by other columns?

  •  

    Hi everyone,

    Newbie here.

    How to remove duplicates caused by DateFrom and DateTo columns? I have this issue if empId ,dept, IDCode have groups. There should only be one distinct record of empID, Dept and IDCode combination.

    1. If DateTo is not null Get the record with the latest DateTo(within the group) and ignore whatever the value of DateFrom.
    2. If DateTo is null, look into the DateFrom and get the record with the latest DateFrom
    drop table id;

    -- create
    CREATE TABLE ID (
    empId int,
    dept varchar(100),
    IDCode varchar(50),
    DateFrom date,
    DateTo date
    );
    -- insert
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0001, 'Sales', 'S', '01/05/2020','03/05/2019');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0001, 'Prod', 'P', null, '09/11/2024');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0001, 'Sales', 'S', '01/05/2000','03/05/2015');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0001, 'Sales', 'S', null,null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0001, 'Sales', 'S', null, '09/11/2025');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0002, 'Sales', 'S', null,'11/30/2021');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0003, 'Prod', 'P', '03/15/1999',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0003, 'Prod', 'P', '01/05/2020',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0003, 'Tech', 'T', '01/05/2020','03/05/2023');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0004, 'Finance', 'F', '01/05/2023',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0004, 'Finance', 'F', '01/05/2020',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0005, 'Finance', 'F', '01/05/2023',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0005, 'Finance', 'F', '04/08/2020',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0005, 'Finance', 'F', null,null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0006, 'R and D', 'R', '01/05/2010','03/05/2019');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0007, 'R and D', 'R', '04/15/2000','09/07/2018');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0008, 'Util', 'U', '11/05/2017','03/05/2022');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0009, 'Util', 'U', '08/09/2016',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0010, 'Finance', 'F', '01/05/1093','01/05/2025');
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0010, 'Finance', 'F', '01/05/2020',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0011, 'Finance', 'F', null,null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0012, 'Finance', 'F', '08/15/2020',null);
    INSERT INTO ID(empId,dept,IDCode,DateFrom,DateTo ) VALUES (0012, 'Finance', 'F', null,null);

    If the filter is applied, below should be the result

    sqlResult

    Thanks for the help. Insert script is attached.

     

     

  • ;WITH CTE AS 
    (
    SELECT empId,
    dept,
    IDCode,
    DateFrom,
    DateTo,
    ROW_NUMBER() OVER (PARTITION BY empId, dept, IDCode ORDER BY DateTo DESC, DateFrom DESC) AS RowNum
    FROM ID
    )
    SELECT empId,
    dept,
    IDCode,
    DateFrom,
    DateTo
    FROM CTE
    WHERE RowNum = 1
    ORDER BY 1, 2, 3
    ;
  • Thank you Sir! It worked like magic. Im now familiar with the use of 'with, partition, row' because of your reply

  • majime wrote:

    Thank you Sir! It worked like magic. Im now familiar with the use of 'with, partition, row' because of your reply

    and well done to you for posting your first question with good DDL and DML.

  • This was removed by the editor as SPAM

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply