Delete Duplicates

  • HI all

    Am trying to work on deleting duplicate rows from a table and am kind of stuck. Could i please get some help.

    I have a table with staff and training data

    create table temptable (empno varchar(10), name varchar(50), cid varchar(10), grade varchar(10), enddate datetime

    Insert into temptable values ('1234', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000' )

    Insert into temptable values ('1234', 'apple', 'rb7820', 'Did not complete', '2004-03-01 00:00:00.000')

    Insert into temptable values ('1234', 'apple', 'pt7830', 'Cancelled', '2004-04-05 00:00:00.000')

    Insert into temptable values ('1234', 'apple', 'sp7840', 'Completed', '2004-10-30 00:00:00.000')

    Insert into temptable values ('1234', 'apple', 'rb7820', 'Completed', '2005-12-01 00:00:00.000')

    Insert into temptable values ('2344', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000')

    Insert into temptable values ('2344', 'apple', 'rb7820', 'Completed', '2004-03-01 00:00:00.000')

    Insert into temptable values ('2344', 'apple', 'pt7830', 'Completed', '2004-03-01 00:00:00.000')

    Insert into temptable values ('2344', 'apple', 'sp7840', 'Completed', '2004-03-01 00:00:00.000')

    Currently

    In my current query, i pivot the data so the course names are the headings of table with one row of employee 1234 and the appropriate value under each course name. Am using max for the grades to pick up the highest of the grade for each course name

    however since 1234 has a completed and did not complete entry for course rb7820 using max results in Did not attend for that course in the final query. Tried using min but it complicates the other data.

    I would like to find out alternative way of deleting the Did not Complete row from the temptable so i would only have the one row for the course rb7820 for employee 1234 and that is Completed.

    Can someone please suggest any ideas

    thanks

    vani

  • Try this

    with cte as (

    select *,

    row_number() over(partition by empno,cid order by enddate desc) as rn

    from temptable)

    delete from cte where rn>1;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Hi Mark

    thank you for your prompt response. However, the way to delete the rows is based on a combination of empid, courseid and grade.

    With the CTE query am unable to find this combo.

    Am i looking at something incorrectly. please advise

    thanks

    vani

  • Hi all

    Is there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.

    e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete

    I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain

    thanks

    vani

  • vani_r14 (5/8/2012)


    Hi all

    Is there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.

    e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete

    I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain

    thanks

    vani

    One way to do it

    DELETET_O

    FROMtemptable T_O

    WHERET_O.grade != 'Completed'

    AND EXISTS( SELECT * FROM temptable TI WHERE TI.empno = T_O.empno AND TI.cid = T_O.cid AND TI.grade = 'Completed' )


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • One of the ways I have removed duplicate rows in the past is by selecting distinct which just gives you one row per duplicated rows and insert that into a temp table or a hold table. Delete all rows that where duplicated and then insert the distinct rows (held in the temp or hold table) back into the table.

  • Hi all

    thank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now

    Cheers

    vani

  • vani_r14 (5/15/2012)


    Hi all

    thank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now

    Cheers

    vani

    Two way street here, Vani. If you've come up with a solution, post it so we might learn something different. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    I have always shared my solution and will post this one as well when i have finalised it. 🙂

    PS - I found your tally tables very handy in a recent query that i had to design

    Thanks

    vani

  • vani_r14 (5/15/2012)


    Hi Jeff

    I have always shared my solution and will post this one as well when i have finalised it. 🙂

    PS - I found your tally tables very handy in a recent query that i had to design

    Thanks

    vani

    Thanks for the feedback especially on Tally Tables. I appreciate it.

    Have you been able to finalize your query yet?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hi all

    here is the idea i finally ended up using -

    First priorities the grades

    1. Completed / Pass

    2. All other grades

    Original Data

    **********

    in the database was split into multiple tables with different status

    ************************************************************************************

    Solution i applied that works in this instance : This solution works because i did not use the Dates

    ************************************************************************************

    -- Extract all Course data for all employees who have completed as grade

    Select distinct empno, name, cid, grade, empno + cid as "unique_id" into #temptable1

    from tablename where grade = 'Completed'

    -- Select remaining courses where the unique_id is not already there in the temptable1 created before

    Insert into #temptable1

    Select distinct empno, name, cid, grade, empno + cid as "unique_id"

    from tablename where grade <> 'Completed' and empno + cid not in (Select distinct unique_id from tablename)

    -- Select all enrollment data

    Insert into #temptable1

    Select distinct empno, name, cid, grade, empno + cid as "unique_id"

    from tablename where Status <> 'Confirmed' and grade = ''

    and empno + cid not in (Select distinct unique_id from tablename)

    -- Check for Duplicates in the newly created #temptable1

    Select distinct unique_id, count(unique_id)

    from #temptable1

    Group by unique_id

    having count(unique_id) > 1

    -- Drop all temp tables created

    Drop table #temptable1

    Am not sure if this makes any sense, please let me know if i can clarify any thing

    thanks

    Vani

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply