Delete Duplicates

Question

Post reply

Delete Duplicates

vani_r14

SSCrazy

Points: 2843
More actions
May 8, 2012 at 6:57 am

#98461

HI all
Am trying to work on deleting duplicate rows from a table and am kind of stuck. Could i please get some help.
I have a table with staff and training data
create table temptable (empno varchar(10), name varchar(50), cid varchar(10), grade varchar(10), enddate datetime
Insert into temptable values ('1234', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000' )
Insert into temptable values ('1234', 'apple', 'rb7820', 'Did not complete', '2004-03-01 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'pt7830', 'Cancelled', '2004-04-05 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'sp7840', 'Completed', '2004-10-30 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'rb7820', 'Completed', '2005-12-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'rb7820', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'pt7830', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'sp7840', 'Completed', '2004-03-01 00:00:00.000')
Currently
In my current query, i pivot the data so the course names are the headings of table with one row of employee 1234 and the appropriate value under each course name. Am using max for the grades to pick up the highest of the grade for each course name
however since 1234 has a completed and did not complete entry for course rb7820 using max results in Did not attend for that course in the final query. Tried using min but it complicates the other data.
I would like to find out alternative way of deleting the Did not Complete row from the temptable so i would only have the one row for the course rb7820 for employee 1234 and that is Completed.
Can someone please suggest any ideas
thanks
vani

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26968 More actions · Answer 1

Try this

with cte as (

select *,

row_number() over(partition by empno,cid order by enddate desc) as rn

from temptable)

delete from cte where rn>1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

vani_r14 SSCrazy Points: 2843 More actions · Answer 2

Hi Mark

thank you for your prompt response. However, the way to delete the rows is based on a combination of empid, courseid and grade.

With the CTE query am unable to find this combo.

Am i looking at something incorrectly. please advise

thanks

vani

vani_r14 SSCrazy Points: 2843 More actions · Answer 3

Hi all

Is there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.

e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete

I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain

thanks

vani

Kingston Dhasian SSCoach Points: 19799 More actions · Answer 4

vani_r14 (5/8/2012)
Hi all
Is there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.
e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete
I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain
thanks
vani

One way to do it

DELETET_O

FROMtemptable T_O

WHERET_O.grade != 'Completed'

AND EXISTS( SELECT * FROM temptable TI WHERE TI.empno = T_O.empno AND TI.cid = T_O.cid AND TI.grade = 'Completed' )

Kingston Dhasian

How to post data/code on a forum to get the best help - Jeff Moden
http://www.sqlservercentral.com/articles/Best+Practices/61537/

DBA_Dom SSCrazy Points: 2902 More actions · Answer 5

One of the ways I have removed duplicate rows in the past is by selecting distinct which just gives you one row per duplicated rows and insert that into a temp table or a hold table. Delete all rows that where duplicated and then insert the distinct rows (held in the temp or hold table) back into the table.

vani_r14 SSCrazy Points: 2843 More actions · Answer 6

Hi all

thank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now

Cheers

vani

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 7

vani_r14 (5/15/2012)
Hi all
thank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now
Cheers
vani

Two way street here, Vani. If you've come up with a solution, post it so we might learn something different. Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

vani_r14 SSCrazy Points: 2843 More actions · Answer 8

Hi Jeff

I have always shared my solution and will post this one as well when i have finalised it. 🙂

PS - I found your tally tables very handy in a recent query that i had to design

Thanks

vani

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 9

vani_r14 (5/15/2012)
Hi Jeff
I have always shared my solution and will post this one as well when i have finalised it. 🙂
PS - I found your tally tables very handy in a recent query that i had to design
Thanks
vani

Thanks for the feedback especially on Tally Tables. I appreciate it.

Have you been able to finalize your query yet?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

vani_r14 SSCrazy Points: 2843 More actions · Answer 10

hi all

here is the idea i finally ended up using -

First priorities the grades

1. Completed / Pass

2. All other grades

Original Data

**********

in the database was split into multiple tables with different status

************************************************************************************

Solution i applied that works in this instance : This solution works because i did not use the Dates

************************************************************************************

-- Extract all Course data for all employees who have completed as grade

Select distinct empno, name, cid, grade, empno + cid as "unique_id" into #temptable1

from tablename where grade = 'Completed'

-- Select remaining courses where the unique_id is not already there in the temptable1 created before

Insert into #temptable1

Select distinct empno, name, cid, grade, empno + cid as "unique_id"

from tablename where grade <> 'Completed' and empno + cid not in (Select distinct unique_id from tablename)

-- Select all enrollment data

Insert into #temptable1

Select distinct empno, name, cid, grade, empno + cid as "unique_id"

from tablename where Status <> 'Confirmed' and grade = ''

and empno + cid not in (Select distinct unique_id from tablename)

-- Check for Duplicates in the newly created #temptable1

Select distinct unique_id, count(unique_id)

from #temptable1

Group by unique_id

having count(unique_id) > 1

-- Drop all temp tables created

Drop table #temptable1

Am not sure if this makes any sense, please let me know if i can clarify any thing

thanks

Vani