May 8, 2012 at 6:57 am
HI all
Am trying to work on deleting duplicate rows from a table and am kind of stuck. Could i please get some help.
I have a table with staff and training data
create table temptable (empno varchar(10), name varchar(50), cid varchar(10), grade varchar(10), enddate datetime
Insert into temptable values ('1234', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000' )
Insert into temptable values ('1234', 'apple', 'rb7820', 'Did not complete', '2004-03-01 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'pt7830', 'Cancelled', '2004-04-05 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'sp7840', 'Completed', '2004-10-30 00:00:00.000')
Insert into temptable values ('1234', 'apple', 'rb7820', 'Completed', '2005-12-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'hr7890', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'rb7820', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'pt7830', 'Completed', '2004-03-01 00:00:00.000')
Insert into temptable values ('2344', 'apple', 'sp7840', 'Completed', '2004-03-01 00:00:00.000')
Currently
In my current query, i pivot the data so the course names are the headings of table with one row of employee 1234 and the appropriate value under each course name. Am using max for the grades to pick up the highest of the grade for each course name
however since 1234 has a completed and did not complete entry for course rb7820 using max results in Did not attend for that course in the final query. Tried using min but it complicates the other data.
I would like to find out alternative way of deleting the Did not Complete row from the temptable so i would only have the one row for the course rb7820 for employee 1234 and that is Completed.
Can someone please suggest any ideas
thanks
vani
May 8, 2012 at 7:08 am
Try this
with cte as (
select *,
row_number() over(partition by empno,cid order by enddate desc) as rn
from temptable)
delete from cte where rn>1;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 8, 2012 at 5:27 pm
Hi Mark
thank you for your prompt response. However, the way to delete the rows is based on a combination of empid, courseid and grade.
With the CTE query am unable to find this combo.
Am i looking at something incorrectly. please advise
thanks
vani
May 8, 2012 at 11:30 pm
Hi all
Is there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.
e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete
I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain
thanks
vani
May 10, 2012 at 4:09 am
vani_r14 (5/8/2012)
Hi allIs there a way to check for the existence of a row and then delete another row based on the value of the row checked earlier.
e.g. In the table above there is a row for emp 1234 for a course rb7820 which at first has Did not complete, then complete
I would like to check first if the row exists for that employee with the grade = 'Complete'. If it exists then check to see if there is another row for the same employee with the same course but with the a different grade. If it does then i would like to delete the other entries except Completed. Else they can remain
thanks
vani
One way to do it
DELETET_O
FROMtemptable T_O
WHERET_O.grade != 'Completed'
AND EXISTS( SELECT * FROM temptable TI WHERE TI.empno = T_O.empno AND TI.cid = T_O.cid AND TI.grade = 'Completed' )
How to post data/code on a forum to get the best help - Jeff Moden
http://www.sqlservercentral.com/articles/Best+Practices/61537/
May 15, 2012 at 2:13 pm
One of the ways I have removed duplicate rows in the past is by selecting distinct which just gives you one row per duplicated rows and insert that into a temp table or a hold table. Delete all rows that where duplicated and then insert the distinct rows (held in the temp or hold table) back into the table.
May 15, 2012 at 4:24 pm
Hi all
thank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now
Cheers
vani
May 15, 2012 at 6:44 pm
vani_r14 (5/15/2012)
Hi allthank you for your help. I think i have it worked out finally. I included a counter column which helped a lot for now
Cheers
vani
Two way street here, Vani. If you've come up with a solution, post it so we might learn something different. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2012 at 7:11 pm
Hi Jeff
I have always shared my solution and will post this one as well when i have finalised it. 🙂
PS - I found your tally tables very handy in a recent query that i had to design
Thanks
vani
May 25, 2012 at 5:41 pm
vani_r14 (5/15/2012)
Hi JeffI have always shared my solution and will post this one as well when i have finalised it. 🙂
PS - I found your tally tables very handy in a recent query that i had to design
Thanks
vani
Thanks for the feedback especially on Tally Tables. I appreciate it.
Have you been able to finalize your query yet?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2012 at 4:12 pm
hi all
here is the idea i finally ended up using -
First priorities the grades
1. Completed / Pass
2. All other grades
Original Data
**********
in the database was split into multiple tables with different status
************************************************************************************
Solution i applied that works in this instance : This solution works because i did not use the Dates
************************************************************************************
-- Extract all Course data for all employees who have completed as grade
Select distinct empno, name, cid, grade, empno + cid as "unique_id" into #temptable1
from tablename where grade = 'Completed'
-- Select remaining courses where the unique_id is not already there in the temptable1 created before
Insert into #temptable1
Select distinct empno, name, cid, grade, empno + cid as "unique_id"
from tablename where grade <> 'Completed' and empno + cid not in (Select distinct unique_id from tablename)
-- Select all enrollment data
Insert into #temptable1
Select distinct empno, name, cid, grade, empno + cid as "unique_id"
from tablename where Status <> 'Confirmed' and grade = ''
and empno + cid not in (Select distinct unique_id from tablename)
-- Check for Duplicates in the newly created #temptable1
Select distinct unique_id, count(unique_id)
from #temptable1
Group by unique_id
having count(unique_id) > 1
-- Drop all temp tables created
Drop table #temptable1
Am not sure if this makes any sense, please let me know if i can clarify any thing
thanks
Vani
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply