January 3, 2008 at 11:03 am
Hi all,
I have a question which I am sure has a simple solution;
I have a table which contains duplicate rows, each duplicate row is only duplicated twice, but not all rows are duplicated. There is only one column which is not duplicated in the duplicated rows and that is the first column which is an identity type primary key field. Strictly speaking since the primary key is an identity field we don't have a duplicate row, but the rest of the data in the row is duplicated in another row, and the users want to eliminate the duplication, seems it is causing a problem with sorting in their application.
I was thinking of using select distinct but am not sure how to handle the primary key issue. As far as I know the primary is not a foregin key in another table. What I want to do is pull everything but the primary key field into a new table. Does this sound reasonable?
Thanks in advance for your help
Ken
January 3, 2008 at 2:58 pm
A real quick and dirty way would be to use a group by on all but the primary key field (min max that if there are just 2 rows max) and filter having a count * > 1.
Something like
delete table
where primary_key in (select min\max(primary_key) from table group by a,b,c having count(*) >1)
Of course you could replace in with exists.
Quick, easy 1 time use kind of thing.
January 3, 2008 at 5:45 pm
Kwitzell,
For those rows that are duplicated, which row would you like to keep... that is, do you want to keep the one with the largest ID or the smallest?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2008 at 7:52 am
Jeff,
Thanks for your response, the user does not care, I would think the smaller would do. I was told the numbers in the primary key are not referneced in the application. So, really it does not matter.
Ken
January 4, 2008 at 7:53 am
Thank you for your reply, will give it a look.
Ken
January 4, 2008 at 9:13 am
Considering you're throwing all of the fields in - the performance will likely suck pretty badly on this one.
L_voren's solution will get rid of one dupe per group, so you'd have to keep running that until you got 0 records back.
A one-shot way is something more along the lines of (performance notwithstanding)
delete t1
from mytable t1
where
exists (
select *
from mytable t2
where
t1.a=t2.a
and t1.b=t2.b
and etc...
and t1.primarykey>t2.primarykey
)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 4, 2008 at 9:51 am
Thanks Matt,
Your soulution is kind of what I was thinking, I just planned on creating a new table and putting in everything sans dupes and primary key, figured to create a new primary key column and renumber the rows.
Ken
January 4, 2008 at 10:33 am
I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?
January 4, 2008 at 12:18 pm
I came across the attached document which may help.
David
Best Regards,
~David
January 4, 2008 at 1:03 pm
David thank you for the article, we will have to give it a try,
Ken
January 4, 2008 at 2:22 pm
L_voren (1/4/2008)
I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?
If there's exactly just one duplicate, then you're right - only one run through would suffice. I'm just never fortunate enough to have only one dupe....:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 4, 2008 at 3:47 pm
Yes we only have one dup per row. I guess we lucked out this time!
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply