Eliminating duplicate rows

Question

Eliminating duplicate rows

kwitzell

SSCrazy

Points: 2470
More actions
January 3, 2008 at 11:03 am

#177889

Hi all,
I have a question which I am sure has a simple solution;
I have a table which contains duplicate rows, each duplicate row is only duplicated twice, but not all rows are duplicated. There is only one column which is not duplicated in the duplicated rows and that is the first column which is an identity type primary key field. Strictly speaking since the primary key is an identity field we don't have a duplicate row, but the rest of the data in the row is duplicated in another row, and the users want to eliminate the duplication, seems it is causing a problem with sorting in their application.
I was thinking of using select distinct but am not sure how to handle the primary key issue. As far as I know the primary is not a foregin key in another table. What I want to do is pull everything but the primary key field into a new table. Does this sound reasonable?
Thanks in advance for your help
Ken

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

L_voren SSC Rookie Points: 28 More actions · Answer 1

A real quick and dirty way would be to use a group by on all but the primary key field (min max that if there are just 2 rows max) and filter having a count * > 1.

Something like

delete table

where primary_key in (select min\max(primary_key) from table group by a,b,c having count(*) >1)

Of course you could replace in with exists.

Quick, easy 1 time use kind of thing.

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 2

Kwitzell,

For those rows that are duplicated, which row would you like to keep... that is, do you want to keep the one with the largest ID or the smallest?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

kwitzell SSCrazy Points: 2470 More actions · Answer 3

Jeff,

Thanks for your response, the user does not care, I would think the smaller would do. I was told the numbers in the primary key are not referneced in the application. So, really it does not matter.

Ken

kwitzell SSCrazy Points: 2470 More actions · Answer 4

Thank you for your reply, will give it a look.

Ken

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

Considering you're throwing all of the fields in - the performance will likely suck pretty badly on this one.

L_voren's solution will get rid of one dupe per group, so you'd have to keep running that until you got 0 records back.

A one-shot way is something more along the lines of (performance notwithstanding)

delete t1

from mytable t1

where

exists (

select *

from mytable t2

where

t1.a=t2.a

and t1.b=t2.b

and etc...

and t1.primarykey>t2.primarykey

)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

kwitzell SSCrazy Points: 2470 More actions · Answer 6

Thanks Matt,

Your soulution is kind of what I was thinking, I just planned on creating a new table and putting in everything sans dupes and primary key, figured to create a new primary key column and renumber the rows.

Ken

L_voren SSC Rookie Points: 28 More actions · Answer 7

I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?

David Kranes SSCrazy Points: 2523 More actions · Answer 8

I came across the attached document which may help.

David

Best Regards,

~David

kwitzell SSCrazy Points: 2470 More actions · Answer 9

David thank you for the article, we will have to give it a try,

Ken

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

L_voren (1/4/2008)
I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?

If there's exactly just one duplicate, then you're right - only one run through would suffice. I'm just never fortunate enough to have only one dupe....:)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

kwitzell SSCrazy Points: 2470 More actions · Answer 11

Yes we only have one dup per row. I guess we lucked out this time!