How can I find duplicate Entries in a table?

Question

Post reply

How can I find duplicate Entries in a table?

meichner

SSCertifiable

Points: 5563
More actions
November 24, 2009 at 8:54 am

#214055

I have a table that has CustomerNumber as one of its fields (CustomerNumber is not a key field). I was hoping that there was a way that I could find duplicate entries. Sort of like:
select Not(Distinct(CustomerNumber)) From CustomerContactsXref
Thanks

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Nabha SSCrazy Eights Points: 8534 More actions · Answer 1

Select * from (SELECT *,

Row_Number() OVER(PARTITION BY CustomerNumber ORDER BY

CustomerNUmber) as row_no

FRom CustomerContactsXref) t

Where row_no > 1

---------------------------------------------------------------------------------

SSSolice SSCommitted Points: 1907 More actions · Answer 2

Probably the simplest way to do this.

SELECT mt.CustomerNumber

FROM MyTable AS mt

GROUP BY mt.CustomerNumber

HAVING COUNT(mt.CustomerNumber) > 1

Hope This Helps!

meichner SSCertifiable Points: 5563 More actions · Answer 3

Thanks so much to the both of you. Both statements worked perfectly. You saved me hours of time.

SSSolice SSCommitted Points: 1907 More actions · Answer 4

Nabha (11/24/2009)
Select * from (SELECT *, Row_Number() OVER(PARTITION BY CustomerNumber ORDER BY CustomerNUmber) as row_no FRom CustomerContactsXref) t Where row_no > 1

Hi Nabha, I was curious about any differences in perfomance using the Group By and Having clauses vs. the windowing functions. While doing this I get some unexpected results. Please help me understand the behavior of the windowing functions better. If I Select * I get a particular result back, if I Select CustomerNumber, Date (for example) , I get a completely different result returned. Thanks.

Scott.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I'm Sorry... I had my date filter misplaced.

Nabha SSCrazy Eights Points: 8534 More actions · Answer 5

SSSolice (11/24/2009)
Nabha (11/24/2009)
Select * from (SELECT *, Row_Number() OVER(PARTITION BY CustomerNumber ORDER BY CustomerNUmber) as row_no FRom CustomerContactsXref) t Where row_no > 1
Hi Nabha, I was curious about any differences in perfomance using the Group By and Having clauses vs. the windowing functions. While doing this I get some unexpected results. Please help me understand the behavior of the windowing functions better. If I Select * I get a particular result back, if I Select CustomerNumber, Date (for example) , I get a completely different result returned. Thanks.
Scott.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I'm Sorry... I had my date filter misplaced.

Performance difference; I believe there is not much but its again debatable. This is the closest topic that discussed about the difference that I could find out. ( I guess there is plenty over here)

http://www.sqlservercentral.com/Forums/Topic711987-360-1.aspx

Personally I prefer row_number() over group by in this case because, i can see all the other records in the table along with their rank and you would have to do a self join back in group by case to see that.

---------------------------------------------------------------------------------

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

Heh... just a "forum, survival tip"... I wouldn't make any such performance claims unless I had code to back it up. I've been burned without the code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Nabha SSCrazy Eights Points: 8534 More actions · Answer 7

Jeff Moden (11/25/2009)
Heh... just a "forum, survival tip"... I wouldn't make any such performance claims unless I had code to back it up. I've been burned without the code.

🙂 I knew this was coming! Yes sure Jeff. Thanks.

I should not have commented about the performance because,

1. I think its not even fair to compare these two queries here as they

are doing two different things. Row_number() would get me all those

rows except the first one. group by would get me just those customer

numbers which are duplicates. So two different result sets.

Gears start shifting when you actually want to delete it.

Then we may start the comparision, and then comes performance difference.

2. I was lazy enough(and not completely sure on how)to setup the test

data for this case and use these two approaches to create script to delete duplicates and test performance.

Thanks Jeff, will be careful in making performance claims in future :hehe:

EDIT: Deleted the last line in my previous post about the performance of windowing function against the subquery in select clause 🙂

---------------------------------------------------------------------------------

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 8

Heh... no problem, Nabha... just didn't want to see a food fight break out. I'm not perfect either. I miss stuff just like on that previous row problem I just stuck my foot in my mouth with. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

atwite Newbie Points: 6 More actions · Answer 9

Hi Nabha and Jeff,

How would you change the query to find duplicates if the entries were numbers? I have been tasked with creating a query that will pull out duplicate entries but, the entries are numbers. We have a process that automates the entry of meters into a database. Unfortunately there are times when the same meter is entered. IF I knew how to write a query to find the duplicates I would be able to alert users when a duplicate is entered. Another issue is that the amount of meter entries will vary. Is there a way to write a query that will look at each piece of equipment individually and compare it's meters and then, if there is a meter that is duplicated, return the equipment ID? The last issue would be that I only want to compare the two latest (most recent) entries. Any suggestions would be greatly appreciated.

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 10

For something like that, why not just put a unique index on the meter column and use the thrown error as an alert?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

atwite Newbie Points: 6 More actions · Answer 11

I may not understand what you mean but, I do not think that will work in the database I work on. We have one table that has thousands of equipment records and another table that has tens or hundreds of meters for each piece of equipment. The table that has the meters in it has meters for each piece of equipment. I need a way to group the meters for each piece of equipment and then take the last two meters and compare them to see if they are duplicates.

So for instance say the database has 5,000 pieces of equipment. Each piece of equipment will have 1 to 150 meters. I have to figure out a way to compare the last two meters for each piece of equipment to see if they are the same.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 12

atwite (12/18/2009)
Hi Nabha and Jeff,
How would you change the query to find duplicates if the entries were numbers? I have been tasked with creating a query that will pull out duplicate entries but, the entries are numbers. We have a process that automates the entry of meters into a database. Unfortunately there are times when the same meter is entered. IF I knew how to write a query to find the duplicates I would be able to alert users when a duplicate is entered. Another issue is that the amount of meter entries will vary. Is there a way to write a query that will look at each piece of equipment individually and compare it's meters and then, if there is a meter that is duplicated, return the equipment ID? The last issue would be that I only want to compare the two latest (most recent) entries. Any suggestions would be greatly appreciated.

Change the * to CustomerNumber. Same principle, just specify the columns. Otherwise, the queries are already grouped by CustomerNumber.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 13

atwite (12/18/2009)
I may not understand what you mean but, I do not think that will work in the database I work on. We have one table that has thousands of equipment records and another table that has tens or hundreds of meters for each piece of equipment. The table that has the meters in it has meters for each piece of equipment. I need a way to group the meters for each piece of equipment and then take the last two meters and compare them to see if they are duplicates.
So for instance say the database has 5,000 pieces of equipment. Each piece of equipment will have 1 to 150 meters. I have to figure out a way to compare the last two meters for each piece of equipment to see if they are the same.

I guess I don't understand why a piece of equipment would have more than one meter associated with it during the lifetime of the equipment. Whatever...

See the article in the first link in my signature line below. If you were to provide table creation and readily consumable data statements in the format indicated in that article, someone would be able to help you lickety-split.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

homebrew01 SSC Guru Points: 55548 More actions · Answer 14

Jeff Moden (12/18/2009)
For something like that, why not just put a unique index on the meter column and use the thrown error as an alert?

Yes, prevent the duplicates in the first place. If for some reason you can't use a unique index because duplicates are allowed in special situations, then the insert statement could be coded to prevent adding a duplicate by doing something like WHERE x not in (select y from ...)