November 24, 2009 at 8:54 am
I have a table that has CustomerNumber as one of its fields (CustomerNumber is not a key field). I was hoping that there was a way that I could find duplicate entries. Sort of like:
select Not(Distinct(CustomerNumber)) From CustomerContactsXref
Thanks
November 24, 2009 at 9:01 am
SELECT *
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CustomerNumber
                              ORDER BY CustomerNumber) AS row_no
    FROM CustomerContactsXref
) AS t
WHERE row_no > 1
---------------------------------------------------------------------------------
November 24, 2009 at 9:08 am
Probably the simplest way to do this.
SELECT mt.CustomerNumber
FROM MyTable AS mt
GROUP BY mt.CustomerNumber
HAVING COUNT(mt.CustomerNumber) > 1
Hope This Helps!
November 24, 2009 at 9:17 am
Thanks so much to the both of you. Both statements worked perfectly. You saved me hours of time.
November 24, 2009 at 10:00 am
Nabha (11/24/2009)
SELECT *
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CustomerNumber
                              ORDER BY CustomerNumber) AS row_no
    FROM CustomerContactsXref
) AS t
WHERE row_no > 1
Hi Nabha, I was curious about any differences in performance between using the GROUP BY and HAVING clauses and the windowing functions. While testing this, I got some unexpected results. Please help me understand the behavior of the windowing functions better: if I SELECT *, I get one result back; if I SELECT CustomerNumber, Date (for example), I get a completely different result. Thanks.
Scott.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I'm Sorry... I had my date filter misplaced.
November 25, 2009 at 12:18 am
SSSolice (11/24/2009)
Nabha (11/24/2009)
SELECT *
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY CustomerNumber
                              ORDER BY CustomerNumber) AS row_no
    FROM CustomerContactsXref
) AS t
WHERE row_no > 1
Hi Nabha, I was curious about any differences in performance between using the GROUP BY and HAVING clauses and the windowing functions. While testing this, I got some unexpected results. Please help me understand the behavior of the windowing functions better: if I SELECT *, I get one result back; if I SELECT CustomerNumber, Date (for example), I get a completely different result. Thanks.
Scott.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
I'm Sorry... I had my date filter misplaced.
Regarding the performance difference: I believe there is not much of one, but it's debatable. This is the closest topic I could find that discusses the difference (I guess there is plenty over here):
http://www.sqlservercentral.com/Forums/Topic711987-360-1.aspx
Personally, I prefer ROW_NUMBER() over GROUP BY in this case because I can see all the other records in the table along with their rank; with GROUP BY you would have to do a self join back to see that.
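For example, the self join back might look something like this (a sketch only, against the same CustomerContactsXref table):
-- Sketch only: return every row that belongs to a duplicated CustomerNumber
-- by joining the table back to the GROUP BY/HAVING duplicate list.
SELECT x.*
FROM CustomerContactsXref AS x
INNER JOIN
(
    SELECT CustomerNumber
    FROM CustomerContactsXref
    GROUP BY CustomerNumber
    HAVING COUNT(*) > 1
) AS dup
    ON dup.CustomerNumber = x.CustomerNumber;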
---------------------------------------------------------------------------------
November 25, 2009 at 8:30 pm
Heh... just a "forum survival tip"... I wouldn't make any such performance claims unless I had code to back it up. I've been burned without the code.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2009 at 3:54 am
Jeff Moden (11/25/2009)
Heh... just a "forum survival tip"... I wouldn't make any such performance claims unless I had code to back it up. I've been burned without the code.
🙂 I knew this was coming! Yes sure Jeff. Thanks.
I should not have commented on the performance because:
1. I don't think it's even fair to compare the two queries here, as they are doing two different things. ROW_NUMBER() gets me all the duplicate rows except the first one; GROUP BY gets me just the customer numbers that are duplicated. So, two different result sets. Gears start shifting when you actually want to delete the duplicates; that is where the comparison, and the performance difference, really begins (a sketch of that delete is below).
2. I was too lazy (and not completely sure how) to set up test data for this case, write the delete scripts using both approaches, and test their performance.
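A rough sketch of the delete I have in mind (an outline only, not benchmarked):
-- Sketch only: number the rows per CustomerNumber and delete every row
-- after the first, leaving one row per customer.
WITH numbered AS
(
    SELECT CustomerNumber,
           ROW_NUMBER() OVER (PARTITION BY CustomerNumber
                              ORDER BY CustomerNumber) AS row_no
    FROM CustomerContactsXref
)
DELETE FROM numbered
WHERE row_no > 1;
The GROUP BY version would need that self join back to the base table before it could delete anything, which is where the two approaches stop being directly comparable.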
Thanks Jeff, I'll be careful about making performance claims in the future :hehe:
EDIT: Deleted the last line in my previous post about the performance of the windowing function against the subquery in the SELECT clause 🙂
---------------------------------------------------------------------------------
November 26, 2009 at 12:00 pm
Heh... no problem, Nabha... just didn't want to see a food fight break out. I'm not perfect either; I miss stuff too, like on that "previous row" problem where I just stuck my foot in my mouth. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2009 at 11:13 am
Hi Nabha and Jeff,
How would you change the query to find duplicates if the entries were numbers? I have been tasked with creating a query that will pull out duplicate entries, but the entries are numbers. We have a process that automates the entry of meters into a database; unfortunately, there are times when the same meter is entered twice. If I knew how to write a query to find the duplicates, I would be able to alert users when a duplicate is entered. Another issue is that the number of meter entries will vary. Is there a way to write a query that will look at each piece of equipment individually, compare its meters, and then, if a meter is duplicated, return the equipment ID? The last issue is that I only want to compare the two latest (most recent) entries. Any suggestions would be greatly appreciated.
December 18, 2009 at 11:59 am
For something like that, why not just put a unique index on the meter column and use the thrown error as an alert?
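A minimal sketch of that idea; the MeterEntries table and its columns below are made up for illustration, not taken from the poster's schema:
-- Sketch only: an illustrative table with a unique index so the database
-- itself rejects a duplicate meter value for the same piece of equipment.
CREATE TABLE dbo.MeterEntries
(
    EquipmentID  int      NOT NULL,
    MeterReading int      NOT NULL,
    ReadingDate  datetime NOT NULL DEFAULT (GETDATE())
);

CREATE UNIQUE INDEX UX_MeterEntries_Equipment_Meter
    ON dbo.MeterEntries (EquipmentID, MeterReading);

-- A duplicate INSERT now fails with error 2601 (or 2627 for a unique
-- constraint), which the caller can surface as the alert.
BEGIN TRY
    INSERT INTO dbo.MeterEntries (EquipmentID, MeterReading) VALUES (101, 5000);
    INSERT INTO dbo.MeterEntries (EquipmentID, MeterReading) VALUES (101, 5000); -- duplicate
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Duplicate meter entry detected for this equipment.';
END CATCH;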
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2009 at 12:29 pm
I may not understand what you mean, but I do not think that will work in the database I work on. We have one table that has thousands of equipment records and another table that has tens or hundreds of meters for each piece of equipment. I need a way to group the meters for each piece of equipment, take the last two meters, and compare them to see if they are duplicates.
So, for instance, say the database has 5,000 pieces of equipment. Each piece of equipment will have 1 to 150 meters. I have to figure out a way to compare the last two meters for each piece of equipment to see if they are the same.
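One way to sketch that comparison, reusing the illustrative MeterEntries table (EquipmentID, MeterReading, ReadingDate) from above rather than the real schema:
-- Sketch only: number the readings newest-first per piece of equipment, then
-- return the equipment IDs whose two most recent readings share the same value.
WITH ranked AS
(
    SELECT EquipmentID,
           MeterReading,
           ROW_NUMBER() OVER (PARTITION BY EquipmentID
                              ORDER BY ReadingDate DESC) AS rn
    FROM dbo.MeterEntries
)
SELECT EquipmentID
FROM ranked
WHERE rn <= 2
GROUP BY EquipmentID, MeterReading
HAVING COUNT(*) = 2;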
December 18, 2009 at 12:36 pm
atwite (12/18/2009)
Hi Nabha and Jeff, How would you change the query to find duplicates if the entries were numbers? I have been tasked with creating a query that will pull out duplicate entries, but the entries are numbers. We have a process that automates the entry of meters into a database; unfortunately, there are times when the same meter is entered twice. If I knew how to write a query to find the duplicates, I would be able to alert users when a duplicate is entered. Another issue is that the number of meter entries will vary. Is there a way to write a query that will look at each piece of equipment individually, compare its meters, and then, if a meter is duplicated, return the equipment ID? The last issue is that I only want to compare the two latest (most recent) entries. Any suggestions would be greatly appreciated.
Change the * to CustomerNumber. Same principle, just specify the columns. Otherwise, the queries are already grouped by CustomerNumber.
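For instance, the earlier ROW_NUMBER() query with the columns spelled out instead of the * (a sketch only):
-- Sketch only: same duplicate check, explicit columns instead of SELECT *.
SELECT CustomerNumber, row_no
FROM
(
    SELECT CustomerNumber,
           ROW_NUMBER() OVER (PARTITION BY CustomerNumber
                              ORDER BY CustomerNumber) AS row_no
    FROM CustomerContactsXref
) AS t
WHERE row_no > 1;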
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
December 18, 2009 at 4:28 pm
atwite (12/18/2009)
I may not understand what you mean, but I do not think that will work in the database I work on. We have one table that has thousands of equipment records and another table that has tens or hundreds of meters for each piece of equipment. I need a way to group the meters for each piece of equipment, take the last two meters, and compare them to see if they are duplicates. So, for instance, say the database has 5,000 pieces of equipment. Each piece of equipment will have 1 to 150 meters. I have to figure out a way to compare the last two meters for each piece of equipment to see if they are the same.
I guess I don't understand why a piece of equipment would have more than one meter associated with it during the lifetime of the equipment. Whatever...
See the article in the first link in my signature line below. If you were to provide table creation and readily consumable data statements in the format indicated in that article, someone would be able to help you lickety-split.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2009 at 10:15 am
Jeff Moden (12/18/2009)
For something like that, why not just put a unique index on the meter column and use the thrown error as an alert?
Yes, prevent the duplicates in the first place. If for some reason you can't use a unique index because duplicates are allowed in special situations, then the insert statement could be coded to prevent adding a duplicate by doing something like WHERE x not in (select y from ...)
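For example, a sketch of that guarded INSERT against the illustrative MeterEntries table used earlier (the @EquipmentID and @MeterReading variables are placeholders):
-- Sketch only: insert the new reading only when the same meter value is not
-- already recorded for that piece of equipment.
DECLARE @EquipmentID int = 101, @MeterReading int = 5000;

INSERT INTO dbo.MeterEntries (EquipmentID, MeterReading, ReadingDate)
SELECT @EquipmentID, @MeterReading, GETDATE()
WHERE @MeterReading NOT IN
(
    SELECT m.MeterReading
    FROM dbo.MeterEntries AS m
    WHERE m.EquipmentID = @EquipmentID
);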