Removing duplicates and keeping 1 record with no PK

Question

Removing duplicates and keeping 1 record with no PK

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

olie480 SSC-Addicted Points: 443 More actions · Answer 1

What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?

The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 2

olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?
The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.

That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

olie480 SSC-Addicted Points: 443 More actions · Answer 3

CirquedeSQLeil (4/5/2010)
olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?
The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.

Which is a total valid point. Luckily for us, addresses are not really a concern for our stores, since we do not have any mailers or anything to that nature. I did just run a query on my sample database, and it came out fine. It deleted duplicates and kept 1 record. Although like I said, I will have to sift through the records, and just delete SOME duplicates, but it's not like I have many now to delete.

Here is the following code I ran.

WHILE EXISTS (SELECT COUNT(*) FROM __table__ GROUP BY Zip, FirstName, Address HAVING COUNT(*) > 1)

BEGIN

DELETE FROM __table__ WHERE AccountNumber IN

(

SELECT MIN(AccountNumber) as [DeleteID]

FROM __table__

GROUP BY Zip, FirstName, Address

HAVING COUNT(*) > 1

)

END

Thanks for everyones input. In time I will get on the right track with SQL programming, but first I need someone to lay the tracks first, and I appreciate every minute of it.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 4

You're welcome.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 5

I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.

Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

SQLRNNR SSC Guru Points: 281334 More actions · Answer 6

PaulB-TheOneAndOnly (4/5/2010)
I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.
Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀

LOL

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

olie480 SSC-Addicted Points: 443 More actions · Answer 7

PaulB-TheOneAndOnly (4/5/2010)
I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.
Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀

Lol. I understand, it's just the logical code to try to remove duplicates based on no unique index key.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 8

olie480 (4/5/2010)
CirquedeSQLeil (4/5/2010)
olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?
The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.
Which is a total valid point. Luckily for us, addresses are not really a concern for our stores, since we do not have any mailers or anything to that nature. I did just run a query on my sample database, and it came out fine. It deleted duplicates and kept 1 record. Although like I said, I will have to sift through the records, and just delete SOME duplicates, but it's not like I have many now to delete.
Here is the following code I ran.
WHILE EXISTS (SELECT COUNT(*) FROM __table__ GROUP BY Zip, FirstName, Address HAVING COUNT(*) > 1)
BEGIN
DELETE FROM __table__ WHERE AccountNumber IN
(
SELECT MIN(AccountNumber) as [DeleteID]
FROM __table__
GROUP BY Zip, FirstName, Address
HAVING COUNT(*) > 1
)
END
Thanks for everyones input. In time I will get on the right track with SQL programming, but first I need someone to lay the tracks first, and I appreciate every minute of it.

Nope... you need to lay your own tracks. It's all up to you.

I will help you lay one track though... the proverbial "3rd rail" is to delete data that you can't recover. Touch that track and, just like the real 3rd rail, your career can come to a very sudden end. I hope you squirreled the data you deleted away somewhere.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)