April 5, 2010 at 9:58 am
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?
The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
April 5, 2010 at 10:08 am
olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
April 5, 2010 at 10:27 am
CirquedeSQLeil (4/5/2010)
olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.
Which is a total valid point. Luckily for us, addresses are not really a concern for our stores, since we do not have any mailers or anything to that nature. I did just run a query on my sample database, and it came out fine. It deleted duplicates and kept 1 record. Although like I said, I will have to sift through the records, and just delete SOME duplicates, but it's not like I have many now to delete.
Here is the following code I ran.
WHILE EXISTS (SELECT COUNT(*) FROM __table__ GROUP BY Zip, FirstName, Address HAVING COUNT(*) > 1)
BEGIN
DELETE FROM __table__ WHERE AccountNumber IN
(
SELECT MIN(AccountNumber) as [DeleteID]
FROM __table__
GROUP BY Zip, FirstName, Address
HAVING COUNT(*) > 1
)
END
Thanks for everyones input. In time I will get on the right track with SQL programming, but first I need someone to lay the tracks first, and I appreciate every minute of it.
April 5, 2010 at 10:41 am
You're welcome.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
April 5, 2010 at 11:31 am
I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.
Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.April 5, 2010 at 11:38 am
PaulB-TheOneAndOnly (4/5/2010)
I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀
LOL
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
April 5, 2010 at 11:49 am
PaulB-TheOneAndOnly (4/5/2010)
I've seen so many "duplicate rows" postings lately that I feel somebody has to tell people Microsoft is not charging a license fee per unique index.Once you have a SQL Server license you can create as many unique indexes as needed... it's free! 😀
Lol. I understand, it's just the logical code to try to remove duplicates based on no unique index key.
April 5, 2010 at 6:14 pm
olie480 (4/5/2010)
CirquedeSQLeil (4/5/2010)
olie480 (4/5/2010)
What I do know is that if I have 4 'John Smiths' and if they were all different people, the Address column would be different for each of those entities. So if I know that I can use the Address column as a Psudeo Primary Key column, what steps could I take to delete records according to the Address column?The only problem that will arise is an inconsistant value because of sales reps, ie. 2205 Vine St. or 2205 Vine St (without period) which really is not a problem, as long as I get the majority of records deleted, I would much rather sift through the ones that did not get deleted, then going through the whole database and manually delete every single duplicate.
That is an assumption. Some people do maintain several addresses, thus address is not always a good unique identifier. Also, what if the person moved? Both addresses would be valid, but you would not know which was the best address to use.
Which is a total valid point. Luckily for us, addresses are not really a concern for our stores, since we do not have any mailers or anything to that nature. I did just run a query on my sample database, and it came out fine. It deleted duplicates and kept 1 record. Although like I said, I will have to sift through the records, and just delete SOME duplicates, but it's not like I have many now to delete.
Here is the following code I ran.
WHILE EXISTS (SELECT COUNT(*) FROM __table__ GROUP BY Zip, FirstName, Address HAVING COUNT(*) > 1)
BEGIN
DELETE FROM __table__ WHERE AccountNumber IN
(
SELECT MIN(AccountNumber) as [DeleteID]
FROM __table__
GROUP BY Zip, FirstName, Address
HAVING COUNT(*) > 1
)
END
Thanks for everyones input. In time I will get on the right track with SQL programming, but first I need someone to lay the tracks first, and I appreciate every minute of it.
Nope... you need to lay your own tracks. It's all up to you.
I will help you lay one track though... the proverbial "3rd rail" is to delete data that you can't recover. Touch that track and, just like the real 3rd rail, your career can come to a very sudden end. I hope you squirreled the data you deleted away somewhere.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply