June 2, 2009 at 2:21 pm
How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.
June 2, 2009 at 8:35 pm
do group and having count(*) > 1...wouldn't worry about space i be more interested in getting rid of duplicates. You can copy out to a temp table do the delete on duplicates and look at space sp_spaceused before and after. Many duplicate script examples just do search..sorry not got example here not on my own laptop.
June 2, 2009 at 10:20 pm
Hi,
Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:
SELECT email,
COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You could also use this technique to find rows that occur exactly once:
SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
Regards
GURSETHI
June 2, 2009 at 11:34 pm
THIS IS AN EXAMPLE WHICH I HAVE USED...
I HOPE THIS SCRIPT WILL HELP YOU OUT....
-- Delete duplicate records with out temporary tables.
CREATE TABLE #phonebook (
[phonenumber] [varchar] (30),
[firstname] [varchar] (30),
[lastname] [varchar] (30),
[company] [varchar] (100))
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
-- Duplicate insert 1
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
-- Duplicate insert 2
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
--Show Duplicate Phonenumbers in Phonebook
SELECT phonenumber, COUNT(*) FROM #phonebook
GROUP BY phonenumber HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
SET ROWCOUNT 1
SELECT @@rowcount
WHILE @@rowcount > 0
DELETE pb FROM #phonebook as pb
INNER JOIN
(SELECT phonenumber
FROM #phonebook
GROUP BY phonenumber HAVING count(*) > 1)
AS c ON c.phonenumber = pb.phonenumber
SET ROWCOUNT 0
SELECT * FROM #phonebook
DROP TABLE #phonebook
----------ENJOY USING SQL SERVERCENTRAL.COM-------------
June 3, 2009 at 6:59 am
You need to establish what makes each record unique (ie should every email address or employee number) be unique.
If there's a single column which determins this then use something like
SELECT emailaddress, COUNT(emailaddress)
FROM MyTable
GROUP BY emailaddress
HAVING COUNT(emailaddress) > 1
Deleting can be done a number of ways depending on how many records are there. DELETE TOP 1, ROWCOUNT, temporary tables etc..
If you need multiple columns to detmine your uniqueness for each record then it gets more complicated and I have been known to resort to a (grits teeth) CURSOR.
June 4, 2009 at 2:57 am
Create Table #Data
(
FirstNameVARCHAR(50),
EMailVARCHAR(50)
)
GO
Insert #Data Values('AAAAA','AAAAA@a.com')
Insert #Data Values('BBBBB','BBBBB@a.com')
Insert #Data Values('AAAAA','AAAAA@a.com')
Insert #Data Values('CCCCC','CCCCC@a.com')
Insert #Data Values('AAAAA','AAAAA@a.com')
Insert #Data Values('CCCCC','CCCCC@a.com')
GO
Select FirstName,EMail From #Data
GO
SELECT FirstName,EMail,
CASE WHEN DENSE_RANK() OVER(PARTITION BY EMail ORDER BY NEWID()) >1 THEN 'Duplicate' ELSE 'Non Duplicate' END 'Status'
FROM #Data
GO
Drop Table #Data
Regards
June 4, 2009 at 8:24 am
1) the size of the database is actually irrelevant. You could have a 10TB database and a 2 row table that you are looking for dupes in. 🙂
2) eliminating duplicates on a table without some form of 'pointer' such as a PK or sequence is going to be horribly inefficient. Best of luck with that! :w00t:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 4, 2009 at 10:00 am
Tara (6/2/2009)
How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.
Hi,
This will help you
http://www.sqlservercentral.com/articles/duplicates/65916/
This will solve your problems !!! 🙂
Hope this helps,
\\K
______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply