How to delete duplicate rows from SQL 2008 database

Question

techzone12

SSCommitted

Points: 1798
December 10, 2010 at 1:07 pm

#244085

I have a database with hundreds of tables. The original design did not have primary keys. As a result, the database now has duplicate rows.
Example:
Date_Stamp   KW1   KW2
12/10/2010   5.3   3.1
12/10/2010   5.3   3.1
12/9/2010    4     2
12/8/2010    3     1
12/7/2010    7.4   5
12/7/2010    7.4   5
12/7/2010    7.4   5
...
I need to delete duplicate rows (and leave one behind). I want the end result to look like the output of "SELECT DISTINCT * FROM tableName".
I know how to do this, but I need to be able to automatically scroll through the tables, find the duplicates and delete the extra rows. I thought about using a cursor, but I am not sure about the T-SQL.
The code should do this:
scroll through the tables
and for each table check to see if it has duplicate rows
if true, delete the extra rows, leaving one behind
How can I do this?

GSquared SSC Guru Points: 260824 · Answer 1

It'll take a cursor to do multiple tables efficiently. You could use sp_MSForEachTable, but that's just a hidden cursor.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

techzone12 SSCommitted Points: 1798 · Answer 2

I think "sp_MSForEachTable" is a good option. I will look into it.

techzone12 SSCommitted Points: 1798 · Answer 3

with cte
as (select row_number() over(partition by Date_Stamp order by Date_Stamp) as rn
    from mytable)
delete from cte
where rn > 1

This works for a single table.

Now I need to think about a way to scroll through the tables.
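
Before deleting anything, it can help to list which rows are actually repeated. The query below is only a sketch: it assumes the table is named mytable, as in the CTE above, and that it holds exactly the three columns shown in the example from the question.

```sql
-- List each distinct row that occurs more than once, with its number of copies.
-- Assumes mytable contains only Date_Stamp, KW1 and KW2.
SELECT Date_Stamp, KW1, KW2, COUNT(*) AS Copies
FROM mytable
GROUP BY Date_Stamp, KW1, KW2
HAVING COUNT(*) > 1;
```

If the query returns no rows, the table has no fully identical rows and can be skipped.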

GSquared SSC Guru Points: 260824 · Answer 4

Will your table ever have two records on the same day that you want to keep?

Try turning that into a dynamic command as a string, and plug in the table name to the CTE's from clause, then use a cursor to step through it. It's quick and dirty and will work.

Make a backup of the database before you do it, just in case.
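
If two rows on the same day can legitimately differ in their readings, partitioning by Date_Stamp alone would delete rows that are not true duplicates. A hedged sketch of the alternative, assuming the table is mytable and has only the three columns from the example, is to partition by every column so that only fully identical rows are grouped together:

```sql
-- Number the rows within each group of fully identical rows, then delete all but the first.
-- The column list is an assumption taken from the sample data in the question.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Date_Stamp, KW1, KW2   -- every column in the table
               ORDER BY Date_Stamp                 -- any order works; the rows in a group are identical
           ) AS rn
    FROM mytable
)
DELETE FROM cte
WHERE rn > 1;
```

This leaves the same rows that SELECT DISTINCT * would return.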

Ron McCullough SSC Guru Points: 63877 · Answer 5

Without using sp_MSForEachTable you might want to try this code:

DECLARE @Table NVARCHAR(200),@Column NVARCHAR(200),@Sql NVARCHAR(2000)

-- One row per user table that has a Date_Stamp column
DECLARE Table_Cursor CURSOR FOR
SELECT so.name AS 'Table',sc.name AS 'Column'
FROM sys.sysobjects so
LEFT OUTER JOIN sys.syscolumns sc
ON sc.id = so.id
WHERE so.type = 'U' AND sc.name = 'Date_Stamp'
--or use AND sc.xtype = 40 for all columns with a data type of Date

OPEN Table_Cursor
FETCH NEXT FROM Table_Cursor
INTO @Table,@Column

WHILE @@FETCH_STATUS = 0
BEGIN
    --PRINT @Table +' ' + @Column -- for testing only
    -- Build the statement for the current table (assumes the default schema);
    -- QUOTENAME guards against unusual table or column names
    SET @Sql = ';with cte
    as (select row_number() over(partition by ' + QUOTENAME(@Column) + ' order by ' + QUOTENAME(@Column) + ') AS rn
    from ' + QUOTENAME(@Table) + ')
    SELECT * FROM cte' --For testing; replace with a DELETE statement after you have verified the results
    --print @Sql
    --print '-------------------------'
    EXECUTE sp_executesql @Sql

    FETCH NEXT FROM Table_Cursor
    INTO @Table,@Column
    --PRINT @Table +' ' + @Column for testing only
END

CLOSE Table_Cursor
DEALLOCATE Table_Cursor

As GSquared has said: make a backup of the database before you do it, just in case.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you - before posting a question please read
Before posting a performance problem please read

techzone12 SSCommitted Points: 1798 · Answer 6

Thanks for the post. Your query worked.

DECLARE @Table NVARCHAR(200),@Column NVARCHAR(200),@Sql NVARCHAR(2000)

-- One row per user table that has a Date_Stamp column
DECLARE Table_Cursor CURSOR FOR
SELECT so.name AS 'Table',sc.name AS 'Column'
FROM sys.sysobjects so
LEFT OUTER JOIN sys.syscolumns sc
ON sc.id = so.id
WHERE so.type = 'U' AND sc.name = 'Date_Stamp'
--or use AND sc.xtype = 40 for all columns with a data type of Date

OPEN Table_Cursor
FETCH NEXT FROM Table_Cursor
INTO @Table,@Column

WHILE @@FETCH_STATUS = 0
BEGIN
    --PRINT @Table +' ' + @Column -- for testing only
    -- Deletes every row after the first within each Date_Stamp group of the current table
    SET @Sql = ';with cte
    as (select row_number() over(partition by ' + QUOTENAME(@Column) + ' order by ' + QUOTENAME(@Column) + ') AS rn
    from ' + QUOTENAME(@Table) + ')
    delete FROM cte where rn>1 '
    print @Sql
    print '-------------------------'
    EXECUTE sp_executesql @Sql

    FETCH NEXT FROM Table_Cursor
    INTO @Table,@Column
    --PRINT @Table +' ' + @Column for testing only
END

CLOSE Table_Cursor
DEALLOCATE Table_Cursor

I tried it on a test database.

Make sure that you have the condition "where rn > 1".

Cleaning up the DB was going to take me a long time! Now with this query I just have to run it and go do something else. Of course, I have to back up the DB (20 GB) before we attempt this.

This is my first post on this forum! Thanks a lot.

Ron McCullough SSC Guru Points: 63877 · Answer 7

techzone12

Thanks for the feedback.

Now, for everyone who wants to help you: the next time you post a question to a forum, please provide the table structure(s), sample data and what you have attempted, following the instructions contained in the first link in my signature block. That way you are more likely to receive tested answers. In other words, help us to help you.

Again, thanks for the feedback.

Eralper SSCarpal Tunnel Points: 4438 · Answer 8

My solution would be to create a stored procedure with the table name as an input parameter.

Within the SP, I would build a dynamic SQL statement that deletes the duplicate rows. For sample code, you can refer to "How to delete duplicate records or rows among identical rows in a table where no primary key exists".

And I think it is easy to use sp_MSForEachTable to call the above SP for each table in the database.
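
The approach above could be sketched roughly as follows. This is not the code from the referenced article: the procedure name dbo.usp_DeleteDuplicateRows and its parameter are hypothetical, and the sketch assumes that a "duplicate" means a row identical in every column and that no table contains column types that cannot appear in PARTITION BY (for example text, ntext, image or xml). sp_MSforeachtable is undocumented, so its behavior should be checked on a test database first.

```sql
-- Hypothetical procedure; @TableName is expected as a one- or two-part name,
-- for example N'[dbo].[mytable]'.
CREATE PROCEDURE dbo.usp_DeleteDuplicateRows
    @TableName NVARCHAR(517)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ObjectId INT, @Columns NVARCHAR(MAX), @Sql NVARCHAR(MAX);

    -- Resolve the name to a user table; an unknown name yields NULL.
    SET @ObjectId = OBJECT_ID(@TableName, N'U');
    IF @ObjectId IS NULL
    BEGIN
        RAISERROR(N'Table %s was not found.', 16, 1, @TableName);
        RETURN;
    END

    -- Comma-separated, bracket-quoted list of every column in the table.
    SET @Columns = STUFF((
            SELECT N', ' + QUOTENAME(c.name)
            FROM sys.columns AS c
            WHERE c.object_id = @ObjectId
            ORDER BY c.column_id
            FOR XML PATH(''), TYPE
        ).value('.', 'NVARCHAR(MAX)'), 1, 2, N'');

    -- Number the rows inside each group of identical rows, then delete all but the first.
    SET @Sql = N';WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY ' + @Columns + N'
                                  ORDER BY (SELECT NULL)) AS rn
        FROM ' + QUOTENAME(OBJECT_SCHEMA_NAME(@ObjectId)) + N'.'
               + QUOTENAME(OBJECT_NAME(@ObjectId)) + N')
    DELETE FROM cte WHERE rn > 1;';

    EXECUTE sp_executesql @Sql;
END
GO

-- sp_MSforeachtable replaces ? with each table's quoted two-part name.
EXEC sp_MSforeachtable N'EXEC dbo.usp_DeleteDuplicateRows N''?''';
```

Passing the table name as a parameter keeps the delete logic in one place, so it can be tested on a single table before it is run against the whole database.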

Eralper

ns_18 SSC Veteran Points: 214 · Answer 9

I want to implement the scenario below.

If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.

Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

Ron McCullough SSC Guru Points: 63877 · Answer 10

ns_18 (12/21/2010)
I want to implement the scenario below.
If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.
Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

Scroll back up about four postings, read and understand what that T-SQL will do, and then test it. If you have any additional questions, post them in a new thread.

parthi-1705 SSCrazy Eights Points: 9277 · Answer 11

ns_18 (12/21/2010)
I want to implement the scenario below.
If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.
Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

If the table has a primary key, then how come there are duplicate records in the table? ;-)

Thanks

Parthi

ns_18 SSC Veteran Points: 214 · Answer 12

I want to implement the scenario below.

If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.

Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

ns_18 SSC Veteran Points: 214 · Answer 13

I want to implement the scenario below.

If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.

Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

parthi-1705 SSCrazy Eights Points: 9277 · Answer 14

ns_18 (12/21/2010)
I want to implement the scenario below.
If there is any duplicate data, then my SQL should run, find the duplicate rows, and delete them automatically.
Do you have dynamic SQL which identifies and deletes the duplicate records automatically? In this case the table contains a primary key. Thanks in advance!

Have a look at Eralper's post in the same thread, http://www.sqlservercentral.com/Forums/Topic1033202-392-1.aspx, where he has given the solution. Try it the way he described; it looks good for tables with no primary key.

"In this case the table contains a primary key." Again, I am asking:

If the table has a primary key, then how come there are duplicate records in the table?

If the table contains a primary key, then it will not have duplicate records in the key columns.
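
One situation in which both statements can hold is a surrogate key, such as an IDENTITY column: the key values are unique while the remaining columns repeat. A minimal sketch of that case, assuming a hypothetical table dbo.Readings with a primary key column ReadingId plus the Date_Stamp, KW1 and KW2 columns from the original question:

```sql
-- Keep the row with the lowest key in each group of rows whose non-key columns match.
-- dbo.Readings and ReadingId are hypothetical names used only for illustration.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Date_Stamp, KW1, KW2   -- columns that define a duplicate
               ORDER BY ReadingId                  -- the first row by key is kept
           ) AS rn
    FROM dbo.Readings
)
DELETE FROM cte
WHERE rn > 1;
```

Which columns go into PARTITION BY is a business decision, so this variant is harder to automate across tables than the no-primary-key case.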

Thanks

Parthi
