How to delete duplicate rows form SQL 2008 database

  • I have a database with hundreds of tables. The original deisgn did not have primary keys. As a result the database now has duplicate rows.

    Example"

    Date_Stamp KW1 KW2

    12/10/2010 5.3 3.1

    12/10/2010 5.3 3.1

    12/9/2010 4 2

    12/8/2010 3 1

    12/7/2010 7.4 5

    12/7/2010 7.4 5

    12/7/2010 7.4 5

    ...

    I need to delete duplcate rows (and leave one behind). I want the end result to look like the output of using "select destinct * tableName)

    I know how to do this, but I need to be able to automatically scroll throught tables, find the duplicate and delete the extra rows. I thought about using a cursor, but not sure about the T-sql?

    The code should do this:

    scroll through tables

    and for each table check to see if it has duplicate rows

    if true delete the extra rows, leaving one behind

    How can I do this?

  • It'll take a cursor to do multiple tables efficiently. You could use sp_MSForEachTable, but that's just a hidden cursor.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I think "sp_MSForEachTable" is a good option. I will look into it.

  • with cte

    as (select row_number() over(partition by Date_Stamp order by Date_Stamp) as rn

    from mytable)

    delete from cte

    where rn>1

    This works for a single table

    No I need to think about a way to scroll through tables

  • Will your table ever have two records on the same day that you want to keep?

    Try turning that into a dynamic command as a string, and plug in the table name to the CTE's from clause, then use a cursor to step through it. It's quick and dirty and will work.

    Make a backup of the database before you do it, just in case.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Without using sp_ForEachTable you might want to try this code:

    DECLARE @Table NVARCHAR(200),@Column NVARCHAR(200),@Sql NVARCHAR(2000)

    DECLARE Table_Cursor CURSOR FOR

    SELECT so.Name AS 'Table',sc.Name AS 'Column'

    FROM sys.sysobjects so

    LEFT OUTER JOIN sys.syscolumns sc

    ON sc.id = so.id

    WHERE so.type = 'u'AND sc.name = 'Date_Stamp'

    --or use AND sc.xtype = 40 for all columns with a data type of Date

    OPEN Table_Cursor

    FETCH NEXT FROM Table_Cursor

    INTO @Table,@Column

    WHILE @@FETCH_STATUs = 0

    BEGIN

    --PRINT @Table +' ' + @Column -- for testing only

    SET @Sql = ';with cte

    as (select row_number() over(partition by '+ @Column +' order by ' + @Column +') AS rn

    from Dcustomer)

    SELECT * FROM cte' --For testing replace with a DELETE statement

    -- After you have verified the results

    --print @sql

    --print '-------------------------'

    EXECUTE SP_EXECUTESQL @Sql

    FETCH NEXT FROM Table_Cursor

    INTO @Table,@Column

    --PRINT @Table +' ' + @Column for testing only

    END

    CLOSE Table_Cursor

    DEALLOCATE Table_Cursor

    As Gsquared has said

    Make a backup of the database before you do it, just in case

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Thanks for the post. Your query worked.

    DECLARE @Table NVARCHAR(200),@Column NVARCHAR(200),@Sql NVARCHAR(2000)

    DECLARE Table_Cursor CURSOR FOR

    SELECT so.Name AS 'Table',sc.Name AS 'Column'

    FROM sys.sysobjects so

    LEFT OUTER JOIN sys.syscolumns sc

    ON sc.id = so.id

    WHERE so.type = 'u'AND sc.name = 'Date_Stamp'

    --or use AND sc.xtype = 40 for all columns with a data type of Date

    OPEN Table_Cursor

    FETCH NEXT FROM Table_Cursor

    INTO @Table,@Column

    WHILE @@FETCH_STATUs = 0

    BEGIN

    --PRINT @Table +' ' + @Column -- for testing only

    SET @Sql = ';with cte

    as (select row_number() over(partition by '+ @Column +' order by ' + @Column +') AS rn

    from ' + @Table + ')

    delete FROM cte where rn>1 ' --For testing replace with a DELETE statement

    -- After you have verified the results

    print @sql

    print '-------------------------'

    EXECUTE SP_EXECUTESQL @Sql

    FETCH NEXT FROM Table_Cursor

    INTO @Table,@Column

    --PRINT @Table +' ' + @Column for testing only

    END

    CLOSE Table_Cursor

    DEALLOCATE Table_Cursor

    I tired it on a test database.

    Make sure that you have the condition "where rn > 1".

    Cleaning up the DB was going to take me a loooong time!. Now with this query I just have to run it and go do something else. Of course, I have to backup the DB (20 GB) before we attempt this.

    This is my first post on this Forum!. Thanks a lot.

  • techzone12

    Thanks for the feed back.

    Now for everyone who wants to help you, the next time you post a question to a forum please provide the table(s) structure, sample data and what you may have attempted following the instructions contained in the first link in my signature block. In that way you are more than likely to receive tested answers .. in other words, help us to help you

    Again thanks for the feed back.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • My solution would be creating a stored procedure with table name as input parameter.

    And within the SP, I would create a dynamic sql which will delete dublicate rows. You can refer to How to delete duplicate records or rows among identical rows in a table where no primary key exists for sample codes.

    And I think it is easy to use sp_MSForEachTable to call the above SP for each table in database.

  • I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

  • ns_18 (12/21/2010)


    I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

    Scroll back up about four postings .... read and understand what that T-SQL will do ... and then test it... if you have any additional questions... post them in a new forum.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • ns_18 (12/21/2010)


    I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

    IF the table has Primary Key then how cum duplicate records will be there in the Table;-) 😉

    Thanks

    Parthi

    Thanks
    Parthi

  • I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

  • I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

  • ns_18 (12/21/2010)


    I want to implement the below scenario.

    If ther is any duplicate data then my SQL should run and find the duplicate rows and then delete them automatically.

    Do you have a dynamic SQl which identifies and delete the duplicate records automatically. In this case the table contains primary key. Thanks in advance!

    have a look at the same post Eralper http://www.sqlservercentral.com/Forums/Topic1033202-392-1.aspx has given the solution try the way he said looks good with no primary key

    In this case the table contains primary key. again i am asking

    IF the table has Primary Key then how cum duplicate records will be there in the Table????????

    If the table contains PY Key then it will not have Duplicate records

    Thanks

    Parthi

    Thanks
    Parthi

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply