How can we delete duplicate records... conditions apply.

  • Hi ALL!

    How can we delete duplicate records from a TABLE in SQL Server

    The following points are to be considered when writing the query:

    • Use of other table, temporary table, table variable is not allowed.
    • Table has no primary key

    Example:

    TABLE

    Name  Amount
    --------------
    A     15000
    A     15000
    A     15000
    B     12000
    B     12000
    B     12000
    B     12000
    C     20000
    C     20000

    Output after deletion of duplicate records:

    Name  Amount
    --------------
    A     15000
    B     12000
    C     20000

  • with t as
    (
    select
        rn = row_number() over (order by col1, col2),
        col1,
        col2
    from tst1
    )
    delete t
    where rn not in
    (
        select min(rn)
        from t
        group by col1, col2
    )

  • This type of post should be in Newbies, not Performance Tuning.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • And what is with these restrictions? Sounds like homework to me. No temp tables? In any case, you have your answer.

  • 1) Why do you restrict the use of temp tables or table variables?

    2) I think this is a better version of the ROW_NUMBER code posted above. It avoids the cost of the MIN and the grouping. UNTESTED:

    with t as
    (
    select
        rn = row_number() over (order by col1, col2),
        col1,
        col2
    from tst1
    )
    delete t
    where rn > 1

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 1. Probably the author intended to post a SQL puzzle. Maybe the site administrators should create that kind of forum: SQL puzzles?

    2. Your code actually deletes all the rows except the first one. We need to delete just the duplicates.

  • CREATE TABLE xxxx(
        Name varchar(20), Amount int
    )

    INSERT INTO xxxx (name, Amount) Values ('A', 15000)
    INSERT INTO xxxx (name, Amount) Values ('A', 15000)
    INSERT INTO xxxx (name, Amount) Values ('A', 15000)
    INSERT INTO xxxx (name, Amount) Values ('B', 12000)
    INSERT INTO xxxx (name, Amount) Values ('B', 12000)
    INSERT INTO xxxx (name, Amount) Values ('B', 12000)
    INSERT INTO xxxx (name, Amount) Values ('B', 12000)
    INSERT INTO xxxx (name, Amount) Values ('C', 20000)
    INSERT INTO xxxx (name, Amount) Values ('C', 20000)

    ;with t (rn, name, Amount) as
    (
    select
        rn = row_number() over (partition by name, Amount order by name, Amount),
        name, Amount
    from xxxx
    )
    delete t
    where rn > 1

    select * from xxxx


    * Noel

  • Good correction, noeld. I missed that the first example didn't have the partitioning.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Partitioning is the same thing as grouping; in my tests, both queries showed the same estimated cost.

  • Not sure what tests you did. Try this one. As you can see, the single-pass mechanism used by the partition query is significantly more efficient than the second method.

    set nocount on

    CREATE TABLE #t(
        Name varchar(20), Amount int
    )

    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('C', 20000)
    INSERT INTO #t (name, Amount) Values ('C', 20000)

    ;with t (rn, name, Amount) as
    (
    select
        rn = row_number() over (partition by name, Amount order by name, Amount),
        name, Amount
    from #t
    )
    delete t
    where rn > 1
    --1 scan, 7 reads, cost 0.0247

    truncate table #t

    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('A', 15000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('B', 12000)
    INSERT INTO #t (name, Amount) Values ('C', 20000)
    INSERT INTO #t (name, Amount) Values ('C', 20000)

    ;with t as
    (
    select
        rn = row_number() over (order by name, amount),
        name,
        amount
    from #t
    )
    delete t
    where rn not in
    (
        select min(rn)
        from t
        group by name, amount
    )
    --2 scans, 36 reads, cost 0.0638

    drop table #t

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Yes, you're right. Now I get the same numbers as yours. Sorry; when I tested them yesterday I apparently tested the same query twice.

  • I find this question extremely annoying.

     

    Hiding under a desk from SSIS Implementation Work :crazy:

  • Heh... why? If you think this is bad, do a search on "Bankers Rounding".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is what I use for my queries when finding and/or deleting duplicates:

    Note: I use a counter to determine how many times to cycle through the table, in order to limit how many records I might be deleting. You can also add WHERE clauses to either of the subqueries to help out as well. Credit, however, is due to either Google or this site, as I saw and modified this code from somewhere other than my own brain.

    declare @Counter int
    set @Counter = 80
    set rowcount 1

    while @Counter > 0
    begin
        delete from tf
        where empid in
        (
            SELECT empid
            FROM tf
            GROUP BY empid
            HAVING COUNT(empid) > 1
        )
        set @Counter = @Counter - 1
    end

    set rowcount 0
  • Gosh is that inefficient tho . . .

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
