How to remove duplicates from column without using Temp table

  • hey ,

    I am tring to removed duplicates without distinct keyword and temp table.

    please give me any idea.

  • u might wanna try this!!!!

    SET ROWCOUNT 1

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1

    WHILE @@ROWCOUNT >0

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1

    SET ROWCOUNT 0

    [font="Times New Roman"]For better assistance in answering your questions
    Click Here[/url][/font]

  • You may want to use a Common Table Expression (CTE) for example:

    CREATE TABLE T(Date_Stamp DATETIME,KW1 DECIMAL(5,1), KW2 DECIMAL(5,1))

    INSERT INTO T

    SELECT '12/10/2010', 5.3, 3.1 UNION ALL

    SELECT '12/10/2010', 5.3, 3.1 UNION ALL

    SELECT '12/9/2010', 4, 2 UNION ALL

    SELECT '12/8/2010', 3, 1 UNION ALL

    SELECT '12/7/2010', 7.4, 5 UNION ALL

    SELECT '12/7/2010', 7.4, 5 UNION ALL

    SELECT '12/7/2010', 7.4, 5

    ;with cte

    as (select row_number() over(partition by Date_Stamp,KW1,KW2 order by Date_Stamp) as rn,

    KW1,KW2,Date_Stamp

    from T)

    SELECT * FROM cte

    Result):

    rnKW1KW2Date_Stamp

    17.45.02010-12-07 00:00:00.000

    27.45.02010-12-07 00:00:00.000

    37.45.02010-12-07 00:00:00.000

    13.01.02010-12-08 00:00:00.000

    14.02.02010-12-09 00:00:00.000

    15.33.12010-12-10 00:00:00.000

    25.33.12010-12-10 00:00:00.000

    Once you have verified the data, then replace the SELECT * FROM CTE statment with:

    DELETE FROM cte WHERE rn > 1

    For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • I like using subqueries and an Identity column. If you don't have an Identity, add it to the table. Then join the table to a subquery searching for the MIN() Identity. In the ON clause, match on the PK (or all the columns except the PK if the PK is the identity) where IdentityValue <> IdentityValue.

    Do it as a Select statement to make sure it looks right, then turn it into a DELETE.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Great Job dude ur script works. but can u explain me that code.

  • vaibhavraut88 (6/16/2011)


    Great Job dude ur script works. but can u explain me that code.

    Look up Common Table Expressions and Row_Number() in Books Online, or Google the phrases. There's a lot to know about CTEs and the best way to understand it is to learn it.

    If it's the rest of the SELECT statement that you're having issues with, then I suggest you reverse engineer the code to a point where you can see what's going on. Start with the base SELECT statement, then start adding things (looking up terms along the way) until you get to a point of confusion. Then ask us specifically about that point.

    As one of the frequent posters on this forum likes to say, "Don't use this code in a production environment until you understand what it does. After all, you'll be the one supporting it."

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • vaibhavraut88 (6/16/2011)


    Great Job dude ur script works. but can u explain me that code.

    Ummm.... which script? The one with the WHILE Loop or the other one?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sriram.RM (6/16/2011)


    u might wanna try this!!!!

    SET ROWCOUNT 1

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1

    WHILE @@ROWCOUNT >0

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1

    SET ROWCOUNT 0

    You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • bitbucket-25253 (6/16/2011)


    For a matter of record the above code was copied from a posting by Jeff Moden on a SSC forum, and not my work, but one I use frequently.

    You, Sir, are a gentleman and a scholar. Thank you for the kudo but it's a well known and common method. I'm just another "user" on this one. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Brandie Tarvin (6/16/2011)


    If you don't have an Identity, add it to the table.

    I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/17/2011)


    Brandie Tarvin (6/16/2011)


    If you don't have an Identity, add it to the table.

    I prefer a method which exploits the IDENTITY column, as well, but it's just not possible to add a column to a table for most folks. 🙂

    Actually, I usually use it in conjunction with a temp table. @=) But he said no temp tables, so I left out that part of the suggestion.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • hi

    on a related note , i once suggested the below approach and wanted your input on if it was a good solution, naturally i am talking about a huge table 25 million rows + and we needed to identity duplicates ,

    I suggested using a hash value to identify the rows which are duplicates using Hasbytes , then counting only hashbytes to identify the dulicates rather than performing the distinct , group by , row_num operations on all columns in the table.

    In retrospect i cant help feeling that this solution was too complicated.

    Jayanth Kurup[/url]

  • Keep in mind your hashes may produce duplicates as well !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hashbytes seems overly complicated to me as well.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Jeff Moden (6/17/2011)


    Sriram.RM (6/16/2011)


    u might wanna try this!!!!

    SET ROWCOUNT 1

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME) >1

    WHILE @@ROWCOUNT >0

    DELETE YOURTABLE

    FROM YOURTABLE A WHERE (SELECT COUNT(*) FROM YOURTABLE B WHERE A.COLNAME=B.COLNAME ) >1

    SET ROWCOUNT 0

    You should really set up a test on a million rows that has a fair number of duplicates and see how slow that actually is. Also, it may not matter if you're stuck in 2k5 but just so you know, SET ROWCOUNT has been deprecated and will be removed from future versions of SQL Server.

    My Bad... that indeed is a bad one... mistook it for the SQL 2000 forum!!!

    [font="Times New Roman"]For better assistance in answering your questions
    Click Here[/url][/font]

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply