eliminating duplicates

  • rbarryyoung,

    any updates about your article ? is it published ?

    karthik

  • Yes, it will be published on Tuesday, August 5th, I believe. Unlike Jeff, my time machine is broken so I do not have a pre-publication link for you. 😀

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • ack, never mind, nothing to see here, look, is that elvis?

  • Thanks RBarryYoung for this 123 post long excursion into "Effects of unclear requirements in Software Engineering"

    Your fantastic article that went live by now made me too curious and I have really read through each and every individual posts (although some I have not read in detail...)

    Unfortunately I sometimes make the same mistake of not reading the requirements well enough. I just blame it me being a human though.

    This reminded me of the good ol' SE cartoon:

    http://www.codinghorror.com/blog/archives/000230.html

    Best Regards,

    Chris Büttner

  • Heh... OUTSTANDING CHRIS! I've been looking for that exact cartoon for a while now! Thanks. And, yeah... it does remind me a lot about the Shenanigans that went on in this post.

    Hey folks, if you haven't read Barry's article, he did a great job on it. Added some very interesting historical information about this, what used to be, common practice to save disk space. See the following URL...

    http://www.sqlservercentral.com/articles/T-SQL/63578/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Glad i could help 🙂

    Best Regards,

    Chris Büttner

  • Barry, I am truly humbled. When I read your article I felt like the tyke I was back in your "back in the day", with the binky in my mouth in the early 70's. Its like kids in school learning math with calculators now. Some of them can do math that would have made Einstein proud, but take the calculator away, and they can not work out a simple calculus problem by hand. I am very glad there are so many seasoned professionals like yourself willing to impart your knowledge.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • OP wrote there was no primary key.

    Is there a clustered index present?

    CREATE TABLE#Sample

    (

    Col1 VARCHAR(5),

    Col2 INT,

    Col3 CHAR(1)

    )

    INSERT#Sample

    SELECT'ABC', 24, 'M' UNION ALL

    SELECT'ABC', 24, 'M' UNION ALL

    SELECT'DEF', 24, 'M' UNION ALL

    SELECT'DEF', 24, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'PQRS', 25, 'F' UNION ALL

    SELECT'XYZ', 24, 'M' UNION ALL

    SELECT'XYZ', 25, 'M'

    CREATE CLUSTERED INDEX IX_Sample ON #Sample (Col1, Col2, Col3)

    DECLARE@Col1 VARCHAR(5),

    @Col2 INT,

    @Col3 CHAR(1)

    UPDATE#Sample

    SET@Col3 = Col3 =CASE

    WHEN Col1 = @Col1 AND Col2 = @Col2 AND Col3 = @Col3 THEN Col3

    ELSE 'x'

    END,

    @Col1 = Col1,

    @Col2 = Col2,

    @Col3 = Col3

    SELECT*

    FROM#Sample

    DELETE

    FROM#Sample

    WHERECol3 = 'x'

    SELECT*

    FROM#Sample

    DROP TABLE#Sample


    N 56°04'39.16"
    E 12°55'05.25"

  • Thanks Greg. Glad you liked the article.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Well, I see you guys had a lot of fun here.

    OK, I read this thread as well.

    And I cannon figure out one thing.

    Everyone agreed the puzzle was designed to prove some problems cannot be solved without loops.

    OK.

    I can see some set-based methods solving the problem, including the one nobody mentioned (or I missed it): count distinct rows and then with a little help from Tally wizard restore CUNT(*)-1 rows.

    But here is a funny question: how to solve it USING a cursor?

    Yes, it's easy to find 1st non-duplicated row which you need to delete.

    But how to do delete itself?

    _____________
    Code for TallyGenerator

  • DELETE WHERE CURRENT OF?


    N 56°04'39.16"
    E 12°55'05.25"

  • Sergiy (8/6/2008)


    Well, I see you guys had a lot of fun here.

    OK, I read this thread as well.

    And I cannon figure out one thing.

    Everyone agreed the puzzle was designed to prove some problems cannot be solved without loops.

    OK.

    I can see some set-based methods solving the problem, including the one nobody mentioned (or I missed it): count distinct rows and then with a little help from Tally wizard restore CUNT(*)-1 rows.

    But here is a funny question: how to solve it USING a cursor?

    Yes, it's easy to find 1st non-duplicated row which you need to delete.

    But how to do delete itself?

    That is a great question....2005 supports the "delete top", but how to do in 2000 WITH cursor. I have to give major kudos to Chris Morris first though, that is awesome. I was on those lines, but was afraid of an "update @variable = @variable + column1" type logic due to the "no loops" restriction, but you totally nailed it! Nice Job...

    This is ONLY in response to the question of how to solve WITH a cursor, and it would only work in 2005...so any help would be appreciated:

    Create Table #test(

    [Name] varchar(5),

    Age tinyint,

    Sex char(1)

    )

    Go

    Insert Into #Test ([Name], Age, Sex)

    Select 'ABC', 24,'M' Union All

    Select 'ABC', 24,'M' Union All

    Select 'DEF', 24,'M' Union All

    Select 'DEF', 24,'F' Union All

    Select 'GHI', 26,'F' Union All

    Select 'GHI', 26,'F' Union All

    Select 'GHI', 26,'F' Union All

    Select 'GHI', 26,'F' Union All

    Select 'GHI', 26,'F' Union All

    Select 'LMN', 27,'M' Union All

    Select 'LMN', 27,'M' Union All

    Select 'LMN', 27,'M' Union All

    Select 'PQRS', 25,'F' Union All

    Select 'XYZ', 24,'M' Union All

    Select 'XYZ', 25,'M'

    --

    Declare @name varchar(50), @age int, @sex char(1)

    Declare cur_delete cursor for (

    Select distinct [name], age, sex from #test

    )

    Open cur_delete

    Fetch Next From cur_delete into @name, @age, @sex

    While @@Fetch_status = 0

    Begin

    Delete top(1) from #test where [name] = @name and age = @age and sex = @sex

    Fetch Next From cur_delete into @name, @age, @sex

    End

    Close cur_delete

    Deallocate cur_delete

    Select * from #test

    Drop table #test

    ...My next question would be, what is the most efficient way to solve this problem, barring nothing ;)...

  • You can use SET ROWCOUNT instead of using TOP.

    DECLARE@Sample TABLE

    (

    Name VARCHAR(5),

    Age TINYINT,

    Sex CHAR(1)

    )

    INSERT@Sample

    SELECT'ABC', 24, 'M' UNION ALL

    SELECT'ABC', 24, 'M' UNION ALL

    SELECT'DEF', 24, 'M' UNION ALL

    SELECT'DEF', 24, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'GHI', 26, 'F' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'LMN', 27, 'M' UNION ALL

    SELECT'PQRS', 25, 'F' UNION ALL

    SELECT'XYZ', 24, 'M' UNION ALL

    SELECT'XYZ', 25, 'M'

    DECLARE@Name VARCHAR(5),

    @Age TINYINT,

    @Sex CHAR(1)

    DECLAREcurDelete CURSOR FOR

    SELECT DISTINCT Name, Age, Sex FROM @Sample

    OPENcurDelete

    FETCH NEXT FROM curDelete INTO @Name, @Age, @Sex

    SET ROWCOUNT 1

    WHILE @@FETCH_STATUS = 0

    BEGIN

    DELETE@Sample

    WHEREName = @Name

    AND Age = @Age

    AND Sex = @Sex

    FETCH NEXT FROM curDelete INTO @Name, @Age, @Sex

    END

    SET ROWCOUNT 0

    CLOSE curDelete

    DEALLOCATE curDelete

    SELECT * FROM @Sample


    N 56°04'39.16"
    E 12°55'05.25"

  • Gosh Jeremy :blush: thanks!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Nice, that works. I forgot about SET ROWCOUNT, havent had to use in a while. Anyways, now there is a post from me with a cursor in it....crap! 😉 What I like best about this entire article/discussion is that if cursors dont even have a place in this highly invented scenerio... enough said.

Viewing 15 posts - 121 through 135 (of 137 total)

You must be logged in to reply to this topic. Login to reply