Just For Fun: An Impossible Delete

  • karthikeyan (8/5/2008)


    Excellent article !:)

    Actually you gave a variety of good & tasty food in a single box. If you write more articles like this , it would help very much for sql developer who would like to improve their SQL knowledge. Again, Great Article ! ๐Ÿ™‚

    Thanks for the feedback, Karthik.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • timmy_patterson (8/5/2008)


    I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from?

    Thanks,

    Tim

    Insert into SOURCE(Name, Age, Sex)

    Select Name

    , Age

    , CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field

    From TAB

    Join Tally ON Tally.Number <= Ascii(Sex)/2

    Where Tally.Number Between 2 and 127

    Oops, another typo! TAB should be SOURCE. A Tally table is just another name for a "Numbers" table. In other words, a table that has every integer from 1 to "X". In our case X would not have to be greater than 127. Many DBA's keep them around pre-made because of how useful they are for performance and for avoing cursors (See Jeff Moden's many articles on this).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I took Shaun's recommendation and found a good article here: http://www.sqlservercentral.com/articles/TSQL/62867/.

    I never heard of a Tally table before. This is good stuff. I already know places where I can use it to boost performance.

    I've been masking bits for years. I even do in in SQL but I've never used it like you did. Very clever. The most valuable nugget I got from this whole thing is the Tally table. I can't wait to use it. :satisfied:

    Many thanks for the great article and thanks to all the great responses!

  • Ryan Beesley (8/5/2008)


    If you know that you will not have more than 128 possible "duplicate" records, I agree that this is a clever solution.

    Yes, and that seems like a reasonable assumption on my part. After all what would be the purpose of so many identical records?

    Also, as I stated, I could have gotten at least 4 more bits from the NAME column, If I needed them. Makes it more complex, but that would take me up to 2047 duplicates.

    In the initial design of the table, unless you are trying to allow for some interesting possibilities, why wouldn't this already be a bit field?

    Well, it could have been, but it wasn't, as evidenced by the fact that it held either an "M" which is not legal for a bit field. I didn't design the table or the problem, someone posted it here. Back in the day, of course, we would never use 8 bits where 1 would do, but today we've got the space so we use it to make things easier on ourselves.

    Secondly, why would you choose the low order bit to retain the original value. When you tally the results, you will either need to save the bit when manipulating the result and or add 2 for each tally. Using the high order bit would allow you to add 1 in either case and then just use a mask to check that high bit.

    Sorry, I cannot see a difference either way and I just find it easier to keep track of a low order bit, in my head. Maybe you could post an example of the procedure using the high-order bit instead?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry - how cool to see something so Old School. I took my first Fortran course in about 1970, and became a staff programmer at UH soon after. You're right - people would cram anything into a byte ๐Ÿ˜€ to save some space, and if you had a whole "word", jeez - you would put an entire record in that quicker than you could say "MOUNT TAPE". Thanks for the memories...

    Ken

  • Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower

    I've looked very closely, and I can't see any difference between those two. Or maybe you meant "01100110 in lower"?

  • Awesome article. It is good to know and makes me very glad that memory was much less constrained than that by the time I started programming.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Jonathan Rynd (8/5/2008)


    Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower

    I've looked very closely, and I can't see any difference between those two. Or maybe you meant "01100110 in lower"?

    Right you are. A previous commentor picked up on this also, my apologies.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • timothyawiseman (8/5/2008)


    Awesome article. It is good to know and makes me very glad that memory was much less constrained than that by the time I started programming.

    Thanks for the feedback Timothy.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Great article and great solutions to an "interesting" requirement set!

    I admit I didn't read the original post, but the assumption Chris's and Phil's solutions rely on (that duplicate rows are grouped together) seems to be implied by the original poster's phrase: "Now i would wish to 'DELETE' the 1st original row ( Row no 1,3,6,7)and keep the other duplicates from above data. "

    I agree with what I think Tao is hinting at, though. Back in the day when tiny disk drives cost more than my house, a solution that nearly doubled the table size wouldn't likely fly. Of course it's not likely a system would be designed to store duplicates, either. And, if any were found, they would be deleted, not preserved.

    Also, the concept of storing a tally table would be a pretty hard sell. "So you're suggesting we have a permanent table of nothing but consectutive numbers? REALLY? Well, how high should we count? I say we use long ints for this table of yours and we'll take the storage cost out of your salary."

    Anyway, as reluctant as I am to suggest this, (please don't hate me, Jeff) if one assumes duplicate rows are grouped together, a RBAR solution to this interesting problem is likely what would have been adopted back then. Of course you can forget about concurrency if you're RBAR grinding through GBs of data.

    Please don't get me wrong. I totally understand and promote set based solutions over RBAR and have used tally tables to great advantage. It's just interesting to see hybrid solutions combining 70s constraints, goals, and techiques with modern ones. We certainly have it better than we used to, but we're doing a lot more now, too.

  • 60s and 70s ๐Ÿ˜€

  • WOW! It's not often I get to learn as much from the comments as from the article which originated the comments! Of course, I'm at that stage where data storage is not a problem - it's the algorithm used to retrieve the data out of my grey-matter storage device.

    Thanks to all for the lessons learned!

  • Back in the day, of course, we would never use 8 bits where 1 would do, but today we've got the space so we use it to make things easier on ourselves.

    Except that, in this case, SQL would still allocate a full byte for that single bit column. The remaining seven bits would be wasted. Now, if you had multiple bit columns (up to eight) you would begin to see a savings.

    Andrew

    --Andrew

  • Awesome article.

    And awesome technique.

    And although "prohibited" in the article (a "Loop" is used), I cannot withstand to post another possible solution to this type of task:

    [font="Courier New"]DECLARE @Current nvarchar(4000); -- Iterator for the distinct rows

    -- Select first group of dupes (or single record)

    -- And delete the first occurrence using the WHERE clause

    -- and ROWCOUNT

    SET @Current = (SELECT MIN(Name + CAST(Age AS CHAR(2)) + Sex) FROM TAB);

    SET ROWCOUNT 1; -- Limit DELETE to one row

    DELETE TAB WHERE Name + CAST(Age AS CHAR(2)) + Sex = @Current

    SET ROWCOUNT 0; -- Reset

    -- Loop through each of the remaining group and delete one

    -- occurrence of the group.

    WHILE 1=1

    BEGIN

    --Chose next group

    SET @Current = (SELECT MIN(Name + CAST(Age AS CHAR(2)) + Sex)

    FROM (SELECTName, Age, Sex

    FROMTAB

    WHEREName + CAST(Age AS CHAR(2)) + Sex > @Current) A

    );

    IF @Current IS NULLBREAK; -- Exit WHILE if no more groups.

    SET ROWCOUNT 1; -- Limit to one row

    DELETE TAB WHERE Name + CAST(Age AS CHAR(2)) + Sex = @Current;

    SET ROWCOUNT 0; -- Reset

    END;[/font]

    And while we are at it, does anyone know how to add color IF codes? When I add them, they display as text like here:

    Best Regards,

    Chris Bรผttner

  • Christian Buettner (8/5/2008)


    Awesome article.

    And awesome technique.

    ...

    And while we are at it, does anyone know how to add color IF codes? When I add them, they display as text like here:

    Thanks, Christian.

    And you have to quote the argument in the color IFCode: [ color="red" ]

    (the buttons get it wrong).

    You can also use the Simple-Talk Prettifier for color-coded SQL:http://extras.sqlservercentral.com/prettifier/prettifier.aspx

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 46 through 60 (of 156 total)

You must be logged in to reply to this topic. Login to reply