Just For Fun: An Impossible Delete

  • bob.holbrey (8/5/2008)


    A really simple option which takes advantage of the fact that "age" is a number...

    Create a table "test" with the following data as shown:

    select * from test

    Name  Age  Sex
    ABC   24   M
    ABC   24   M
    DEF   24   M
    DEF   24   M
    GHI   26   F
    GHI   26   F
    GHI   26   F
    PQRS  25   F

    insert test
    select distinct name, age*1000, sex
    from test

    delete from test where age < 1000

    update test
    set age = age/1000

    select * from test

    Name  Age  Sex
    ABC   24   M
    DEF   24   M
    GHI   26   F
    PQRS  25   F

    I believe the desired solution was to delete only the FIRST original row and keep the other duplicates, so your result should include multiple GHI rows... but not the PQRS row, since it was a single original row:

    ABC  24  M
    ABC  24  M
    LMN  27  M
    LMN  27  M
    LMN  27  M
    PQRS 25  F
    XYZ  24  M
    XYZ  25  M

    Now I would wish to 'DELETE' the 1st original row (row no. 1, 3, 6, 7) and keep the other duplicates from the above data. If there is a row without any duplicates present, it will be deleted (e.g. row no. 6).
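    For illustration only, here is a minimal sketch of that requirement on SQL Server 2005+, assuming the same test (Name, Age, Sex) table shown earlier in this thread. It ignores the article's storage constraints (and so would not qualify as an answer), but it shows the intended result: the first row of every group is deleted, so singleton rows disappear entirely while groups with duplicates keep their remaining copies.

    -- Sketch only (not the article's constrained solution):
    -- number the rows within each (Name, Age, Sex) group and delete row 1 of each group.
    ;WITH Ordered AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY [Name], Age, Sex
                                  ORDER BY (SELECT NULL)) AS rn
        FROM test
    )
    DELETE FROM Ordered
    WHERE rn = 1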

  • bob.holbrey (8/5/2008)


    A really simple option which takes advantage of the fact that "age" is a number...

    ...

    select distinct name, age*1000, sex

    Right Bob, that was another option. However, I did not take it because I felt that I could not assume that AGE was any larger than TINYINT (one byte), and multiplying by 1000 could have pushed it out of range.
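    As a quick, hypothetical illustration of that range concern (this snippet is mine, assuming Age really were declared TINYINT):

    -- TINYINT holds 0-255, so scaling the column in place overflows:
    CREATE TABLE #t (Age TINYINT)
    INSERT INTO #t SELECT 24
    UPDATE #t SET Age = Age * 1000  -- fails: arithmetic overflow for data type tinyint
    DROP TABLE #t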

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Good article; anything that gets this many responses is, IMHO, a good article. Being a relative newcomer to the field (late 90s), I never had to deal with the lack-of-storage issues you did, so I would never have come up with that solution. Of course, this is what I love about SSC: I never saw the original thread, but now I have some new ways to solve the problem.

  • Before I get into my first impression, I will say it is a good article. And you should be aware that people can have particular habits they don't realize they have, which can lead them to particular solutions.

    But my first impression was of the first example, the contact numbers. I noticed you made four passes over the data instead of just one. And even though your option works, the issue the customer faced was that they had two separate updates, which is what caused the problem. If both had been handled in the same UPDATE the issue would not occur, and I would have done something like this:

    BEGIN Transaction
    UPDATE CONTACT_NUMBERS
        SET CallOrder = (CASE WHEN CallOrder = 1 THEN 2 ELSE 1 END)
    COMMIT Transaction

    As this makes one pass across the data and adjusts all records at the same time, the conflict does not occur.

    With all that said, you still offer people other ways they could do things should there be a limitation.

    Additionally, I would add that you also have to weigh the system impact of the choice you make. As I stated, in your first example you make four passes over the dataset and you also write all the touched rows to the log file twice. The method I present makes one pass and logs each row only once.

    Lastly, I would add that if you are going to provide examples, please be consistent. In the contact number example you provide the script to build the sample set, but for the example that is the focus of the article you do not. Just keep this in mind when you write further articles: be consistent in your presentation style throughout.

  • Warning - Newcomer here - This is the first challenge I've tried to actually get a result out of. Let me know what you think.

    I chose to find a duplicate and remove it, repeating until there are no more duplicates left. I ran this through SQL 2005... no luck with 2000.

    Thanks for your input

    - Jon

    -----------------------------------------------------------

    /* This first part is just to set up the example

    --CREATE THE TEST TABLE
    CREATE TABLE tblTest
    (
        NAMES varchar(50),
        AGE varchar(50),
        SEX varchar(50)
    )

    --INSERT VALUES INTO TEST TABLE
    INSERT INTO tblTest
    Select 'ABC','24', 'M'
    UNION ALL Select 'ABC','24', 'M'
    UNION ALL Select 'LMN','27', 'M'
    UNION ALL Select 'LMN','27', 'M'
    UNION ALL Select 'LMN','27', 'M'
    UNION ALL Select 'PQRS','25', 'F'
    UNION ALL Select 'XYZ','24', 'M'
    UNION ALL Select 'XYZ','25', 'M'

    --DROP TABLE IF NEEDED
    DROP TABLE tblTest
    */

    DECLARE @NAMES VARCHAR(50)
    DECLARE @AGE VARCHAR(50)
    DECLARE @SEX VARCHAR(50)

    WHILE 1 = 1
    BEGIN
        -- Pick one group that still has duplicates; the variables come back NULL when none is left.
        -- The ORDER BY keeps all three TOP(1) picks pointing at the same group.
        SELECT @NAMES = (SELECT TOP(1) NAMES FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM tblTest GROUP BY NAMES, AGE, SEX) T1 WHERE [COUNT] > 1 ORDER BY NAMES, AGE, SEX)
        SELECT @AGE   = (SELECT TOP(1) AGE   FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM tblTest GROUP BY NAMES, AGE, SEX) T1 WHERE [COUNT] > 1 ORDER BY NAMES, AGE, SEX)
        SELECT @SEX   = (SELECT TOP(1) SEX   FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM tblTest GROUP BY NAMES, AGE, SEX) T1 WHERE [COUNT] > 1 ORDER BY NAMES, AGE, SEX)

        IF @NAMES IS NULL
        BEGIN
            BREAK
        END
        ELSE
        BEGIN
            -- Delete one arbitrary row from that group
            ;WITH TMP AS
            (SELECT * FROM tblTest
             WHERE NAMES = @NAMES
               AND [AGE] = @AGE
               AND [SEX] = @SEX)
            DELETE TOP (1) FROM TMP
        END
    END

  • Antares686 (8/5/2008)


    But my first impression was of the first example, the contact numbers. I noticed you made four passes over the data instead of just one. And even though your option works, the issue the customer faced was that they had two separate updates, which is what caused the problem. If both had been handled in the same UPDATE the issue would not occur, and I would have done something like this:

    BEGIN Transaction
    UPDATE CONTACT_NUMBERS
        SET CallOrder = (CASE WHEN CallOrder = 1 THEN 2 ELSE 1 END)
    COMMIT Transaction

    As this makes one pass across the data and adjusts all records at the same time, the conflict does not occur.

    True, however, recall that I qualified this example in the article:

    And yes, I do know that there are other ways to do this correctly, especially with a CASE function. For reasons that I cannot get into, that was not an option here.

    The reason I did not use it is that the customer was not familiar with the CASE function and I was talking them through this over the phone. Since I could not see what they were typing, I was concerned that they could make a simple typo (not visible to me) that would take their situation from bad to worse.

    Consequently, I opted for the slower but safer option of preserving the command structure that they already had and just executing it twice.
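    For readers who have not seen the article, a multi-pass swap of that general shape looks something like the sketch below. This is my own reconstruction, not the article's exact statements (the article reportedly used four passes over the data), and the spare value of 3 is an assumption:

    BEGIN TRANSACTION
        UPDATE CONTACT_NUMBERS SET CallOrder = 3 WHERE CallOrder = 1  -- park the 1s on a spare value
        UPDATE CONTACT_NUMBERS SET CallOrder = 1 WHERE CallOrder = 2  -- move the 2s into place
        UPDATE CONTACT_NUMBERS SET CallOrder = 2 WHERE CallOrder = 3  -- finish the swap
    COMMIT TRANSACTION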

    Lastly, I would add that if you are going to provide examples, please be consistent. In the contact number example you provide the script to build the sample set, but for the example that is the focus of the article you do not. Just keep this in mind when you write further articles: be consistent in your presentation style throughout.

    Good point, and I did have the INSERTs for a bunch of test data too; I just forgot to include them. Oh well. For anyone who still wants some sample data, here is the data from Chris Morris's post:

    DROP TABLE #Testing

    CREATE TABLE #Testing (
        [Name] Varchar(4),
        Age TINYINT,
        Sex Char(1)
    )

    INSERT INTO #Testing ([name], Age, Sex)
    SELECT 'ABC',24,'M' UNION ALL --
    SELECT 'ABC',24,'M' UNION ALL
    SELECT 'DEF',24,'M' UNION ALL --
    SELECT 'DEF',24,'F' UNION ALL
    SELECT 'GHI',26,'F' UNION ALL --
    SELECT 'GHI',26,'F' UNION ALL
    SELECT 'GHI',26,'F' UNION ALL
    SELECT 'GHI',26,'F' UNION ALL
    SELECT 'GHI',26,'F' UNION ALL
    SELECT 'LMN',27,'M' UNION ALL --
    SELECT 'LMN',27,'M' UNION ALL
    SELECT 'LMN',27,'M' UNION ALL
    SELECT 'PQRS',25,'F' UNION ALL --
    SELECT 'XYZ',24,'M' UNION ALL --
    SELECT 'XYZ',25,'M' --

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I ran into this problem before and came up with a solution using DISTINCT, which may not be the most efficient, but it is very quick and easy and seems to work well on smaller tables.

    The gist is to make a copy of the table you want to remove the dups from, truncate that table, and then SELECT DISTINCT back into the table from the copy. This example starts with 7 rows and ends up with 3.

    --Setup
    Create table TAB (
        Fname varchar(50),
        Age int,
        Sex char(1))

    Insert into TAB
    Select 'ABC',24,'M' Union ALL
    Select 'ABC',24,'M' Union ALL
    Select 'ABC',25,'M' Union ALL
    Select 'DEF',35,'F' Union ALL
    Select 'DEF',35,'F' Union ALL
    Select 'DEF',35,'F' Union ALL
    Select 'DEF',35,'F'

    --Add all records to temp table
    Select *
    into #temp
    From TAB

    --Clear out table and add records back in
    Truncate table TAB

    Insert into TAB (Fname, Age, Sex)
    Select Distinct Fname, Age, Sex
    From #temp

    --Output
    Select *
    From TAB

    --Cleanup
    drop table TAB, #temp

  • Excellent article! :)

    Actually, you gave a variety of good and tasty food in a single box. If you write more articles like this, it would help SQL developers who would like to improve their SQL knowledge very much. Again, great article! 🙂

    karthik

  • Try as hard as you can;-

    and still the dupe removers and requirements non-readers supply solutions to the problem already answered/solved and buried by rbarryyoung, chris.morriss and Phil Factor.

    All legends.

    *Shaun drinks a swig of calm juice*

    People pay attention;-

    We neither need nor want any additional code or solutions unless they address the issue involving all of the requirements of the article and provide something more than has already been written!

    *mutters they won't read this anyway, at least I tried!*

    Hiding under a desk from SSIS Implementation Work :crazy:

  • Shaun McGuile (8/5/2008)


    Try as hard as you can;-

    and still the dupe removers and requirements non-readers supply solutions to the problem already answered/solved and buried by rbarryyoung, chris.morriss and Phil Factor.

    All legends.

    *Shaun drinks a swig of calm juice*

    People pay attention;-

    We neither need nor want any additional code or solutions unless they address the issue involving all of the requirements of the article and provide something more than has already been written!

    *mutters they won't read this anyway, at least I tried!*

    Sorry I wasn't one of the first 5 responders; I believe my solution is one of the easiest so far and definitely needs to be posted.

  • I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from?

    Thanks,

    Tim

    Insert into SOURCE(Name, Age, Sex)
    Select Name
         , Age
         , CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field
    From TAB
    Join Tally ON Tally.Number <= Ascii(Sex)/2
    Where Tally.Number Between 2 and 127

  • You used a temp table - forbidden by the article requirements.

    Sorry,

    Just having a bad-mood moment; please don't take it personally, I get worked up on long-running threads.... ;)

    Hiding under a desk from SSIS Implementation Work :crazy:

  • Dang, I thought I was onto something there. I went back and re-read the requirements and broke all kinds of stuff. Thanks for the 'heads up'.

  • timmy_patterson (8/5/2008)


    I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from?

    Thanks,

    Tim

    Insert into SOURCE(Name, Age, Sex)
    Select Name
         , Age
         , CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field
    From TAB
    Join Tally ON Tally.Number <= Ascii(Sex)/2
    Where Tally.Number Between 2 and 127

    TAB looks like a typo; it probably should be SOURCE.

    And Tally is the tally table you should have in any and all databases. If you do not have one, you need one, so get one; search this site for articles. 🙂
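    If you just need one for this exercise, here is one quick way to build a small Tally table on 2005; the row source (sys.all_objects) and the 1-255 range are my choices, not anything prescribed by the article:

    -- Build a Tally table holding the numbers 1 through 255
    SELECT TOP (255)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number
    INTO   Tally
    FROM   sys.all_objects

    CREATE UNIQUE CLUSTERED INDEX IX_Tally_Number ON Tally (Number)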

    Hiding under a desk from SSIS Implementation Work :crazy:

  • If you know that you will not have more than 128 possible "duplicate" records, I agree that this is a clever solution. In the initial design of the table, unless you are trying to allow for some interesting possibilities, why wouldn't this already be a bit field? Secondly, why would you choose the low-order bit to retain the original value? When you tally the results, you will either need to save the bit when manipulating the result or add 2 for each tally. Using the high-order bit would allow you to add 1 in either case and then just use a mask to check that high bit.
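    For anyone following along, here is a tiny illustration (my own snippet, assuming the standard ASCII codes) of the low-order-bit idea being debated:

    -- 'M' = 77 (odd) and 'F' = 70 (even), so the low bit alone distinguishes them,
    -- which leaves the higher bits free to carry a running count.
    SELECT ASCII('M') & 1 AS M_LowBit,  -- 1
           ASCII('F') & 1 AS F_LowBit   -- 0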
