Delete 1000 records randomly

  • To test a program, I need to delete 1000 records randomly in TableOrder.

    How to code to do it if there is a ID column in TableOrder?

  • is there any criteria or just delte 1000 records randomly?

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • OK, So this is not a complete answer. But it now has me asking WTF?

    The code below SHOULD work, at least from what I can see it should. You'll see the issue if you run this multiple times. The end result should ALWAYS be 6 rows. Right?????

    DECLARE @t TABLE (nDex INT IDENTITY(1,1), val VARCHAR(9))

    -- sample data

    INSERT @t

    SELECT 'a' UNION ALL SELECT 'b' UNION ALL SELECT 'c' UNION ALL SELECT 'd' UNION ALL

    SELECT 'e' UNION ALL SELECT 'f' UNION ALL SELECT 'g' UNION ALL SELECT 'h' UNION ALL

    SELECT 'i' UNION ALL SELECT 'j' UNION ALL SELECT 'k' UNION ALL SELECT 'l' UNION ALL

    SELECT 'm' UNION ALL SELECT 'n' UNION ALL SELECT 'o' UNION ALL SELECT 'p'

    SELECT * FROM @t ORDER BY ndex

    DELETE t

    FROM @t AS t

    INNER JOIN (SELECT TOP 10 nDex FROM @t ORDER BY NEWID()) AS b

    ON b.ndex = t.nDex

    SELECT * FROM @t ORDER BY ndex

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Modify my previous script to use a temp table instead of a table variable, and it will work.

    I think I've found a Question of the day to submit. 🙂

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • What I have donw in the past to select a random set of data is use the newID() function.

    So select top 10 * from table1

    order by newID()

    NEWID() will generate a random ID and by sorting and selecting the top 10 you will end up getting a random set of resutls every time.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • adonetok (3/18/2011)


    To test a program, I need to delete 1000 records randomly in TableOrder.

    How to code to do it if there is a ID column in TableOrder?

    This almost sounds malicious. Why do you need to do this?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Something like this should get the job done quickly and dirtily as long as you have a decent enough sample size to hand to TABLESAMPLE. Be aware however that TABLESAMPLE makes no guarantees as to how many rows it will return. It can even return 0 at times but with a sample size of 10000 I would expect it to return something. If your requirement is to delete 1000 rows exactly, and no less, then you'll want to take that into consideration.

    DELETE TOP (1000)

    FROM Person.Contact

    WHERE ContactID IN (SELECT ContactID

    FROM Person.Contact TABLESAMPLE(10000 ROWS)) ;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jason Selburg (3/18/2011)


    Modify my previous script to use a temp table instead of a table variable, and it will work.

    I think I've found a Question of the day to submit. 🙂

    That is bizarre...I don't like that at all, it seems like it could be a bug. I hope someone chimes in with an explanation. If not you should head over to the Connect website to see if it's been submitted. For now this one is just one more reason not to torment the ANSI-SQL gods by using ORDER BY in a derived table 😛

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (3/18/2011)


    That is bizarre...I don't like that at all, it seems like it could be a bug. I hope someone chimes in with an explanation.

    It's a simple case of "Halloweening". The optimizer makes a mistake in how to do the problem because the Table Variable is estimated to only have one row in it. Add OPTION(RECOMPILE) to the DELETE and the optimizer makes a much better choice because it has a better estimate of the number of rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a bit of a sidebar, add one more reason to why I avoid the use of Table Variables as a general rule with very, very few exceptions. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... what the heck. Here's what I stored in my notes for the example.

    /**********************************************************************************************************************

    The goal is to delete 10 randomly selected rows from a table. It doesn't work with a Table Variable alone.

    This is one of those places where SQL Server sometimes loses it's mind and it's known as "Halloweening". It's when SQL

    Server basically decides to make an "internal Cross-Join" to solve a problem. For every row in Table "A", it makes a

    full scan of Table "B". If you don't think so, take a look at the table scan symbol in the lower right corner of the

    Actual Execution Plan and look at the arrow coming out of it. It has 256 actual rows coming out of it which is the

    number of rows in the table times itself (16 in this case).

    In this case, there's a bit of obfuscation of the problem caused by the TOP 10... if you read all 16 rows from the table

    and take the TOP 10, you should get what you want EXCEPT that it's doing that 16 times. If you take the Top 10 of the

    same set of 16 rows of data in random order, you end up with more or less than 10 unique ID's being chosen because of

    the random sort. Even though the Inner Join that follows that will "uniquify" the ID's there are still more than 10

    which is why you end up deleting more or less than 10.

    There are three fixes for this. One is to trick SQL Server into doing the right thing by using an INNER MERGE JOIN.

    Another thing you can do is use a Temp Table so SQL Server can make a better "guess" at the Execution Plan where it will

    throw in an extra table spool to sort the mess out. The reason it'll make a better guess with the Temp Table is because

    it can actually estimate the rows instead of SQL Server estimating that there's just one row in the table (which won't

    need the table spool) because of the fact that Table Variables are estimated as having just one row. You can prove

    that by using a third possible fix... force a recompile for the DELETE by using OPTION(RECOMPILE) and it'll work

    correctly every time.

    As a side bar, this (being evaluated as a single row no matter how many rows it has) is why I don't use Table Variables

    in ANY T-SQL except in functions and special "rollback proof" code (very rarely need) even if they are occasionally

    faster. I just don't trust the damned things. They're also a PITA to troubleshoot because they don't persist data in

    SSMS.

    The explanation for a QOD should be that it'll delete more or less than 10 rows in an unpredictable fashion because the

    Halloweening will cause the SELECT TOP 10 to run and return 10 random ID's once for each row in the table and there may

    be more or less than 10 unique values returned by those (in this case), 16 executions.

    Run this a dozen times and see what you get.

    **********************************************************************************************************************/

    --===== Create a table variable to store the example data in.

    DECLARE @t TABLE (RowNum INT IDENTITY(1,1), SomeValue VARCHAR(9))

    ;

    --===== Populate the table with example data

    INSERT @t

    (SomeValue)

    SELECT 'a' UNION ALL SELECT 'b' UNION ALL SELECT 'c' UNION ALL SELECT 'd' UNION ALL

    SELECT 'e' UNION ALL SELECT 'f' UNION ALL SELECT 'g' UNION ALL SELECT 'h' UNION ALL

    SELECT 'i' UNION ALL SELECT 'j' UNION ALL SELECT 'k' UNION ALL SELECT 'l' UNION ALL

    SELECT 'm' UNION ALL SELECT 'n' UNION ALL SELECT 'o' UNION ALL SELECT 'p'

    ;

    --===== Try to do the random delete of ten rows

    DELETE t

    FROM @t AS t

    INNER JOIN (

    SELECT TOP 10 RowNum FROM @t ORDER BY NEWID()

    ) AS b

    ON b.RowNum = t.RowNum

    ;

    --===== Show that it didn't work

    SELECT @@ROWCOUNT AS RowsDeleted

    ;

    SELECT COUNT(*) AS RowsKept FROM @t

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply