Deleting data from a table ... quickly

  • I'm working on a system which stores what you might call "temporary working data" in a number of database tables. To manage concurrent calculations, these all have a column to record the @@SPID of the connection - this is used to ensure that each connection only uses its own data.

    At various points in the calculation (specifically, at the very start before anything's done, and at the end after everything's done), we want to clear a particular table of any and all data for a specific @@SPID. At the moment, we're doing this:

    If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
     Delete XXX
      From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
      Where (XXX.SPID = @@SPID)

    That works just fine but with an eye on performance, and bearing in mind that these tables could contain a reasonably large (although not massive) number of rows, I'd like to do a Truncate instead of a Delete where possible. The "where possible" bit obviously means where the table doesn't contain any rows from other connections.

    To make this as multi-thread safe / bulletproof as possible, I'm thinking of the following sort of approach:


    Declare @Truncate Bit = 1

    If (@@TranCount = 0)
     Set @Truncate = 0
    Else
    Begin
     Set Lock_Timeout 0

     Begin Try
      If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))
       Set @Truncate = 0
     End Try
     Begin Catch
      Set @Truncate = 0
     End Catch
    End

    If (@Truncate = 0)
    Begin
     If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
      Delete XXX
       From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
       Where (XXX.SPID = @@SPID)
    End
    Else
     Truncate Table dbo.TTemp_MaintainActuals

    The aim of the "If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))" is twofold:

    1) See if there are any rows belonging to other connections.
    2) Get a lock on the table for the duration of the proposed truncation.

    The problem is that at the end of the code, the connection still has a lock on the whole table, which is only relinquished when its transaction is committed or rolled back. And that is a problem for other connections wanting to make use of it.

    I can't find any other way of releasing the lock - does anybody know of one?

  • May I ask what may be a dumb question?  Why aren't each of the processes using their own temporary table so that you aren't sharing a table among multiple processes for these particular calculations?

  • Excellent question! And one we have considered. We found that creating temporary tables for each process was significantly slower than using permanent ones.

  • julian.fletcher - Monday, October 22, 2018 10:04 AM

    Excellent question! And one we have considered. We found that creating temporary tables for each process was significantly slower than using permanent ones.

    Care to expand on this?  How were the tables being created?  Were you creating # temporary tables or unique permanent tables for each process?

  • We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.

  • Just delete the rows.  If it's not that many rows, it's more work than it's worth to try to TRUNCATE.

    I suspect (hope) the temp table is clustered first on SPID.  If it's not, do that.  But other than that, you should be OK.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher - Monday, October 22, 2018 11:21 AM

    Just delete the rows.  If it's not that many rows, it's more work than it's worth to try to TRUNCATE.

    I suspect (hope) the temp table is clustered first on SPID.  If it's not, do that.  But other than that, you should be OK.

    Ah, be careful now... clustering on the SPID will create silos that are inserted into "out of order" which will cause massive page splits and huge amounts of log file usage even in the SIMPLE recovery model unless you can convert it to SPID followed by some form of ever-increasing column.  Even if you have that, if you also have "Expansive" updates, insert and update performance is going to tank compared to what could be and log file activity will rise to about 43 times as much as it needs to be.  You'll also end up with incredibly low page density within the silos, which is where most of the work will be done which also wastes a ton of memory on mostly emptied pages.

    I can't upload an image from where I'm at right now.  I'll do it when I get home but you'll be shocked at what such an index looks like under such conditions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • julian.fletcher - Monday, October 22, 2018 10:23 AM

    We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.

    So why was the creating of the # tables slow?

  • julian.fletcher - Monday, October 22, 2018 8:53 AM

    I'm working on a system which stores what you might call "temporary working data" in a number of database tables. To manage concurrent calculations, these all have a column to record the @@SPID of the connection - this is used to ensure that each connection only uses its own data.

    At various points in the calculation (specifically, at the very start before anything's done, and at the end after everything's done), we want to clear a particular table of any and all data for a specific @@SPID. At the moment, we're doing this:

    If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
     Delete XXX
      From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
      Where (XXX.SPID = @@SPID)

    That works just fine but with an eye on performance, and bearing in mind that these tables could contain a reasonably large (although not massive) number of rows, I'd like to do a Truncate instead of a Delete where possible. The "where possible" bit obviously means where the table doesn't contain any rows from other connections.

    To make this as multi-thread safe / bulletproof as possible, I'm thinking of the following sort of approach:


    Declare @Truncate Bit = 1

    If (@@TranCount = 0)
     Set @Truncate = 0
    Else
    Begin
     Set Lock_Timeout 0

     Begin Try
      If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))
       Set @Truncate = 0
     End Try
     Begin Catch
      Set @Truncate = 0
     End Catch
    End

    If (@Truncate = 0)
    Begin
     If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
      Delete XXX
       From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
       Where (XXX.SPID = @@SPID)
    End
    Else
     Truncate Table dbo.TTemp_MaintainActuals

    The aim of the "If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))" is twofold:

    1) See if there are any rows belonging to other connections.
    2) Get a lock on the table for the duration of the proposed truncation.

    The problem is that at the end of the code, the connection still has a lock on the whole table, which is only relinquished when its transaction is committed or rolled back. And that is a problem for other connections wanting to make use of it.

    I can't find any other way of releasing the lock - does anybody know of one?

    you can use the TOP operator and delete by smaller chunks, something like this:
    DECLARE @batchSize INT
    DECLARE @results INT

    SET @results = 1 --stores the row count after each successful batch
    SET @batchSize = 1000

    WHILE (@results > 0)
    BEGIN

     Delete top(@batchSize)
     From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
     Where (XXX.SPID = @@SPID)

     -- very important to obtain the latest rowcountrowcount to avoid infinite loops
     SET @results = @@ROWCOUNT

    END

    Of course you have to test it first, but that could be the general idea, with this method, you can reduce table locks, and reduce log usage. If indexes are good enough also the delete time will be reduced...

    MCSE, MCSA SQL Server Database Developer/Administrator

  • Jeff Moden - Monday, October 22, 2018 11:52 AM

    ScottPletcher - Monday, October 22, 2018 11:21 AM

    Just delete the rows.  If it's not that many rows, it's more work than it's worth to try to TRUNCATE.

    I suspect (hope) the temp table is clustered first on SPID.  If it's not, do that.  But other than that, you should be OK.

    Ah, be careful now... clustering on the SPID will create silos that are inserted into "out of order" which will cause massive page splits and huge amounts of log file usage even in the SIMPLE recovery model unless you can convert it to SPID followed by some form of ever-increasing column.  Even if you have that, if you also have "Expansive" updates, insert and update performance is going to tank compared to what could be and log file activity will rise to about 43 times as much as it needs to be.  You'll also end up with incredibly low page density within the silos, which is where most of the work will be done which also wastes a ton of memory on mostly emptied pages.

    I can't upload an image from where I'm at right now.  I'll do it when I get home but you'll be shocked at what such an index looks like under such conditions.

    Presumably not, if the SPIDs are deleted relatively quickly from the work table.  If it somehow does become an issue, just inflate the row so that only 1 or 2 fit per page.  That should address the problem too.  It's still less eternal maintenance than trying to figure out when you can TRUNC and when you can't.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Lynn Pettis - Monday, October 22, 2018 12:28 PM

    julian.fletcher - Monday, October 22, 2018 10:23 AM

    We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.

    So why was the creating of the # tables slow?

    Perhaps because they are doing SELECT ... INTO #... a lot.  That seems to create internal locks on the sys tables.  I thought MS had claimed to have lessened the issues with this, but we're on SQL 2016 and we still see it in legacy code where we haven't re-worked the SELECT INTOs.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Well, with a permanent table already created they aren't using SELECT ... INTO #<table>.  Why not create the #<table> then insert the data into it?

  • Lynn Pettis - Monday, October 22, 2018 1:51 PM

    Well, with a permanent table already created they aren't using SELECT ... INTO #<table>.  Why not create the #<table> then insert the data into it?

    When I said "perhaps because" I meant "maybe" they are using SELECT ... INTO, I don't know for sure.  But I know that can cause delays, sometimes severe delays, in using metadata, including table creation and deletion, and even in tempdb.  It's convenient to code, and a lot of people don't realize the potential issues they can see from it.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Whatever, dude.

  • ScottPletcher - Monday, October 22, 2018 2:05 PM

    Lynn Pettis - Monday, October 22, 2018 1:51 PM

    Well, with a permanent table already created they aren't using SELECT ... INTO #<table>.  Why not create the #<table> then insert the data into it?

    When I said "perhaps because" I meant "maybe" they are using SELECT ... INTO, I don't know for sure.  But I know that can cause delays, sometimes severe delays, in using metadata, including table creation and deletion, and even in tempdb.  It's convenient to code, and a lot of people don't realize the potential issues they can see from it.

    Do you have any demonstrable/repeatable examples? I've previously tested overlapping SELECT/INTOs and other ways of creating tables that started after the SELECT/INTO and finished before the SELECT/INTO.  The only thing I've seen it be a PITA on is things like the Explorer window in SSMS.  And, no... not saying it doesn't happen.  I've just not seen it (interference/delays with table creation... not stuff in EM or SSMS) happen since SQL Server 6.5 RTM before the hot fix that kept it from crippling TempDB.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply