Deleting data from a table ... quickly

Question

Deleting data from a table ... quickly

julian.fletcher

SSCrazy

Points: 2965
More actions
October 22, 2018 at 8:53 am

#416257

I'm working on a system which stores what you might call "temporary working data" in a number of database tables. To manage concurrent calculations, these all have a column to record the @@SPID of the connection - this is used to ensure that each connection only uses its own data.
At various points in the calculation (specifically, at the very start before anything's done, and at the end after everything's done), we want to clear a particular table of any and all data for a specific @@SPID. At the moment, we're doing this:
If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID)) Delete XXX   From dbo.TTemp_MaintainActuals XXX With (ForceSeek)   Where (XXX.SPID = @@SPID)
That works just fine but with an eye on performance, and bearing in mind that these tables could contain a reasonably large (although not massive) number of rows, I'd like to do a Truncate instead of a Delete where possible. The "where possible" bit obviously means where the table doesn't contain any rows from other connections.
To make this as multi-thread safe / bulletproof as possible, I'm thinking of the following sort of approach:
Declare @Truncate Bit = 1
If (@@TranCount = 0)
Set @Truncate = 0
Else
Begin
Set Lock_Timeout 0
Begin Try
  If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))
   Set @Truncate = 0
End Try
Begin Catch
  Set @Truncate = 0
End Catch
End
If (@Truncate = 0)
Begin
If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
  Delete XXX
   From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
   Where (XXX.SPID = @@SPID)
End
Else
Truncate Table dbo.TTemp_MaintainActuals
The aim of the "If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))" is twofold:
1) See if there are any rows belonging to other connections.
2) Get a lock on the table for the duration of the proposed truncation.
The problem is that at the end of the code, the connection still has a lock on the whole table, which is only relinquished when its transaction is committed or rolled back. And that is a problem for other connections wanting to make use of it.
I can't find any other way of releasing the lock - does anybody know of one?

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

May I ask what may be a dumb question? Why aren't each of the processes using their own temporary table so that you aren't sharing a table among multiple processes for these particular calculations?

julian.fletcher SSCrazy Points: 2965 More actions · Answer 2

Excellent question! And one we have considered. We found that creating temporary tables for each process was significantly slower than using permanent ones.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

julian.fletcher - Monday, October 22, 2018 10:04 AM
Excellent question! And one we have considered. We found that creating temporary tables for each process was significantly slower than using permanent ones.

Care to expand on this? How were the tables being created? Were you creating # temporary tables or unique permanent tables for each process?

julian.fletcher SSCrazy Points: 2965 More actions · Answer 4

We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.

ScottPletcher SSC Guru Points: 101119 More actions · Answer 5

Just delete the rows. If it's not that many rows, it's more work than it's worth to try to TRUNCATE.

I suspect (hope) the temp table is clustered first on SPID. If it's not, do that. But other than that, you should be OK.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

ScottPletcher - Monday, October 22, 2018 11:21 AM
Just delete the rows. If it's not that many rows, it's more work than it's worth to try to TRUNCATE.
I suspect (hope) the temp table is clustered first on SPID. If it's not, do that. But other than that, you should be OK.

Ah, be careful now... clustering on the SPID will create silos that are inserted into "out of order" which will cause massive page splits and huge amounts of log file usage even in the SIMPLE recovery model unless you can convert it to SPID followed by some form of ever-increasing column. Even if you have that, if you also have "Expansive" updates, insert and update performance is going to tank compared to what could be and log file activity will rise to about 43 times as much as it needs to be. You'll also end up with incredibly low page density within the silos, which is where most of the work will be done which also wastes a ton of memory on mostly emptied pages.

I can't upload an image from where I'm at right now. I'll do it when I get home but you'll be shocked at what such an index looks like under such conditions.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

julian.fletcher - Monday, October 22, 2018 10:23 AM
We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.

So why was the creating of the # tables slow?

epivaral SSC-Addicted Points: 428 More actions · Answer 8

julian.fletcher - Monday, October 22, 2018 8:53 AM
I'm working on a system which stores what you might call "temporary working data" in a number of database tables. To manage concurrent calculations, these all have a column to record the @@SPID of the connection - this is used to ensure that each connection only uses its own data.
At various points in the calculation (specifically, at the very start before anything's done, and at the end after everything's done), we want to clear a particular table of any and all data for a specific @@SPID. At the moment, we're doing this:
If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID)) Delete XXX   From dbo.TTemp_MaintainActuals XXX With (ForceSeek)   Where (XXX.SPID = @@SPID)
That works just fine but with an eye on performance, and bearing in mind that these tables could contain a reasonably large (although not massive) number of rows, I'd like to do a Truncate instead of a Delete where possible. The "where possible" bit obviously means where the table doesn't contain any rows from other connections.
To make this as multi-thread safe / bulletproof as possible, I'm thinking of the following sort of approach:
Declare @Truncate Bit = 1
If (@@TranCount = 0)
Set @Truncate = 0
Else
Begin
Set Lock_Timeout 0
Begin Try
  If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))
   Set @Truncate = 0
End Try
Begin Catch
  Set @Truncate = 0
End Catch
End
If (@Truncate = 0)
Begin
If Exists (Select 1 From dbo.TTemp_MaintainActuals With (NoLock) Where (SPID = @@SPID))
  Delete XXX
   From dbo.TTemp_MaintainActuals XXX With (ForceSeek)
   Where (XXX.SPID = @@SPID)
End
Else
Truncate Table dbo.TTemp_MaintainActuals
The aim of the "If Exists (Select 1 From dbo.TTemp_MaintainActuals With (TabLock, HoldLock) Where (SPID <> @@SPID))" is twofold:
1) See if there are any rows belonging to other connections.
2) Get a lock on the table for the duration of the proposed truncation.
The problem is that at the end of the code, the connection still has a lock on the whole table, which is only relinquished when its transaction is committed or rolled back. And that is a problem for other connections wanting to make use of it.
I can't find any other way of releasing the lock - does anybody know of one?

you can use the TOP operator and delete by smaller chunks, something like this:
DECLARE @batchSize INT DECLARE @results INT

SET @results = 1 --stores the row count after each successful batch SET @batchSize = 1000

WHILE (@results > 0) BEGIN

Delete top(@batchSize) From dbo.TTemp_MaintainActuals XXX With (ForceSeek) Where (XXX.SPID = @@SPID)

-- very important to obtain the latest rowcountrowcount to avoid infinite loops SET @results = @@ROWCOUNT

END

Of course you have to test it first, but that could be the general idea, with this method, you can reduce table locks, and reduce log usage. If indexes are good enough also the delete time will be reduced...

MCSE, MCSA SQL Server Database Developer/Administrator

ScottPletcher SSC Guru Points: 101119 More actions · Answer 9

Jeff Moden - Monday, October 22, 2018 11:52 AM
ScottPletcher - Monday, October 22, 2018 11:21 AM
Just delete the rows. If it's not that many rows, it's more work than it's worth to try to TRUNCATE.
I suspect (hope) the temp table is clustered first on SPID. If it's not, do that. But other than that, you should be OK.
Ah, be careful now... clustering on the SPID will create silos that are inserted into "out of order" which will cause massive page splits and huge amounts of log file usage even in the SIMPLE recovery model unless you can convert it to SPID followed by some form of ever-increasing column. Even if you have that, if you also have "Expansive" updates, insert and update performance is going to tank compared to what could be and log file activity will rise to about 43 times as much as it needs to be. You'll also end up with incredibly low page density within the silos, which is where most of the work will be done which also wastes a ton of memory on mostly emptied pages.
I can't upload an image from where I'm at right now. I'll do it when I get home but you'll be shocked at what such an index looks like under such conditions.

Presumably not, if the SPIDs are deleted relatively quickly from the work table. If it somehow does become an issue, just inflate the row so that only 1 or 2 fit per page. That should address the problem too. It's still less eternal maintenance than trying to figure out when you can TRUNC and when you can't.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ScottPletcher SSC Guru Points: 101119 More actions · Answer 10

Lynn Pettis - Monday, October 22, 2018 12:28 PM
julian.fletcher - Monday, October 22, 2018 10:23 AM
We were making # tables. We have considered making permanent tables for each process (e.g. TTemp_MaintainActuals1, TTemp_MaintainActuals2, TTemp_MaintainActuals3, TTemp_MaintainActuals4 ... TTemp_MaintainActuals32767) but that would require quite a few new problems to be resolved.
So why was the creating of the # tables slow?

Perhaps because they are doing SELECT ... INTO #... a lot. That seems to create internal locks on the sys tables. I thought MS had claimed to have lessened the issues with this, but we're on SQL 2016 and we still see it in legacy code where we haven't re-worked the SELECT INTOs.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

Well, with a permanent table already created they aren't using SELECT ... INTO #<table>. Why not create the #<table> then insert the data into it?

ScottPletcher SSC Guru Points: 101119 More actions · Answer 12

Lynn Pettis - Monday, October 22, 2018 1:51 PM
Well, with a permanent table already created they aren't using SELECT ... INTO #<table>. Why not create the #<table> then insert the data into it?

When I said "perhaps because" I meant "maybe" they are using SELECT ... INTO, I don't know for sure. But I know that can cause delays, sometimes severe delays, in using metadata, including table creation and deletion, and even in tempdb. It's convenient to code, and a lot of people don't realize the potential issues they can see from it.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

Lynn Pettis

SSC Guru

Points: 442467

October 22, 2018 at 2:06 pm

#2010687

Whatever, dude.

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 14

ScottPletcher - Monday, October 22, 2018 2:05 PM
Lynn Pettis - Monday, October 22, 2018 1:51 PM
Well, with a permanent table already created they aren't using SELECT ... INTO #<table>. Why not create the #<table> then insert the data into it?
When I said "perhaps because" I meant "maybe" they are using SELECT ... INTO, I don't know for sure. But I know that can cause delays, sometimes severe delays, in using metadata, including table creation and deletion, and even in tempdb. It's convenient to code, and a lot of people don't realize the potential issues they can see from it.

Do you have any demonstrable/repeatable examples? I've previously tested overlapping SELECT/INTOs and other ways of creating tables that started after the SELECT/INTO and finished before the SELECT/INTO. The only thing I've seen it be a PITA on is things like the Explorer window in SSMS. And, no... not saying it doesn't happen. I've just not seen it (interference/delays with table creation... not stuff in EM or SSMS) happen since SQL Server 6.5 RTM before the hot fix that kept it from crippling TempDB.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)