November 22, 2021 at 12:35 pm
EDIT: the table is not 1 GB, it's 1 TB 🙁
Hi
I have a table with 351,495,000 records (about 1 TB in size). I need to remove a bunch of historical data, approx. 219,000,000 records.
Which would be quicker: a delete based on criteria (an int column, biggest value 4 digits), or copying the 219 mil records to a copy of the table, truncating the original and then copying them back?
No identity columns involved and I'll be setting recovery to SIMPLE.
Table has a clustered composite key that includes the aforementioned int column.
Many thanks
Simon
November 22, 2021 at 12:50 pm
Another option: copy the rows you wish to keep to a copy of the current table, then swap that table with the current one.
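Something along these lines, as a rough, untested sketch - the table and column names (dbo.BigTable, YearCol and so on) are made up, so adjust them to your schema and your actual "keep" criteria:
-- 1. Create an empty copy of the table with the same clustered key
--    (script it out; SELECT...INTO would only give you a heap).
-- 2. Copy just the rows you want to keep. TABLOCK can help you get
--    minimal logging under SIMPLE recovery.
INSERT INTO dbo.BigTable_New WITH (TABLOCK)
SELECT *
FROM dbo.BigTable
WHERE YearCol >= 2020;   -- whatever your keep criteria really are
-- 3. Build the nonclustered indexes on the new table, then swap the names.
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
-- 4. Drop dbo.BigTable_Old once you're happy - that's when the space comes back.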
Also, how did you get J22 into an INT column?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 22, 2021 at 1:26 pm
Apologies, cut-and-paste error; example value 2859, i.e. they're all 4-digit numbers.
Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?
November 22, 2021 at 1:58 pm
Apologies, cut-and-paste error; example value 2859, i.e. they're all 4-digit numbers.
Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?
I would not guarantee that it would be quicker. But in my experience, deletions of millions of rows of data are often painfully slow and cause blocking. Let's wait for some other comments.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 22, 2021 at 3:02 pm
Is the existing clustered composite key unique? And explicitly declared to SQL as unique?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 3:19 pm
Testing is your friend. Try Phil's suggestion. It's likely going to be faster just based on the simple idea of data movement. Moving 100 million rows through the copy may, emphasis on may, be faster than deleting 200 million rows, although indexes can be a factor here as well.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2021 at 3:23 pm
Well, if the datetime in the clus index is current datetime, you're ok. Otherwise, not good, since you wouldn't be able to easily tell when copying the "good" rows that you got them all but that you didn't copy any row twice.
Typically it's best to copy in clus key order, in batches, but if the key is not always ascending, that will be far more difficult to do accurately.
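As a rough, untested sketch of that (made-up table/column names; KeyCol1 stands in for the leading column of the clus key), walking whole key values per pass so a row can't be copied twice or missed at a batch boundary:
-- Assumes dbo.BigTable_New already exists with the same clustered key.
-- Processes one leading-key value per pass; if a single value covers too
-- many rows, you'd batch on more of the key columns instead.
DECLARE @CurrentKey int;
SELECT @CurrentKey = MIN(KeyCol1)
FROM dbo.BigTable
WHERE YearCol >= 2020;              -- your "keep" criteria
WHILE @CurrentKey IS NOT NULL
BEGIN
    INSERT INTO dbo.BigTable_New
    SELECT *
    FROM dbo.BigTable
    WHERE KeyCol1 = @CurrentKey
      AND YearCol >= 2020;
    SELECT @CurrentKey = MIN(KeyCol1)
    FROM dbo.BigTable
    WHERE KeyCol1 > @CurrentKey
      AND YearCol >= 2020;
END;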
Do you have any column/set of columns that are guaranteed to be unique in the table?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 3:25 pm
Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.
November 22, 2021 at 3:36 pm
You can't restore to another server & try to test it there? I'd argue it's important to get something like this right because messing it up could be painful.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2021 at 3:36 pm
Then follow the standard method and delete from the table in batches in clus key order.
In order to do that, you might first have to go thru the table and store the clus key values of all the rows to be deleted. Hopefully all rows with the same clus key value are either all deleted or all kept.
Loop thru the key-values-to-be-deleted table, deleting say TOP (5000) rows each time (or however many rows work out best to delete per query). Generally I'll add a small delay after every n deletes, for example, wait 1/4 or 1/2 sec after every 50K or 100K deletes.
If the db is not in simple mode, you'll want to do very frequent log backups as you're deleting, to make sure the log doesn't need to grow to log the deletes, esp. since disk space is an issue for you.
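As a rough, untested sketch of that loop (made-up names; assumes #KeysToDelete already holds the clus key values of the rows to remove):
DECLARE @BatchSize int = 5000,
        @RowsDeleted int = 1,
        @Batches int = 0;
WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize) bt
    FROM dbo.BigTable AS bt
    INNER JOIN #KeysToDelete AS k
        ON k.KeyCol1 = bt.KeyCol1
       AND k.KeyCol2 = bt.KeyCol2;
    SELECT @RowsDeleted = @@ROWCOUNT,
           @Batches = @Batches + 1;
    -- brief pause roughly every 100K rows (20 batches of 5,000) so the
    -- log and other sessions get a chance to catch up
    IF @Batches % 20 = 0
        WAITFOR DELAY '00:00:00.500';
END;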
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 4:53 pm
Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.
To be honest, disk space is relatively cheap. I would petition management for a substantial addition of new disk. You do need some headroom, both for future growth and to be able to do things as a DBA.
--Jeff Moden
Change is inevitable... Change for the better is not.