November 22, 2021 at 12:35 pm
EDIT: the table is not 1 GB, it's 1 TB 🙁
Hi
I have a table with 351,495,000 records (about 1 TB in size). I need to remove a bunch of historical data, approx. 219,000,000 records.
Which would be quicker: a delete based on criteria (an int column, biggest value 4 digits), or copying the 219 mil records to a copy of the table, truncating the original and then copying them back?
No identity columns involved and I'll be setting recovery to SIMPLE.
Table has a clustered composite key that includes the aforementioned int column.
Many thanks
Simon
November 22, 2021 at 12:50 pm
Another option: copy the rows you wish to keep to a copy of the current table, then swap that table with the current one.
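Something along these lines, as a rough, untested sketch - the table and column names (dbo.BigTable, YearCol and so on) are made up, so adjust them to your schema and your actual "keep" criteria:
-- 1. Create an empty copy of the table with the same clustered key
--    (script it out; SELECT...INTO would only give you a heap).
-- 2. Copy just the rows you want to keep. TABLOCK can help you get
--    minimal logging under SIMPLE recovery.
INSERT INTO dbo.BigTable_New WITH (TABLOCK)
SELECT *
FROM dbo.BigTable
WHERE YearCol >= 2020;   -- whatever your keep criteria really are
-- 3. Build the nonclustered indexes on the new table, then swap the names.
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
-- 4. Drop dbo.BigTable_Old once you're happy - that's when the space comes back.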
Also, how did you get J22 into an INT column?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 22, 2021 at 1:26 pm
Apologies, cut-and-paste error; example value 2859, i.e. they're all 4-digit numbers.
Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?
November 22, 2021 at 1:58 pm
Apologies, cut-and-paste error; example value 2859, i.e. they're all 4-digit numbers.
Would that be quicker, do you think, even with having to recreate the clustered (& 7 nonclustered) indexes?
I would not guarantee that it would be quicker. But in my experience, deletions of millions of rows of data are often painfully slow and cause blocking. Let's wait for some other comments.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 22, 2021 at 3:02 pm
Is the existing clustered composite key unique? And explicitly declared to SQL as unique?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 3:19 pm
Testing is your friend. Try Phil's suggestion. It's likely going to be faster just based on the simple idea of data movement. Moving 100 million rows through the copy may, emphasis on may, be faster than deleting 200 million rows, although indexes can be a factor here as well.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2021 at 3:23 pm
Well, if the datetime in the clus index is current datetime, you're ok. Otherwise, not good, since you wouldn't be able to easily tell when copying the "good" rows that you got them all but that you didn't copy any row twice.
Typically it's best to copy in clus key order, in batches, but if the key is not always ascending, that will be far more difficult to do accurately.
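As a rough, untested sketch of that (made-up table/column names; KeyCol1 stands in for the leading column of the clus key), walking whole key values per pass so a row can't be copied twice or missed at a batch boundary:
-- Assumes dbo.BigTable_New already exists with the same clustered key.
-- Processes one leading-key value per pass; if a single value covers too
-- many rows, you'd batch on more of the key columns instead.
DECLARE @CurrentKey int;
SELECT @CurrentKey = MIN(KeyCol1)
FROM dbo.BigTable
WHERE YearCol >= 2020;              -- your "keep" criteria
WHILE @CurrentKey IS NOT NULL
BEGIN
    INSERT INTO dbo.BigTable_New
    SELECT *
    FROM dbo.BigTable
    WHERE KeyCol1 = @CurrentKey
      AND YearCol >= 2020;
    SELECT @CurrentKey = MIN(KeyCol1)
    FROM dbo.BigTable
    WHERE KeyCol1 > @CurrentKey
      AND YearCol >= 2020;
END;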
Do you have any column/set of columns that are guaranteed to be unique in the table?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 3:25 pm
Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.
November 22, 2021 at 3:36 pm
You can't restore to another server & try to test it there? I'd argue it's important to get something like this right because messing it up could be painful.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 22, 2021 at 3:36 pm
Then follow the standard method and delete from the table in batches in clus key order.
In order to do that, you might first have to go thru the table and store the clus key values of all the rows to be deleted. Hopefully all rows with the same clus key value are either all deleted or all kept.
Loop thru the key-values-to-be-deleted table, deleting say TOP (5000) rows each time (or however many rows work out best to delete per query). Generally I'll add a small delay after every n deletes, for example, wait 1/4 or 1/2 sec after every 50K or 100K deletes.
If the db is not in simple mode, you'll want to do very frequent log backups as you're deleting, to make sure the log doesn't need to grow to log the deletes, esp. since disk space is an issue for you.
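As a rough, untested sketch of that loop (made-up names; assumes #KeysToDelete already holds the clus key values of the rows to remove):
DECLARE @BatchSize int = 5000,
        @RowsDeleted int = 1,
        @Batches int = 0;
WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize) bt
    FROM dbo.BigTable AS bt
    INNER JOIN #KeysToDelete AS k
        ON k.KeyCol1 = bt.KeyCol1
       AND k.KeyCol2 = bt.KeyCol2;
    SELECT @RowsDeleted = @@ROWCOUNT,
           @Batches = @Batches + 1;
    -- brief pause roughly every 100K rows (20 batches of 5,000) so the
    -- log and other sessions get a chance to catch up
    IF @Batches % 20 = 0
        WAITFOR DELAY '00:00:00.500';
END;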
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 22, 2021 at 4:53 pm
Unfortunately I don't have the space to test (create a duplicate db/table); the whole rationale for this exercise is that disk space is full 🙁, nor can I take it offline. I considered copying a small subset to another server, but that might negate the point of the testing.
To be honest, disk space is relatively cheap. I would petition management for a substantial addition of new disk. You do need some headroom, both for future growth and to be able to do things as a DBA.
--Jeff Moden
Change is inevitable... Change for the better is not.