December 9, 2013 at 11:36 am
I've got a DB that has one table that is currently using about 52GB and I need to free up some space in the DB. It's reached its set max size, and there's not really room on the disk to let it get any bigger...
What I'm looking to do is, delete about 1/2 the rows out of the largest table, once the Vendor says we can do that without breaking anything...
The table in question only has 5 columns, as follows:
OID (PK, numeric(19,0), not null) {Also has a clustered index on it}
C_Type (numeric(1,0), not null)
C_SZ (int, not null)
C_DATA_TXT (ntext, null)
C_DATA_BIN (image, NULL)
There are no other indexes on the table, and no foreign keys either referencing it, or on it.
I'd enable compression on the table, but it will have no effect due to the ntext. I came up with the following to delete only a percentage of the current records, freeing up space in the DB to keep things going until I get more disk space...
if exists (select * from tempdb.sys.all_objects where name like '%#deleteoids%')
    drop table #deleteoids;

select top (50) percent oid as [oiddelete]
into #deleteoids
from problem_table
order by oid asc;

--Replace the "select cnts.oid, doids.oiddelete" with "delete *" when running for real
select cnts.oid, doids.oiddelete
from problem_table as cnts
inner join #deleteoids as doids
    on doids.oiddelete = cnts.oid;

if exists (select * from tempdb.sys.all_objects where name like '%#deleteoids%')
    drop table #deleteoids;
My big concern is the inner join. Some things I've tracked down seem to indicate it will work, some say it won't, and some indicate you need to jump through a few more hoops than I've got...
I might also wrap everything between the two "if exists" checks in an explicit begin / commit transaction, although a partial delete would still be a "win" for me...
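For reference, the join-style delete I keep running across looks something like this (untested on my end, same table and #deleteoids temp table as above), with the explicit transaction around it:

begin transaction;

delete cnts
from problem_table as cnts
inner join #deleteoids as doids
    on doids.oiddelete = cnts.oid;

commit transaction;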
Thanks all!
Jason
December 9, 2013 at 12:28 pm
I don't think you can say "delete *" - it's just "delete from" but I'm a little uneasy about the approach of deleting with an inner join. Then again, I've never tried it. 😛
As an alternative, since a CTE can be manipulated like the table itself, could you use the following approach? This would eliminate the creation of the temp table. You could check it by using a SELECT instead of the DELETE. I tested this on your table structure using 1M rows and took it down 50% several times.
with cte as (
    select top 50 percent oid
    from problem_table
    order by OID asc
)
delete from cte;
Two noteworthy items:
1. I would definitely test it first. After all, this is production we're talking about.
2. Deleting half of a 52 GB table can cause some pretty hefty growth of your transaction log. You say you already have a space problem on the data drive for this database, so I don't know if you have the same situation on your log drive. If this is likely to be a problem, you may have to bust up your delete into smaller chunks and take a backup after each chunk.
December 9, 2013 at 12:38 pm
Ed,
On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendor's hands: can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
December 9, 2013 at 12:51 pm
jasona.work (12/9/2013)
Ed, On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendors' hands, can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
RBARing (if that form of the acronym is even a word) the deletes won't save your logs. You could take the table's total row count, fire the CTE above using 10% of that total (as an integer), take a backup, and repeat 4 more times to get to your 50%. Here's what I'm attempting to say if you have a 1M-row table:
with cte as (
    select top 100000 oid
    from problem_table
    order by OID asc
)
delete from cte;

backup database [JasonsDB] to disk = 's:\temp1.bak' with init;
If you think it'll fit in your free log space, you can fire the whole thing and then take an immediate backup to mark the log entries as backed up and available for reuse.
December 31, 2013 at 12:09 pm
Hi.
As much as I hate loops, this is the right place for one in order to remove the data without ballooning the transaction log.
declare @rowcnt int;
select @rowcnt = COUNT(*) / 2 from problem_table;
print '@rowcnt=' + convert(varchar(10), @rowcnt);

while @rowcnt > 1
begin
    with cte as (
        select top 5000 OID
        from problem_table
        order by OID
    )
    delete from cte;

    select @rowcnt = @rowcnt - @@ROWCOUNT;
    print '@rowcnt=' + convert(varchar(10), @rowcnt);
end
This will do the trick without whacking the transaction log; the final count will be within 5,000 rows of half the table. You may have to try something else to make this work, such as deleting everything before a certain date, which is a far more common purge criterion.
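As an aside, a date-driven version of the same loop might look like the sketch below. The purge_date column is purely hypothetical (the table in question doesn't have one), and the two-year cutoff and 5000-row batch size are examples only:

declare @cutoff datetime = dateadd(year, -2, getdate());  -- keep the last two years (example only)

while 1 = 1
begin
    delete top (5000)
    from problem_table
    where purge_date < @cutoff;  -- purge_date is a made-up column for illustration

    if @@ROWCOUNT = 0
        break;
end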
Thanks
John.
December 31, 2013 at 10:57 pm
jasona.work (12/9/2013)
Ed, On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendors' hands, can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
A cursor wouldn't be a problem here. Still, no need for it.
I ran into a very similar problem once. Being paranoid, I killed two birds with one stone. Think about it... if you take a backup of the database beforehand and someone later wants some of the data you've deleted from the table, how on Earth will you get it back without restoring the whole thing? Those thoughts went through my mind and here's what I did.
1. Do a couple of selects and determine how you can break the table up into smaller pieces (I used 10 for my problem) based on the OID for the older rows that you want to delete. You need a StartOID and EndOID for each piece.
2. Write 10 (or whatever number you've chosen) separate BCP commands that use the "queryout" mode and have the query follow the basic format of SELECT * FROM dbo.YourTable WHERE OID >= StartOID AND OID < EndOID, directing the output at the backup drives (see the sketch after this list). Now if someone wants you to get data back, you have files that you could BULK INSERT back in, one at a time, to find the data without having to do a restore, and they're small enough to import without blowing out your server again.
3. Then, figure out which is the median OID and do one more BCP export to the backup drives that contains the 50% of the data you want to keep.
4. Once that's done, simply drop the table and wait for the pages to deallocate.
5. Once that's done, create an empty table (with indexes, if you want) and BULK INSERT that last file that contains the data you want to keep. If you do it right (with things like TABLOCK, etc) the import will be minimally logged.
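To make steps 2, 4, and 5 a little more concrete, here's a rough sketch. The server, database, and output file names are just placeholders, the OID boundaries are made up, and the table definition is the one from the first post:

-- Step 2 (run from a command prompt, one bcp per OID range; boundaries are examples only):
-- bcp "SELECT * FROM JasonsDB.dbo.problem_table WHERE OID >= 1 AND OID < 1000000" queryout X:\backup\problem_01.dat -n -S MyServer -T
-- bcp "SELECT * FROM JasonsDB.dbo.problem_table WHERE OID >= 1000000 AND OID < 2000000" queryout X:\backup\problem_02.dat -n -S MyServer -T
-- ...one per range, then one more export (step 3) for the half you're keeping.

-- Step 4: once the exports are verified, drop the old table.
DROP TABLE dbo.problem_table;

-- Step 5: recreate the table empty and pull the "keep" file back in.
CREATE TABLE dbo.problem_table (
    OID        numeric(19,0) NOT NULL PRIMARY KEY CLUSTERED,
    C_Type     numeric(1,0)  NOT NULL,
    C_SZ       int           NOT NULL,
    C_DATA_TXT ntext         NULL,
    C_DATA_BIN image         NULL
);

BULK INSERT dbo.problem_table
FROM 'X:\backup\problem_keep.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);  -- TABLOCK helps the load qualify for minimal logging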
Since this will become a problem again in the future, I'd likely set up monthly or quarterly partitioning so that you can just SWITCH out data (by month or quarter, depending on how you set things up) to a new table and then drop the new table. The SWITCH and DROP are measured in milliseconds, so you can see it might be worthwhile to spend a little time setting up monthly partitioning.
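Just to give an idea of the moving parts, here's a bare-bones sketch; the function, scheme, boundary values, and staging table are all hypothetical, and the real thing would take some planning around alignment and filegroups:

-- All names and boundary values below are hypothetical.
create partition function pf_problem (numeric(19,0))
    as range right for values (1000000, 2000000, 3000000);

create partition scheme ps_problem
    as partition pf_problem all to ([PRIMARY]);

-- The table (and its clustered PK) would need to be rebuilt on ps_problem(OID), and
-- dbo.problem_staging must be an identical, empty table on the same filegroup.

-- The actual purge: switch the oldest partition out, then drop the staging table.
-- Both are metadata-only operations and complete almost instantly.
alter table dbo.problem_table switch partition 1 to dbo.problem_staging;
drop table dbo.problem_staging;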
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2014 at 12:07 am
Ed Wagner (12/9/2013)
Here's what I'm attempting to say if you have a 1M-row table:
with cte as (
    select top 100000 oid
    from problem_table
    order by OID asc
)
delete from cte;

backup database [JasonsDB] to disk = 's:\temp1.bak' with init;
If you think it'll fit in your free log space, you can fire the whole thing and then take an immediate backup to mark the log entries as backed up and available for reuse.
Database backups do not mark the log space as reusable. That requires a log backup on full/bulk logged recovery and a checkpoint on simple recovery.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 2, 2014 at 12:09 am
JohnFTamburo (12/31/2013)
Hi.As much as I hate loops, this is the right place for one in order to remove the data without ballooning the transaction log.
Loops don't prevent log growth. On full recovery model, only log backups mark the log reusable, so if deleting huge amounts in a loop, that loop needs to run log backups from time to time. On simple recovery the loop needs to run Checkpoint as the automatic checkpoint probably won't kick in often enough to prevent the log from growing.
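To adapt the loop above accordingly, something along these lines would do; the database name, backup path, and the every-20-batches frequency are illustrative only:

declare @rowcnt int, @batches int = 0;
select @rowcnt = COUNT(*) / 2 from problem_table;

while @rowcnt > 1
begin
    with cte as (
        select top 5000 OID
        from problem_table
        order by OID
    )
    delete from cte;

    select @rowcnt = @rowcnt - @@ROWCOUNT, @batches = @batches + 1;

    if @batches % 20 = 0
    begin
        -- Full or bulk-logged recovery: only a log backup frees the space for reuse.
        -- (Use distinct file names in practice so the log chain stays intact.)
        backup log [JasonsDB] to disk = 's:\JasonsDB_purge.trn';
        -- Simple recovery: run CHECKPOINT here instead of the log backup.
    end
end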
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability