December 9, 2013 at 11:36 am
I've got a DB that has one table that is currently using about 52GB and I need to free up some space in the DB. It's reached its set max size, and there's not really room on the disk to let it get any bigger...
What I'm looking to do is, delete about 1/2 the rows out of the largest table, once the Vendor says we can do that without breaking anything...
The table in question only has 5 columns, as follows:
OID (PK, numeric(19,0), not null) {Also has a clustered index on it}
C_Type (numeric(1,0), not null)
C_SZ (int, not null)
C_DATA_TXT (ntext, null)
C_DATA_BIN (image, NULL)
There are no other indexes on the table, and no foreign keys either referencing it, or on it.
I'd enable compression on the table, but it will have no effect due to the ntext. I came up with the following to delete only a percentage of the current records, freeing up space in the DB to keep things going until I get more disk space...
if exists (select * from tempdb.sys.all_objects where name like '%#deleteoids%')
    drop table #deleteoids;

select top (50) percent oid as [oiddelete]
into #deleteoids
from problem_table
order by oid asc;

--Replace the "select cnts.oid, doids.oiddelete" with "delete *" when running for real
select cnts.oid, doids.oiddelete
from problem_table as cnts
inner join #deleteoids as doids
    on doids.oiddelete = cnts.oid;

if exists (select * from tempdb.sys.all_objects where name like '%#deleteoids%')
    drop table #deleteoids;
My big concern is the inner join. Some things I've tracked down seem to indicate it will work, some say it won't, and some indicate you need to jump through a few more hoops than I've got...
I might also wrap everything between the two "if exists" checks in an explicit begin / commit transaction, although a partial delete would still be a "win" for me...
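For reference, the join-style delete I keep running across looks something like this (untested on my end, same table and #deleteoids temp table as above), with the explicit transaction around it:

begin transaction;

delete cnts
from problem_table as cnts
inner join #deleteoids as doids
    on doids.oiddelete = cnts.oid;

commit transaction;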
Thanks all!
Jason
December 9, 2013 at 12:28 pm
I don't think you can say "delete *" - it's just "delete from" but I'm a little uneasy about the approach of deleting with an inner join. Then again, I've never tried it. 😛
As an alternative, since a CTE can be manipulated like the table itself, could you use the following approach? This would eliminate the creation of the temp table. You could check it by using a SELECT instead of the DELETE. I tested this on your table structure using 1M rows and took it down 50% several times.
with cte as (
    select top 50 percent oid
    from problem_table
    order by OID asc
)
delete from cte;
Two noteworthy items:
1. I would definitely test it first. After all, this is production we're talking about.
2. Deleting half of a 52 GB table can cause some pretty hefty growth of your transaction log. You say you already have a space problem on the data drive for this database, so I don't know if you have the same situation on your log drive. If this is likely to be a problem, you may have to bust up your delete into smaller chunks and take a backup after each chunk.
December 9, 2013 at 12:38 pm
Ed,
On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendor's hands: can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
December 9, 2013 at 12:51 pm
jasona.work (12/9/2013)
Ed, On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendors' hands, can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
RBARing (if that form of the acronym is even a word) the deletes won't save your logs. You could take the table's total row count, fire the CTE above using 10% of that total (as an integer), take a backup, and repeat 4 more times to get to your 50%. Here's what I'm attempting to say if you have a 1M-row table:
with cte as (
    select top 100000 oid
    from problem_table
    order by OID asc
)
delete from cte;

backup database [JasonsDB] to disk = 's:\temp1.bak' with init;
If you think it'll fit in your free log space, you can fire the whole thing and then take an immediate backup to mark the log entries as backed up and available for reuse.
December 31, 2013 at 12:09 pm
Hi.
As much as I hate loops, this is the right place for one in order to remove the data without ballooning the transaction log.
declare @rowcnt int;
select @rowcnt = COUNT(*) / 2 from problem_table;
print '@rowcnt=' + convert(varchar(10), @rowcnt);

while @rowcnt > 1
begin
    with cte as (
        select top 5000 OID
        from problem_table
        order by OID
    )
    delete from cte;

    select @rowcnt = @rowcnt - @@ROWCOUNT;
    print '@rowcnt=' + convert(varchar(10), @rowcnt);
end
This will do the trick without whacking the transaction log; the final count will be within 5,000 rows of half the table. You may have to try something else to make this work, such as deleting everything before a certain date, which is a far more common purge criterion.
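As an aside, a date-driven version of the same loop might look like the sketch below. The purge_date column is purely hypothetical (the table in question doesn't have one), and the two-year cutoff and 5000-row batch size are examples only:

declare @cutoff datetime = dateadd(year, -2, getdate());  -- keep the last two years (example only)

while 1 = 1
begin
    delete top (5000)
    from problem_table
    where purge_date < @cutoff;  -- purge_date is a made-up column for illustration

    if @@ROWCOUNT = 0
        break;
end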
Thanks
John.
December 31, 2013 at 10:57 pm
jasona.work (12/9/2013)
Ed, On #1, absolutely! This is why there's a QA environment as well. I've actually been developing the code against the QA DB / table.
On #2, I'd forgotten about possible log growth. Looking at the DB, right now it's got about 3GB free in the log, so this *might* fit without growing it. Worst case, as you said, chunk out my deletes...
I also hadn't thought of using a CTE, although I recently used one in a query I'd created initially using temp tables.
Right now, the big question is in the vendors' hands, can we delete these entries...
Maybe I can anger the Nedom Ffej gods, and use a cursor to RBAR the deletes!
:hehe:
A cursor wouldn't be a problem here. Still, no need for it.
I ran into a very similar problem once. Being paranoid, I killed two birds with one stone. Think about it... if you take a backup of the database beforehand and someone later wants some of the data you've deleted from the table, how on Earth will you get it back without restoring the whole thing? Those thoughts went through my mind and here's what I did.
1. Do a couple of selects and determine how you can break the table up into smaller pieces (I used 10 for my problem) based on the OID for the older rows that you want to delete. You need a StartOID and EndOID for each piece.
2. Write 10 (or whatever number you've chosen) separate BCP commands that use the "queryout" mode and have the query follow the basic format of SELECT * FROM dbo.YourTable WHERE OID >= StartOID AND OID < EndOID, directing the output at the backup drives (see the sketch after this list). Now if someone wants you to get data back, you have files that you could BULK INSERT back in, one at a time, to find the data without having to do a restore, and they're small enough to import without blowing out your server again.
3. Then, figure out which is the median OID and do one more BCP export to the backup drives that contains the 50% of the data you want to keep.
4. Once that's done, simply drop the table and wait for the pages to deallocate.
5. Once that's done, create an empty table (with indexes, if you want) and BULK INSERT that last file that contains the data you want to keep. If you do it right (with things like TABLOCK, etc) the import will be minimally logged.
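To make steps 2, 4, and 5 a little more concrete, here's a rough sketch. The server, database, and output file names are just placeholders, the OID boundaries are made up, and the table definition is the one from the first post:

-- Step 2 (run from a command prompt, one bcp per OID range; boundaries are examples only):
-- bcp "SELECT * FROM JasonsDB.dbo.problem_table WHERE OID >= 1 AND OID < 1000000" queryout X:\backup\problem_01.dat -n -S MyServer -T
-- bcp "SELECT * FROM JasonsDB.dbo.problem_table WHERE OID >= 1000000 AND OID < 2000000" queryout X:\backup\problem_02.dat -n -S MyServer -T
-- ...one per range, then one more export (step 3) for the half you're keeping.

-- Step 4: once the exports are verified, drop the old table.
DROP TABLE dbo.problem_table;

-- Step 5: recreate the table empty and pull the "keep" file back in.
CREATE TABLE dbo.problem_table (
    OID        numeric(19,0) NOT NULL PRIMARY KEY CLUSTERED,
    C_Type     numeric(1,0)  NOT NULL,
    C_SZ       int           NOT NULL,
    C_DATA_TXT ntext         NULL,
    C_DATA_BIN image         NULL
);

BULK INSERT dbo.problem_table
FROM 'X:\backup\problem_keep.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);  -- TABLOCK helps the load qualify for minimal logging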
Since this will become a problem again in the future, I'd likely set up monthly or quarterly partitioning so that you can just SWITCH out data (by month or quarter, depending on how you set things up) to a new table and then drop the new table. The SWITCH and DROP are measured in milliseconds, so you can see it might be worthwhile to spend a little time setting up monthly partitioning.
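Just to give an idea of the moving parts, here's a bare-bones sketch; the function, scheme, boundary values, and staging table are all hypothetical, and the real thing would take some planning around alignment and filegroups:

-- All names and boundary values below are hypothetical.
create partition function pf_problem (numeric(19,0))
    as range right for values (1000000, 2000000, 3000000);

create partition scheme ps_problem
    as partition pf_problem all to ([PRIMARY]);

-- The table (and its clustered PK) would need to be rebuilt on ps_problem(OID), and
-- dbo.problem_staging must be an identical, empty table on the same filegroup.

-- The actual purge: switch the oldest partition out, then drop the staging table.
-- Both are metadata-only operations and complete almost instantly.
alter table dbo.problem_table switch partition 1 to dbo.problem_staging;
drop table dbo.problem_staging;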
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2014 at 12:07 am
Ed Wagner (12/9/2013)
Here's what I'm attempting to say if you have a 1M-row table:
with cte as (
    select top 100000 oid
    from problem_table
    order by OID asc
)
delete from cte;

backup database [JasonsDB] to disk = 's:\temp1.bak' with init;
If you think it'll fit in your free log space, you can fire the whole thing and then take an immediate backup to mark the log entries as backed up and available for reuse.
Database backups do not mark the log space as reusable. That requires a log backup on full/bulk logged recovery and a checkpoint on simple recovery.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 2, 2014 at 12:09 am
JohnFTamburo (12/31/2013)
Hi.As much as I hate loops, this is the right place for one in order to remove the data without ballooning the transaction log.
Loops don't prevent log growth. On full recovery model, only log backups mark the log reusable, so if deleting huge amounts in a loop, that loop needs to run log backups from time to time. On simple recovery the loop needs to run Checkpoint as the automatic checkpoint probably won't kick in often enough to prevent the log from growing.
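To adapt the loop above accordingly, something along these lines would do; the database name, backup path, and the every-20-batches frequency are illustrative only:

declare @rowcnt int, @batches int = 0;
select @rowcnt = COUNT(*) / 2 from problem_table;

while @rowcnt > 1
begin
    with cte as (
        select top 5000 OID
        from problem_table
        order by OID
    )
    delete from cte;

    select @rowcnt = @rowcnt - @@ROWCOUNT, @batches = @batches + 1;

    if @batches % 20 = 0
    begin
        -- Full or bulk-logged recovery: only a log backup frees the space for reuse.
        -- (Use distinct file names in practice so the log chain stays intact.)
        backup log [JasonsDB] to disk = 's:\JasonsDB_purge.trn';
        -- Simple recovery: run CHECKPOINT here instead of the log backup.
    end
end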
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability