September 26, 2008 at 3:08 pm
We are planning to delete a bunch of old data. When I do the test delete on a table with 250 million+ rows, it takes 25 hours to delete and the row count comes to 6 million.
My question: the DB size before the test is 239 GB with available space shown as 2 GB; after the test the size is the same, but available space is shown as 42 GB.
When I look at the table properties I see the data space occupied by the table decreases from 47 GB to 7 GB, but the file sizes remain the same.
How do I get it so that the actual DB size shows as 197 GB with free space 0?
Do I update statistics on the table, or rebuild indexes with a fill factor of, say, 90%?
Thanks in Advance!!!
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
September 26, 2008 at 4:08 pm
Have you run DBCC UPDATEUSAGE ('DatabaseName') WITH Count_Rows after that?
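For reference, a minimal sketch of that check (the database name is a placeholder); it corrects page and row counts in the catalog so that sp_spaceused and the table properties dialog report accurate figures:
DBCC UPDATEUSAGE ('YourDatabaseName') WITH COUNT_ROWS
-- then re-check the reported space
EXEC sp_spaceused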
MJ
September 26, 2008 at 5:37 pm
Deleting data from the database will not shrink the database file size. What you are seeing is correct and is what I would expect.
BTW - you want to have space available in the database for additional growth and maintenance operations. When you rebuild your indexes, you need free space in the database for the rebuild operation. If you shrink the database, it will just grow again to accommodate those operations, or, worst case, the system will not be able to rebuild the indexes without at least some level of fragmentation.
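If you want to see where the space actually sits, something like this sketch (SQL Server 2005 and later, standard catalog views, no assumptions about your file names) shows allocated versus used space per file:
-- allocated size, used space and free space per database file, in MB
SELECT name,
       size / 128.0 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0 AS free_mb
FROM sys.database_files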
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 26, 2008 at 7:36 pm
The_SQL_DBA (9/26/2008)
We are planning to delete a bunch of old data. When I do the test delete on a table with 250 million+ rows, it takes 25 hours to delete and the row count comes to 6 million. My question: the DB size before the test is 239 GB with available space shown as 2 GB; after the test the size is the same, but available space is shown as 42 GB.
When I look at the table properties I see the data space occupied by the table decreases from 47 GB to 7 GB, but the file sizes remain the same.
How do I get it so that the actual DB size shows as 197 GB with free space 0?
Do I update statistics on the table, or rebuild indexes with a fill factor of, say, 90%?
Thanks in Advance!!!
There's a much faster way for this... BCP the data you want to keep out, truncate the table, BCP it back in. (IIRC BCP is the fastest way to do major data moves like this, but I'm not 100% sure.)
You can also do this without BCP:
SELECT * INTO temp_table FROM dbo.BaseTable WHERE RowsAreValid
TRUNCATE TABLE dbo.BaseTable
INSERT INTO dbo.BaseTable (ColsList) SELECT ColsList FROM temp_table
On a beefy server, that should run in only a few minutes... maybe an hour.
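If you do go the BCP route, it would look roughly like this sketch; the table, server and file path are hypothetical, and the export itself is a command-line step rather than T-SQL:
-- Step 1 (from a command prompt): export only the rows you want to keep, in native format
--   bcp "SELECT * FROM MyDb.dbo.BigTable WHERE KeepRow = 1" queryout D:\export\BigTable.dat -n -T -S MyServer
-- Step 2: empty the table
TRUNCATE TABLE dbo.BigTable
-- Step 3: bulk load the kept rows back in (TABLOCK helps keep the load minimally logged)
BULK INSERT dbo.BigTable
FROM 'D:\export\BigTable.dat'
WITH (DATAFILETYPE = 'native', TABLOCK)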
September 27, 2008 at 5:54 pm
Just remember that TRUNCATE doesn't work in the presence of foreign keys. I believe they can be disabled rather than dropped, though.
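If the constraint does have to go (as far as I know, TRUNCATE TABLE is blocked by a referencing foreign key even when the constraint is only disabled), a rough sketch with hypothetical table and constraint names would be:
-- drop the referencing FK, truncate, then put the constraint back
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_BaseTable
TRUNCATE TABLE dbo.BaseTable
-- WITH CHECK revalidates the remaining child rows against the parent
ALTER TABLE dbo.ChildTable WITH CHECK
    ADD CONSTRAINT FK_ChildTable_BaseTable
    FOREIGN KEY (BaseTableID) REFERENCES dbo.BaseTable (BaseTableID)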
--Jeff Moden
Change is inevitable... Change for the better is not.
September 28, 2008 at 7:36 am
The bcp/truncate approach works well if you can use it. I typically delete large sets of data in batches, 10k rows or so at a time. It seems to work faster.
You always want free space in your db, even in read-only ones. If you need to rebuild indexes, update data, etc., you want free space you've allocated yourself, not auto-grown or fragmented space. You also don't want to shrink after you've set up your indexes, as you'll fragment things.
September 28, 2008 at 8:00 am
@Jeffrey: I do need to delete a large amount of data, and the table size does decrease dramatically after the test delete. Now my obvious question is: where does this space go? If it was deleted the right way, I would want to see the space reflected in a reduced DB size, if not as free space.
@steve-2: The idea of deleting 10K rows at a time sounds fast to me too, but I am required to delete in excess of 10 million rows; that would be humongous to do in chunks.
I may not be able to truncate the data, because I have a maintenance window of less than 8 hours and the site is live. There are 20-odd clients who don't want to see their data deleted. That's a FACT I cannot ignore. If I truncate the table, I may not be able to BCP only the selected data back in within the maintenance window.
What is the best way forward after the purge is done, to keep the data consistent, keep data retrieval fast, and do it in the SAFEST way?
My plan: I have scripts that DELETE data per our requirements, and then I rebuild indexes on that DB... do I need anything else???
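For the rebuild step I am thinking of something like this sketch, on SQL Server 2005 or later (the table name is a placeholder; on SQL Server 2000, DBCC DBREINDEX would be the equivalent):
-- rebuild every index on the purged table with a 90% fill factor
ALTER INDEX ALL ON dbo.BaseTable
    REBUILD WITH (FILLFACTOR = 90)
-- refresh the column statistics that the rebuild does not already update
UPDATE STATISTICS dbo.BaseTable WITH FULLSCAN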
Thanks for your responses!!!
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
September 28, 2008 at 8:33 am
Deleting it in chunks is actually faster. When you try to drop 10MM records at once, it starts to eat up resources. I've seen deletes of more than 1MM records take hours, and when you break them into batches they run in tens of minutes.
September 28, 2008 at 8:40 am
The progressive delete is quite easy and often much faster than doing a single delete of millions of rows:
SET ROWCOUNT 10000
DELETE FROM dbo.Table WHERE WhatEver
WHILE @@RowCount = 10000
BEGIN
    -- make sure the DELETE is the last statement in the loop, or @@ROWCOUNT will be reset to something other than 10000
    DELETE FROM dbo.Table WHERE WhatEver
END
SET ROWCOUNT 0
...
Load new data
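As an aside, Books Online warns against relying on SET ROWCOUNT affecting DELETE/INSERT/UPDATE in future versions; on SQL Server 2005 and later the same batching can be written with DELETE TOP, e.g. this sketch (table and filter are placeholders):
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.BigTable
    WHERE SomeFilterColumn < '20050101'   -- hypothetical purge condition

    IF @@ROWCOUNT < 10000 BREAK           -- the last, partial batch has been deleted
END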
September 28, 2008 at 8:44 am
One last idea that may be really fast and not cause any trouble.
Can you load all the data you want to keep into an exact copy of the base table, then, in a transaction, swap the table names like EM does when you modify tables?
That may be the fastest way to do the delete and always have the data available.
Any comments appreciated... I've never done this myself.
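The swap itself would be something like this sketch (names are hypothetical; note that indexes, constraints and permissions stay with the physical table, so they must already exist on the new copy):
-- assumes dbo.BaseTable_New has already been loaded with only the rows to keep
BEGIN TRANSACTION

EXEC sp_rename 'dbo.BaseTable', 'BaseTable_Old'
EXEC sp_rename 'dbo.BaseTable_New', 'BaseTable'

COMMIT TRANSACTION

-- once everything checks out, drop the old table to release its space
-- DROP TABLE dbo.BaseTable_Old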
September 28, 2008 at 10:58 am
There are seven related tables that are dependent on each other. I am supposed to delete the data starting with the table at the lowest level of the parent-child hierarchy. There is a common field in 6 of these tables, which happens to be a uniqueidentifier in some of them. I have to find the common rows, excluding the clients that are expecting us to maintain their data in the site. I have this set up as scripts, and I can't go by date, because the date field is not common to all the tables and unfortunately in some tables it is stored as a STRING, which makes my job a little hard.
I will try testing with the @@ROWCOUNT approach and set it to 25000 to begin with. Do I run this statement many times until I reach the desired row level, or do I loop it using a WHILE statement? I don't want to use a CURSOR, that's for sure... thanks!!!
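To make that concrete, here is a rough sketch of one pass against the lowest child table, with entirely made-up names (dbo.Orders keyed on ClientID, and dbo.ClientsToKeep holding the clients whose data must stay):
SET ROWCOUNT 25000

-- first batch for the lowest-level child table; repeat the pattern per table,
-- working up the parent/child hierarchy
DELETE o
FROM dbo.Orders AS o
WHERE o.ClientID NOT IN (SELECT ClientID FROM dbo.ClientsToKeep)

WHILE @@ROWCOUNT = 25000
BEGIN
    DELETE o
    FROM dbo.Orders AS o
    WHERE o.ClientID NOT IN (SELECT ClientID FROM dbo.ClientsToKeep)
END

SET ROWCOUNT 0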
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."
September 28, 2008 at 11:15 am
You don't need any cursor. Do a test on a smaller table to understand how it works.
Just use the script as I posted it. It will delete all the rows just like the single delete would, but it should be much faster... the WHILE @@RowCount = 10000 will keep the loop going until fewer than 10,000 rows are deleted, which means the delete is complete.
September 28, 2008 at 8:43 pm
Ninja's_RGR'us (9/28/2008)
The progressive delete is quite easy and often much faster than doing a single delete of millions of rows:
SET ROWCOUNT 10000
DELETE FROM dbo.Table WHERE WhatEver
WHILE @@RowCount = 10000
BEGIN
-- make sure the DELETE is the last statement in the loop, or @@ROWCOUNT will be reset to something other than 10000
DELETE FROM dbo.Table WHERE WhatEver
WAITFOR DELAY '00:00:02' --<-----
END
SET ROWCOUNT 0
...
Load new data
1. I added a WAITFOR of 2 seconds inside the while loop. That way, normal user operations will not be affected much (no long blocking) on a live production system, and SQL Server gets a break for housekeeping.
2. Having an index on the WHERE clause column(s) helps.
I have done this kind of purging. Breaking it into small chunks using ROWCOUNT is manageable, faster, uses less log space, and avoids blocking...
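For example (hypothetical table and column names), a supporting index so each batch can seek to its rows instead of scanning the whole table:
CREATE NONCLUSTERED INDEX IX_BigTable_PurgeFilter
    ON dbo.BigTable (SomeFilterColumn)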
September 28, 2008 at 9:01 pm
Good call... just make sure to put the WAITFOR at the top of the loop so that it does not affect the rowcount!... or save that value in a variable so you never have to worry about it again.
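Something like this sketch would do it, capturing the count into a variable so the WAITFOR cannot interfere (table and filter are placeholders):
DECLARE @BatchRows int

SET ROWCOUNT 10000
SET @BatchRows = 10000

WHILE @BatchRows = 10000
BEGIN
    DELETE FROM dbo.BigTable WHERE SomeFilterColumn < '20050101'
    SET @BatchRows = @@ROWCOUNT   -- capture immediately, before anything resets it

    WAITFOR DELAY '00:00:02'      -- give other sessions and the log a breather
END

SET ROWCOUNT 0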
September 29, 2008 at 8:30 pm
Well, the ROWCOUNT approach has worked like a charm. I shall try the WAITFOR also. I am also filtering on the indexed column in the WHERE clause now.
thanks a lot guys!!!
The_SQL_DBA
MCTS
"Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives."