Database Size After Deleting 200-million rows

Question

Database Size After Deleting 200-million rows

alex.sqldba

SSChampion

Points: 10254
More actions
December 7, 2016 at 11:38 am

#315786

Hi Guys,
I've been going through a process of cleaning up extraneous data, both for compliance and as a bit of a spring clean. As a result a log table has just had 200-million or so rows removed, which represents a large proportion of the entire database:
238 GB Total Size
25 GB Free Space
24 GB Log File
However, since performing the delete the database free space has hardly moved. I was expecting the current free space to be at least a hundred GB. If not a bit more.
Interestingly, looking at sp_Spaceused, the row count has dropped significantly, however the space used is the same as before the delete.
I have updated statistics for this table but no change. I have not however performed a reindex -- as that is going to run this weekend as per the normal maintenance routines.
SO my actual question is, why am I not seeing any disk space usage drops? Is there a system sp I should run to identify the space? Does it *need* a re-index to identify this space?
I've also identified another 50GB or so to be removed.
I am on SQL Server 2008 R2 (the aim is to move it to 2016 hence the desire to trim space out of it)
Cheers All,
Alex

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Eric M Russell SSC Guru Points: 125563 More actions · Answer 1

Think of a database table as a library shelf containing books. When you pull a book from the shelf, the self doesn't get smaller. It just contains empty slots where other books can be inserted. The reserved size of the database won't get smaller unless you shrink the data and/or log files. However, it is recommended that the DBA not shrink files, because growing files causes time, blocking, and fragmentation.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Eirikur Eiriksson SSC Guru Points: 182919 More actions · Answer 2

Eric M Russell (12/7/2016)
Think of a database table as a library shelf containing books. When you pull a book from the shelf, the self doesn't get smaller. It just contains empty slots where other books can be inserted. The reserved size of the database won't get smaller unless you shrink the data and/or log files. However, it is recommended that the DBA not shrink files, because growing files causes time, blocking, and fragmentation.

+1

Good analogy

😎

Eric M Russell SSC Guru Points: 125563 More actions · Answer 3

Yes, the library analogy helps explain a lot.

For example, shrinking or expanding the book shelves to accommodate the current number of books would not only be a lot of work for the librarian but also a lot of blockage for patrons trying to browse the library.

Likewise, sorting the books in natural order (by author name for example) creates blockage for the browsing patrons whenever the librarian has to shuffle books around to make room for 100 copies of a best seller by the same author or whenever random books are inserted by patrons out of order (a phenomenon which does happen in real world libraries).

Indexing books by slot number would require the librarian to update the card catalog every time a book is moved, so it's more efficient to index books by author name and/or ISBN.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

alex.sqldba SSChampion Points: 10254 More actions · Answer 4

Thanks all for your replies.

Yes, I can appreciate the size of the file must remain the same, in line with your library shelf analogy. However, I expected to see free space within the file -- much like with a full length shelf of books, I'd expected to see some space where the books used to be.

It is this space I Am not seeing.

I also understand the down side of performing a shrink -- but in this instance it is because ultimately a debug flag was left on in the application, causing the db to store 7 years of verbose logging. Over 200GB's worth -- actual usable data of any value is around about 80 GB.

I could copy the 80 we want to keep rather than delete the 200 we don't. But that also doesn't explain why I cannot see the free space within the database.

Eric M Russell SSC Guru Points: 125563 More actions · Answer 5

If this table is a heap (no clustered index) then space will not be deallocated for deleted rows.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

alex.sqldba SSChampion Points: 10254 More actions · Answer 6

Hi Eric,

Whilst it is not a heap, there is a clustered index on a datetime field, I am curious as to how one would go about reclaiming the space if it was in fact a heap?

Would you have to add a clustered index to deal with it?

Eric M Russell SSC Guru Points: 125563 More actions · Answer 7

alex.sqldba (12/7/2016)
Hi Eric,
Whilst it is not a heap, there is a clustered index on a datetime field, I am curious as to how one would go about reclaiming the space if it was in fact a heap?
Would you have to add a clustered index to deal with it?

This will rebuild table and indexes in one atomic operation.

ALTER TABLE <TableName> REBUILD;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

alex.sqldba SSChampion Points: 10254 More actions · Answer 8

alex.sqldba

SSChampion

Points: 10254

December 7, 2016 at 3:41 pm

#1916645

Ah! Marvellous, cheers Eric.

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 9

Eric M Russell (12/7/2016)
alex.sqldba (12/7/2016)
Hi Eric,
Whilst it is not a heap, there is a clustered index on a datetime field, I am curious as to how one would go about reclaiming the space if it was in fact a heap?
Would you have to add a clustered index to deal with it?
This will rebuild table and indexes in one atomic operation.
ALTER TABLE <TableName> REBUILD;

I could be wrong but the documentation on ALTER TABLE leads me to believe that it will either rebuild a HEAP or just the Clustered Index on a table and that it won't actually rebuild the Non-Clustered Indexes. Haven't actually tried it, though, so can't say for sure.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

richlion2 SSCrazy Points: 2484 More actions · Answer 10

I keep having these conversations with people who don't know how the SQL server engine works that SQL is looking after itself. I recently asked one of my analyst to clear up his data a bit as I was running out of space on tapes on my backup. So he removed his data, but to his surprise the database did not "shrink". Of course it does't!!!!!! SQL-server does not know what you want to do with it, maybe you want to reuse that space and load some new data. You also will not have any automated job that reduces the size of a database. To shrink or not to shrink, google that subject and you will know why.

Performing a "shrink database" operation is costly and must be done with special care, because you will fragment your indexes. I agreed to do it during out of office hours. It took 2 hours to shrink a database from 80GB to 17GB with the Reorg option, then at some point the process crashed as a deadlock was detected. I reran the job and then created a Maintenance Plan with an Index rebuild process followed by a Statistics process. Overall it took me almost 3 hours to do the job. It is not something you do on a daily basis and you need to have some control over it.

Ah and you must not forget the TEMPDB database. I had to add an additional file, as the default was located on the C: drive with a size limit. Rebuilding indexes requires more resources in the TEMPDB database, the default settings may get you into trouble.

And this thread just shows how well planned an operation has to be:

http://www.sqlservercentral.com/Forums/Topic767929-146-1.aspx

When deleting loads of data from your tables if you have the option to clear them completely consider using TRUNCATE instead of DELELE. This puts less pressure on your t-logs. What I have done recently was used "bcp" to export tables from a whole database to a new temporary database, TRUNCATE-ed all original tables and reimported all data back using "bcp". Of course you need to have the full code of triggers, unless on a newer version of SQL-server with the DISABLE TRIGGER option when you perform DELETE/TRUNCATE/INSERT operations. You will also have to prepare some CREATE PK and INDEX scripts, as it's best to drop all your primary keys before the import of data and to recreate your indexes after the operation.

To those who with to perform huge cleanup operations one big suggestion - DO NOT DO ANYTHING UNLESS YOU HAVE A PLAN AND HAVE TESTED ALL YOUR OPTIONS.

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 11

richlion2 (12/8/2016)
I keep having these conversations with people who don't know how the SQL server engine works that SQL is looking after itself. I recently asked one of my analyst to clear up his data a bit as I was running out of space on tapes on my backup. So he removed his data, but to his surprise the database did not "shrink". Of course it does't!!!!!! SQL-server does not know what you want to do with it, maybe you want to reuse that space and load some new data. You also will not have any automated job that reduces the size of a database. To shrink or not to shrink, google that subject and you will know why.
Performing a "shrink database" operation is costly and must be done with special care, because you will fragment your indexes. I agreed to do it during out of office hours. It took 2 hours to shrink a database from 80GB to 17GB with the Reorg option, then at some point the process crashed as a deadlock was detected. I reran the job and then created a Maintenance Plan with an Index rebuild process followed by a Statistics process. Overall it took me almost 3 hours to do the job. It is not something you do on a daily basis and you need to have some control over it.
Ah and you must not forget the TEMPDB database. I had to add an additional file, as the default was located on the C: drive with a size limit. Rebuilding indexes requires more resources in the TEMPDB database, the default settings may get you into trouble.
And this thread just shows how well planned an operation has to be:
http://www.sqlservercentral.com/Forums/Topic767929-146-1.aspx
When deleting loads of data from your tables if you have the option to clear them completely consider using TRUNCATE instead of DELELE. This puts less pressure on your t-logs. What I have done recently was used "bcp" to export tables from a whole database to a new temporary database, TRUNCATE-ed all original tables and reimported all data back using "bcp". Of course you need to have the full code of triggers, unless on a newer version of SQL-server with the DISABLE TRIGGER option when you perform DELETE/TRUNCATE/INSERT operations. You will also have to prepare some CREATE PK and INDEX scripts, as it's best to drop all your primary keys before the import of data and to recreate your indexes after the operation.
To those who with to perform huge cleanup operations one big suggestion - DO NOT DO ANYTHING UNLESS YOU HAVE A PLAN AND HAVE TESTED ALL YOUR OPTIONS.

I was just reading this and I have a few questions for you about it that really confuse me:

1 - Why shouldn't you do regular statistics updates and index rebuilds (on fragmented indexes)? We do it weekly and my understanding was that best practice was to do this frequently? I agree about shrinking not being done often (or at all unless absolutely neccessary), but statistics and index rebuilds I think should be done frequently. Or was your point more about the shrinking?

2 - why would you drop your indexes when you can just disable them during the data insert and rebuild when the data insert completes?

3 - why would you drop the primary keys? Why not just let it re-populate it during the insert? If it is an identity column (which a lot of PK's are), it would reseed back at the initial value from table creation time if you are doing the truncate.

4 - truncating is faster than deleting if you are removing all rows, but isn't your process of truncate followed by insert and recreating PK's and indexes a lot slower and more memory/cpu intensive than just doing the delete with covering indexes? I can't think of a case where DELETE would be slower unless you wanted to delete everything or your indexes/statistics were not optimized.

And to add to your last point, I think you should not remove any data from the database unless you are sure your backups are good. Before doing a huge data cleanup, first perform a full backup as you will get faster restore times. And depending on how much data you are removing, you may even want to switch to simple recovery mode and back to full when you are done if your SLA requires you to be in full recovery mode.

I would just like more clarification about those points so we don't scare/confuse too many other DBA's.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Eric M Russell SSC Guru Points: 125563 More actions · Answer 12

Jeff Moden (12/7/2016)
Eric M Russell (12/7/2016)
alex.sqldba (12/7/2016)
Hi Eric,
Whilst it is not a heap, there is a clustered index on a datetime field, I am curious as to how one would go about reclaiming the space if it was in fact a heap?
Would you have to add a clustered index to deal with it?
This will rebuild table and indexes in one atomic operation.
ALTER TABLE <TableName> REBUILD;
I could be wrong but the documentation on ALTER TABLE leads me to believe that it will either rebuild a HEAP or just the Clustered Index on a table and that it won't actually rebuild the Non-Clustered Indexes. Haven't actually tried it, though, so can't say for sure.

If the table is rebuilt, then the indexes must be rebuilt as well since they are bookmarked by row ids or clustering keys. So, ALTER TABLE <TableName> REBUILD will do this automatically, rebuilding non-clustered indexes in proper order within a single implicit transaction.

If this table is a heap, then you need to add a clustered key, even if it's just an INT IDENTITY(1,1) for the sole purpose of reorganization and space reallocation.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

alex.sqldba SSChampion Points: 10254 More actions · Answer 13

So he removed his data, but to his surprise the database did not "shrink". Of course it does't!!!!!! SQL-server does not know what you want to do with it

Not sure if this was aimed at me, but to add its not that I expected the data files to shrink. But I did expect the free space within the files to be visible. And to an extent it is. It's just not as much as I was expecting.

I was also of the assumption that white space in a database doesn't consume any space in a backup file. Though I might be wrong about that.

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 14

alex.sqldba (12/8/2016)

I did expect the free space within the files to be visible. And to an extent it is. It's just not as much as I was expecting.

Even a single row with even just a single NULL in a single column in it will cause the whole page it lives on to persevere even though all other rows have been deleted. Something similar may happen with Uniform Extents although I don't know for sure and would have to lookup or confirm. That's why the Index Rebuilds have to be done to "release" the space.

As a bit of a sidebar, I've found that "ONLINE" rebuilds don't do quite as good a job as "OFFLINE" rebuilds do, which may add to what you're seeing a bit. That's a casual observation rather than anything I can prove. I've not taken the time to prove it to myself with an experiment because I've also found that OFFLINE rebuilds are faster and take less space (document by MS) so I try to do OFFLINE rebuilds whenever possible and ONLINE rebuilds only when I can't do the OFFLINE ones.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)