Database Size After Deleting 200-million rows

  • I was just reading this and I have a few questions for you about it that really confuse me:

    1 - Why shouldn't you do regular statistics updates and index rebuilds (on fragmented indexes)? We do it weekly, and my understanding was that best practice is to do this frequently. I agree that shrinking shouldn't be done often (or at all, unless absolutely necessary), but statistics updates and index rebuilds I think should be done frequently. Or was your point more about the shrinking?

    2 - Why would you drop your indexes when you can just disable them during the data insert and rebuild them when the insert completes?

    3 - Why would you drop the primary keys? Why not just let them repopulate during the insert? If it is an identity column (which a lot of PKs are), it would reseed back to the initial value from table creation time if you are doing the truncate.

    4 - Truncating is faster than deleting if you are removing all rows, but isn't your process of truncating, inserting, and recreating PKs and indexes a lot slower and more memory/CPU-intensive than just doing the delete with covering indexes? I can't think of a case where DELETE would be slower unless you wanted to delete everything or your indexes/statistics were not optimized.

    And to add to your last point, I think you should not remove any data from the database unless you are sure your backups are good. Before doing a huge data cleanup, first perform a full backup; it also gives you faster restore times if something goes wrong. And depending on how much data you are removing, you may even want to switch to the simple recovery model and back to full when you are done, if your SLA requires you to be in full recovery.
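
    A rough sketch of that backup / recovery-model dance, assuming a hypothetical SalesDB and a local backup path:

    -- Full backup before the cleanup (names and paths are just placeholders)
    BACKUP DATABASE SalesDB
        TO DISK = N'D:\Backups\SalesDB_PreCleanup.bak'
        WITH CHECKSUM, STATS = 10;
    -- Optionally drop to SIMPLE recovery to keep the log small during the purge
    ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
    -- ... run the data cleanup here ...
    -- Switch back to FULL and take a fresh full backup to restart the log chain
    ALTER DATABASE SalesDB SET RECOVERY FULL;
    BACKUP DATABASE SalesDB
        TO DISK = N'D:\Backups\SalesDB_PostCleanup.bak'
        WITH CHECKSUM, STATS = 10;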

    I would just like more clarification about those points so we don't scare/confuse too many other DBAs.

    Hi there, you can have my suggestions... (you can correct me if I am wrong)

    1. Everything depends on the type of database you have and the amount of data that gets shifted around. If you have a heavy OLTP database and a lot of data, it's best to monitor your indexes with a query - there are plenty of samples online you can get hold of, and a rough sketch is below.

    Secondly, I never create regular scheduled jobs for this, as it normally should not be done during working hours. Whether it's Oracle or SQL Server, this should be done under a DBA's supervision, usually planned and best executed out of office hours. If it fails, you need to make sure your database is not left in a bad state (expanded TEMPDB, too many T-logs generated, out-of-disk-space issues, etc.). Remember, such a process will generate tons of data in your T-logs. I've just seen at least 2-3 recent posts from people crying for help because they did something and ran out of space. Hence my post.
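
    Something along these lines (the thresholds and the 'LIMITED' scan mode are only examples):

    -- List fragmented indexes in the current database
    SELECT  OBJECT_NAME(ips.object_id)       AS table_name,
            i.name                            AS index_name,
            ips.avg_fragmentation_in_percent,
            ips.page_count
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN    sys.indexes AS i
            ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE   ips.avg_fragmentation_in_percent > 30   -- candidates for REBUILD
            AND ips.page_count > 1000                -- ignore tiny indexes
    ORDER BY ips.avg_fragmentation_in_percent DESC;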

    " We do it weekly and my understanding was that best practice was to do this frequently?"

    It's up to you; if you have tested your schedules and don't have any problem with them, then leave it as it is. People should still be careful about which option to choose. This is a very nice article about stats:

    https://www.pythian.com/blog/sql-server-statistics-maintenance-and-best-practices/

    And a quote from there:

    Statistics update after Index Rebuild: As mentioned previously, the index rebuild (not reorg) will also update index statistics using full scan. Scheduling stats maintenance after the index maintenance will cause duplicate work.

    However, SQL Server has a very good optimizer, and statistics are updated automatically as queries run against changing data (there are complex algorithms SQL Server uses for this; perhaps we can read about them another time). I have a couple of big databases and there is no need to run regular statistics updates on them. If you feel your weekly job helps boost the speed of your system and you have proven it does, then why not?
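
    If you want to sanity-check that, here is a quick way to see when statistics were last updated (just a sketch; run it in the database in question):

    -- Last statistics update date per statistic on user tables
    SELECT  OBJECT_NAME(s.object_id)            AS table_name,
            s.name                              AS stats_name,
            STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM    sys.stats AS s
    WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
    ORDER BY last_updated;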

    I recently ran a shrink/index rebuild/statistics pass on a database that had never had this done during the past 10 years (there was no DBA in sight). Users complained it was slow, but there was no significant improvement after what I did. The application, lack of indexes, or the network may be at fault too.

    2. If you have the option to disable - yes, you can. Sorry, I recently had to deal with a SQL Server 2000 instance, and sometimes you don't have that option: there is no way to disable an index (no ALTER INDEX ... DISABLE) in that version, and all the triggers, functions and procedures were encrypted. You probably can use DISABLE on later versions; however, if you had a table that was 20 GB in size and actually should hold 1 GB, then I am not sure about the TRUNCATE - what would happen to the index, would all blocks/leaves be cleared? I would have to figure it out. I did not find MS saying much about it except this:

    https://msdn.microsoft.com/en-us/library/ms177570.aspx

    For me DROP/CREATE would be the cleanest operation.
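
    For comparison, on SQL Server 2005 and later the disable/rebuild route looks roughly like this (the table and index names are made up):

    -- Disable a nonclustered index before the bulk insert
    ALTER INDEX IX_Orders_CustomerId ON dbo.Orders DISABLE;
    -- ... bulk insert the data ...
    -- Re-enabling means rebuilding it from scratch
    ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;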

    3. If you have to repopulate the tables in a database that is 80 GB in size, try it - it may not be as fast as you expect. I was testing this theory and I waited almost 8 hours during the import before the job was terminated. After doing TRUNCATE / DROP PK / DROP INDEX and then the import, it took me 2 hours to repopulate the tables and rebuild the indexes. But it depends on your database; if you have 200 tables and only 5 indexes, then of course it makes sense to do the opposite.
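
    Roughly the sequence I mean, with made-up table, constraint and index names:

    TRUNCATE TABLE dbo.Orders;
    -- Drop the primary key and the nonclustered indexes before the reload
    ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;
    DROP INDEX IX_Orders_CustomerId ON dbo.Orders;
    -- ... bulk load the data here (BULK INSERT / SSIS / INSERT ... SELECT) ...
    -- Recreate the primary key and indexes once the load has finished
    ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId);
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);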

    4. TRUNCATE takes a few seconds; try deleting 120 GB of data in 20 tables. Again, it depends on your database. Every table also has something Oracle calls the "high water mark"; SQL Server may not have exactly the same concept, but I always take it into consideration. A DELETE in a 20 GB table does not move the HWM back to its beginning and may not actually give you back the space you need.
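
    You can see that effect with sp_spaceused - after a big DELETE the reserved figure often stays high even though the data figure drops (the table name is only an example):

    -- Compare reserved vs. actually used space for one table
    EXEC sp_spaceused N'dbo.Orders';
    -- Or for the whole database
    EXEC sp_spaceused;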

    Compared to the DELETE statement, TRUNCATE TABLE has the following advantages:

    - Less transaction log space is used.

    - The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.

    - Fewer locks are typically used.

    - When the DELETE statement is executed using a row lock, each row in the table is locked for deletion. TRUNCATE TABLE always locks the table (including a schema (SCH-M) lock) and page but not each row.

    - Without exception, zero pages are left in the table.

    After a DELETE statement is executed, the table can still contain empty pages. For example, empty pages in a heap cannot be deallocated without at least an exclusive (LCK_M_X) table lock. If the delete operation does not use a table lock, the table (heap) will contain many empty pages. For indexes, the delete operation can leave empty pages behind, although these pages will be deallocated quickly by a background cleanup process.

    As per the last statement ("these pages will be deallocated quickly by a background cleanup process"): if you planned your job and want to finish it, you may not wish to wait two days for the cleanup process to finish. There are limits to how self-maintaining the SQL Server database engine is. It's good, but not always perfect.
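
    One thing the DELETE documentation does mention: on a heap you can take a table lock so the empty pages are deallocated as part of the delete itself, roughly like this (table and filter are made up):

    -- Deleting from a heap with a table lock deallocates the emptied pages
    -- immediately instead of leaving them behind for the cleanup process
    DELETE FROM dbo.StagingHeap WITH (TABLOCK)
    WHERE LoadDate < '20160101';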

    Do not take all my words literally; sometimes people cry for help, so we tend to write solutions that cannot cover all aspects of the problem. 😎

  • alex.sqldba (12/8/2016)


    So he removed his data, but to his surprise the database did not "shrink". Of course it doesn't! SQL Server does not know what you want to do with it

    Not sure if this was aimed at me, but to add: it's not that I expected the data files to shrink. But I did expect the free space within the files to be visible. And to an extent it is; it's just not as much as I was expecting.

    I was also under the assumption that white space in a database doesn't consume any space in a backup file. Though I might be wrong about that.

    Hi, sorry, it was a bit aimed at all the people who assume that SQL Server is so easy and everything is just automatic. Didn't mean to offend anybody, really - just a general remark that we should not take SQL Server maintaining itself for granted. 😉

    On the "free space" - white space will not consume space in a compressed database backup, but SQL server will leave that space internally availble for new objects to be created. The concept is not to contanctly expand/shrink/expand/shrink, etc. etc. Adding new block to the database may me a tiny cost, but still does consume CPU and the idea (I think) is to save CPU on such an operation. I think some wrote a fine quote: "your disk space on your server it for using it, not to lay around wasted".

  • Morning 🙂

    I was in no way offended, but I did want to clarify what my question was asking.

    Interesting point on white space in compressed backups behaving differently to non-compressed -- I didn't know that detail.

    Cheers!

    One of my databases is 25 GB and its compressed backup is 2.9 GB. Another database is 105 GB and its compressed backup is approx. 20 GB.
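
    Those figures are from plain compressed backups, which on versions/editions that support it is simply (name and path are placeholders):

    -- Compressed full backup
    BACKUP DATABASE MyBigDb
        TO DISK = N'D:\Backups\MyBigDb.bak'
        WITH COMPRESSION, STATS = 10;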

    And still on truncate, an interesting view you can read about here:

    https://www.mssqltips.com/sqlservertip/1080/deleting-data-in-sql-server-with-truncate-vs-delete-commands/

    And I find this really interesting, didn't know about it myself until now :w00t:

    When tables require that all records be deleted and TRUNCATE TABLE cannot be used, the following statements can be used to delete the data and reset the identity value:

    DELETE FROM [table_name];

    DBCC CHECKIDENT ('table_name', RESEED, reseed_value);
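
    For example, on a hypothetical dbo.Orders table whose identity started at 1, that would be:

    DELETE FROM dbo.Orders;                      -- fully logged, row by row
    DBCC CHECKIDENT ('dbo.Orders', RESEED, 0);   -- next inserted row gets identity 1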

  • Hello richlion2,

    I did not think of the case where disabling the index wasn't an option. I suppose for old SQL versions, that does make more sense then.

    With respect to re-enabling an index: you have to rebuild it to enable it again, so whatever was in the index before is discarded and rebuilt from scratch anyway.

    And as for the primary key issue, I suppose it all depends on how much data you are putting back into the table after you truncate it. If you have a 120 GB table and you are putting 119 GB back in, it doesn't really make sense to truncate and repopulate it, eh? BUT if you have a 120 GB table and you are only putting 10 GB back in after truncating it, then the truncate and repopulate method does make more sense.

    AND I realized the biggest mistake in my previous post - I assumed there was a "best way" to do it. I've been working with SQL and been to PASS Summit enough times to know that the best way to do something depends on the situation. Pretty much every Summit presentation has a question asked by the audience where the answer is "it depends".

    Thanks for taking the time to clarify

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are any negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Eirikur Eiriksson (12/7/2016)


    Eric M Russell (12/7/2016)


    Think of a database table as a library shelf containing books. When you pull a book from the shelf, the shelf doesn't get smaller; it just contains empty slots where other books can be inserted. The reserved size of the database won't get smaller unless you shrink the data and/or log files. However, it is recommended that the DBA not shrink files, because re-growing the files later costs time and causes blocking and fragmentation.

    +1

    Good analogy

    😎

    I always used a bucket & water for my analogy when explaining this to others but I like yours better. Hope you don't mind if I steal it in the future. 😉

    +1


    SELECT quote FROM brain WHERE original = 1
    0 rows returned
