Does it make sense to update statistics right after tables are truncated?

  • Hi all,

    We have a lengthy process that uses 30 tables, some of them large. At the beginning we truncate them, then ETL data into them, then massage it, perform some calculations, and finally do the reporting. I am tasked with cleaning up this process. Right after the initial truncation we update statistics on all these tables. Does that make sense? Are statistics even preserved after the tables are truncated? And is it useful?

    Thanks

     

  • My opinion - that's pointless. Statistics tell you information about your table data. If the table is empty, the statistics are (mostly) useless. You need data in the table for the statistics to be helpful.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • A little stupid question from me, I am new to SQL Server, but if the tables are empty after the truncate, shouldn't updating the statistics finish in less than a second, because the table is empty? Or am I mistaken? It seems like a pointless activity; on the other hand, even a pointless one doesn't seem to take much time and/or effort to finish since the tables are empty. Isn't it more concerning what happens to the statistics once the table is filled with data again?
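
    For reference, a minimal sketch of the statement in question, using a hypothetical staging table name (on an empty table either form completes almost instantly):

    ```sql
    -- Hypothetical table name; substitute one of the real staging tables.
    UPDATE STATISTICS dbo.StagingOrders;                 -- refresh all statistics on the table
    UPDATE STATISTICS dbo.StagingOrders WITH FULLSCAN;   -- full scan, trivial when the table is empty
    ```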

  • If I had to guess, whoever designed that was truncating the table then looking at the statistics for that table and seeing that the statistics indicated there were rows in the table even though you knew it was empty - you truncated it, so why wouldn't the statistics reflect this? The reason - the table metadata indicates the table is empty, so the statistics aren't actually used. Kendra Little has a nice article on this:

    https://littlekendra.com/2016/12/08/does-truncate-table-reset-statistics/

    BUT it still feels like a pointless activity to me. Update the statistics AFTER things are loaded, not before. In my opinion, leaving pointless activities like that in place would raise my eyebrows at whoever was/is responsible for maintaining it. If I don't trust the other DBAs to properly maintain databases, I'm going to be taking on extra work for myself, which isn't going to be fun. I would much rather reach out to the other DBAs and find out why they did that. MAYBE they have a reason (likely ignorance, but you never know - there may be a reason).
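
    To see what the article demonstrates, here is a small sketch you can run against a scratch database (the table and statistic names are made up for the example):

    ```sql
    -- Hypothetical demo objects; run this in a scratch database.
    CREATE TABLE dbo.StatsDemo (ID int NOT NULL, Amount money NULL);
    INSERT INTO dbo.StatsDemo VALUES (1, 10.00), (2, 20.00), (3, 30.00);
    CREATE STATISTICS st_Amount ON dbo.StatsDemo (Amount);  -- built from 3 rows

    TRUNCATE TABLE dbo.StatsDemo;

    -- The rows column still reports 3: TRUNCATE does not reset the statistics,
    -- but the optimizer ignores them because the metadata says the table is empty.
    SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.StatsDemo');
    ```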

    Imagine you opened up your ETL job and step 1 was to truncate a table and step 2 was to delete from the same table. Wouldn't you wonder who created that ETL and start to question their ETL ability? It doesn't instantly make me distrust a DBA, but it makes me question their training and knowledge, and if I am new to that system, I am going to do my own investigation into their database maintenance configuration (backup process, test restore process, RPO, RTO, index maintenance, statistics maintenance, etc.).

     

    Now with the above being said, a "trick" to improve ETL speeds (on my systems, YMMV): if you disable non-clustered indexes before the truncate and rebuild them after the ETL completes, that can improve the performance of your ETL. In my company, doing that shaved off a few hours! It still took hours to complete, but was MUCH faster with the non-clustered indexes disabled and then rebuilt once complete.

    We probably could have been selective about which indexes to shut off, but we just went nuclear and disabled all non-clustered indexes and rebuilt them all afterwards, as scripting that was much easier than trying to manage and test every index that exists. PLUS we have no non-clustered indexes on our staging tables, so it had no negative impact on the individual ETL steps. We could probably get a faster ETL by putting indexes on the staging tables and turning those on once populated, before the TL part of the ETL process actually starts. Our ETL is more of an ELETL - extract (from source system), load (to staging tables), extract (from staging tables), transform, load (to final tables). Messy, but it got the job done.
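
    A sketch of that nuclear approach, assuming a hypothetical final table name. Note that ALTER INDEX ALL ... DISABLE would also disable the clustered index and make the table unreadable, so the DISABLE statements are generated for non-clustered indexes only:

    ```sql
    -- Generate DISABLE statements for the non-clustered indexes only (type = 2).
    SELECT N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id))
         + N'.' + QUOTENAME(OBJECT_NAME(i.object_id)) + N' DISABLE;'
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.FinalOrders')   -- hypothetical table name
      AND i.type = 2;                                   -- non-clustered only

    -- ... run the generated DISABLE statements, then the ETL load ...

    -- REBUILD ALL re-enables the disabled non-clustered indexes
    -- (and rebuilds the clustered index as well).
    ALTER INDEX ALL ON dbo.FinalOrders REBUILD;
    ```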

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
