June 30, 2022 at 11:31 pm
Hello!
We have a 5 TB database, about 4 TB of which is BLOB data. We have exported the images to the file system, and now we need to update the column that contains the BLOB data to NULL.
Which would be the better option in this scenario, or are there other options that would be more efficient?
Thanks in advance.
July 1, 2022 at 7:27 am
Re point 1 - no one will be able to answer that for you; it depends on so many factors (CPU / memory / IO / other activity / HADR). The best thing would be to perform a test of the same operation in dev/test and extrapolate from the results.
For example: updating 100,000 rows to NULL takes 10 minutes; I have 1,000,000 rows to do in prod, so we are looking at 100 minutes, plus some buffer time - maybe a 120-minute maintenance window.
Re point 2 - that is correct. While the DB will still be 5 TB, the backups will be much smaller, as a backup only includes the data pages in use, and you will have deallocated a lot of data by setting the column to NULL after extracting it to the file system.
Re point 3 - given the sheer amount of data you have removed from the DB, I would advocate moving to a new blank DB; it will be a lot quicker than performing a DBCC SHRINKFILE or lots of small DBCC SHRINKFILE operations.
Also remember that DBCC SHRINKFILE does cause blocking at the database level, so whichever way you go on that task, you will cause major downtime for the application.
Again, that's one for dev/test, to see which option is the quicker option for you.
July 1, 2022 at 12:55 pm
Thank you Ant-Green for your response.
Basically, whichever approach we take (updating the column, importing the data into a new database, or shrinking the DB files), there will be downtime for the application / a maintenance window will be needed. As this application is critical, I was looking for something that will not cause downtime, i.e. that keeps the current application accessible.
Inserting and updating (setting the column to NULL) creates row-level locks. However, once a single statement holds roughly 5,000 or more locks on the table, lock escalation occurs and a table-level lock is taken.
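For illustration only, here is a minimal sketch of a batched update that stays under that threshold; the table and column names (dbo.Documents, BlobData) and the batch size are placeholders, not taken from this system.

-- Hypothetical names: dbo.Documents holds the BLOBs in column BlobData.
-- Batches of a few thousand rows keep each statement under the ~5,000-lock
-- threshold, so the update should not escalate to a table lock.
DECLARE @BatchSize int = 4000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) dbo.Documents
    SET    BlobData = NULL
    WHERE  BlobData IS NOT NULL;

    IF @@ROWCOUNT = 0
        BREAK;      -- nothing left to update

    CHECKPOINT;     -- helps keep the log small under SIMPLE recovery; take log backups if FULL
END;

Each batch still has to find rows WHERE BlobData IS NOT NULL, so on a 4 TB table expect the loop to run for a long time even though individual batches stay small.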
Thanks.
July 1, 2022 at 1:10 pm
How many tables are being updated?
July 1, 2022 at 1:17 pm
Dealing just w/ the question of updating, and not recovering space:
If you are updating all rows, can't you just drop the column and re-add it (wondering if/why you still need the column at all if you are now using the file system)?
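A minimal sketch of that idea, with hypothetical names (MyDb, dbo.Documents, BlobData); dropping a column is a metadata-only operation, but the space it occupied is not reclaimed until the table is cleaned up afterwards.

ALTER TABLE dbo.Documents DROP COLUMN BlobData;                  -- metadata-only, effectively instant
-- Re-add only if the column is still needed for new rows:
ALTER TABLE dbo.Documents ADD BlobData varbinary(max) NULL;
-- Reclaim the space left behind by the dropped variable-length/LOB column:
DBCC CLEANTABLE ('MyDb', 'dbo.Documents', 5000);                 -- process 5,000 rows per transaction

DBCC CLEANTABLE only returns the space to the database; the data file itself does not get smaller without a shrink.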
July 1, 2022 at 1:59 pm
Regarding the update to NULL - depending on how frequently that table is updated, you may be able to create another table with the same structure (including indexes), insert all rows from the main table into it while setting the column to NULL, and at the end do a truncate of the source plus a partition switch from the new table to the old one.
or even rename the tables
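A minimal sketch of the copy-and-swap idea under assumed names (dbo.Documents as the source, dbo.Documents_New as the copy); for ALTER TABLE ... SWITCH both tables must have identical structure and live on the same filegroup, and the target of the switch must be empty.

-- Hypothetical table and column names.
INSERT INTO dbo.Documents_New (DocumentId, FileName, BlobData)
SELECT DocumentId, FileName, NULL                  -- BLOB column set to NULL during the copy
FROM   dbo.Documents;

BEGIN TRAN;
    TRUNCATE TABLE dbo.Documents;                              -- empty the original
    ALTER TABLE dbo.Documents_New SWITCH TO dbo.Documents;     -- metadata-only move of the rows
COMMIT;

-- Alternative: skip the switch and simply swap the names.
-- EXEC sp_rename 'dbo.Documents', 'Documents_Old';
-- EXEC sp_rename 'dbo.Documents_New', 'Documents';

Any rows modified in the source table after the copy starts would be lost, so this only works during a quiet period or with some form of catch-up step afterwards.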
As for the shrink, without knowing all the details (size/number of rows) of the remaining tables after this update operation, it's hard to say what would be better. One option would be to do a DBCC SHRINKFILE with EMPTYFILE into a new file and then back into a new one - I'm not sure whether that requires indexes to be rebuilt after the moves.
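If that is the route, here is a minimal sketch under assumed names (MyDb, 'MyDb_Data' as the existing logical file, a new .ndf added to the same filegroup); DBCC SHRINKFILE with EMPTYFILE migrates all allocations out of the named file into the other files of the filegroup.

-- Assumed database, logical file and path names; sizes purely illustrative.
ALTER DATABASE MyDb
    ADD FILE (NAME = 'MyDb_Data2', FILENAME = 'D:\Data\MyDb_Data2.ndf', SIZE = 500GB)
    TO FILEGROUP [PRIMARY];

DBCC SHRINKFILE ('MyDb_Data', EMPTYFILE);      -- pushes the pages into the other file(s)
-- ALTER DATABASE MyDb REMOVE FILE MyDb_Data;  -- only possible for secondary files;
                                               -- the primary .mdf can never be removed

EMPTYFILE moves pages one at a time, so on this volume of data it will still take a long time, and index fragmentation in the new file can result.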
July 1, 2022 at 3:34 pm
NOTE: I modified the quote to use numbers (rather than generic bullets) to make discussing each one easier.
(1) How long will the update process take to update that column to NULL values?
(2) Once updated, the DB will have about 4 TB of free space, but the backup size will be significantly less even though the DB size is 5 TB, as the backup only contains the pages with data on them - is that correct?
(3) Once the column is updated, we will be left with about 4 TB of free space which we would like to reclaim; below are a few options we are looking into:
- Shrinking the data file to reclaim the space. This will be a long process (and can cause blocking too), as it would need to be done during a maintenance window, and the indexes will have to be rebuilt as well.
- Instead of shrinking the data file, would it be a better option to create an empty database, script out all objects from the current database and create them in the new database, and then migrate the data to the new DB? But this will require application downtime, whereas a shrink will be a long process but will not require downtime for the app.
(4) Which would be the better option in this scenario, or are there other options that would be more efficient?
(1) Did you specify 'large value types out of row' on the table? If so, it should be almost immediate. If not, it could take quite some time; hard to say exactly how much.
(2) Correct.
(3A) Rather than a new db, you could create a table in a different filegroup (if allowed in your shop). Temporarily use a different table name, then rename it once you're ready to make it the "real" table.
(3B) Create the clustered index first but not any non-clustered indexes. First decide if you need to use page compression or not on the new table!
(3C) Then copy the non-BLOB data from the original table to the new table. If the original table is clustered on an identity or an ascending date/time column, you can do the copy while the existing table is still in place.
(3D) Create non-clustered indexes, if any.
(3E) Copy any remaining rows that have been added to the main table while you were copying data to the new table.
(3F) Rename the original table to a different name, and rename the new table to the original table name (a rough T-SQL sketch of steps 3A-3F follows below).
(4) Data compression could make the table much smaller once you've gotten the blobs out of the way (assuming they were not forced out-of-row originally).
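For illustration, a rough T-SQL sketch of steps 3A-3F under assumed names (dbo.Documents as the original table, dbo.Documents_New as the copy, an Id clustering key, and a SECONDARY filegroup); none of these objects come from the thread, and the catch-up in 3E only works if the clustering key is ever-increasing, as described in 3C.

-- 3A: new table under a temporary name, in a different filegroup
CREATE TABLE dbo.Documents_New
(
    Id       int            NOT NULL,
    FileName nvarchar(400)  NOT NULL,
    BlobData varbinary(max) NULL           -- keep or drop, as decided earlier
) ON [SECONDARY];

-- 3B: clustered index first; decide on page compression before loading (see point 4)
CREATE UNIQUE CLUSTERED INDEX CIX_Documents_New
    ON dbo.Documents_New (Id)
    WITH (DATA_COMPRESSION = PAGE);

-- 3C: copy the non-BLOB data
INSERT INTO dbo.Documents_New (Id, FileName, BlobData)
SELECT Id, FileName, NULL
FROM   dbo.Documents;

-- 3D: non-clustered indexes, if any
CREATE NONCLUSTERED INDEX IX_Documents_New_FileName
    ON dbo.Documents_New (FileName);

-- 3E: catch up on rows added since the copy started (relies on the ever-increasing key)
INSERT INTO dbo.Documents_New (Id, FileName, BlobData)
SELECT s.Id, s.FileName, NULL
FROM   dbo.Documents AS s
WHERE  s.Id > (SELECT MAX(Id) FROM dbo.Documents_New);

-- 3F: swap the names
EXEC sp_rename 'dbo.Documents', 'Documents_Old';
EXEC sp_rename 'dbo.Documents_New', 'Documents';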
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 9, 2022 at 4:32 pm
Shrinking the data file(s) might create serious performance problems (fragmentation, inverted index order, forwarded rows). If this occurs, you will have to rebuild indexes, rebuild tables, etc.
What is the risk of having performance issues after shrinking the data file(s)? I cannot really say, but I've seen it several times.
I understand that this is an important application and that you want to minimize the downtime.
I would rather have a controlled downtime (when the application teams, business teams and management are aware and approve) than be forced to shut down the application to rebuild indexes, tables, etc.
So my suggestion is to recreate a new database. It will require a controlled downtime. Ideally you would have a test environment to test your rebuild operation and give a good estimate of the downtime for the production environment.
It's not an easy job, I wish you good luck.
July 10, 2022 at 7:29 pm
I would just shrink the files. If you are on a SAN disk you might not even notice any performance degradation. If you do, you might just need to rebuild some of the indexes.
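A minimal sketch of that route, with assumed logical file, table and size values; the DBCC SHRINKFILE target is given in MB, and TRUNCATEONLY only releases unused space at the end of the file without moving any pages.

-- Assumed names and target size, purely illustrative.
DBCC SHRINKFILE ('MyDb_Data', 1200000);          -- shrink towards ~1.2 TB, moving pages as needed
-- DBCC SHRINKFILE ('MyDb_Data', TRUNCATEONLY);  -- cheaper, but only trims free space at the file's end

-- If fragmentation hurts afterwards, rebuild the affected indexes:
ALTER INDEX ALL ON dbo.Documents REBUILD WITH (ONLINE = ON);  -- ONLINE requires an edition that supports it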
July 18, 2022 at 6:36 pm
I have copied all the data (except for the BLOBs) to the new database; now I need to copy all indexes, statistics, and primary/foreign keys from the original tables to these new tables.
Does Generate Scripts allow that, or will I have to use a script to script everything out?
Thanks in advance.
July 18, 2022 at 6:53 pm
Generate Scripts should be able to do that; just specify the correct scripting options in SSMS before you generate the script (under Advanced Scripting Options there are settings such as Script Indexes, Script Primary Keys, Script Foreign Keys and Script Statistics).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".