October 15, 2018 at 8:25 am
I have got an issue that is driving me nuts.
There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?
October 15, 2018 at 12:27 pm
That's.... a lot.
From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
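For what it's worth, a minimal sketch of how that could be checked (the table name here is just a placeholder, not from the post):

-- page density and forwarded records for a heap and its indexes;
-- 'dbo.MyTable' is a placeholder name
select index_id, index_type_desc, alloc_unit_type_desc,
       page_count, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.MyTable'), null, null, 'DETAILED');

Very low avg_page_space_used_in_percent or a large forwarded_record_count on the heap would point toward the page-split / low-density scenario.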
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 15, 2018 at 12:35 pm
Regardless of what is causing the issue, have you considered not updating the table, but instead inserting into a new table where you add the column on the fly with the correct value, and simply leave out the column that is not needed? See the sketch below.
Do it in a single transaction, or in batches if required; it will probably perform better and won't grow the table as much.
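A rough sketch of what I mean, with made-up table and column names (Staging, SourceValue and NewCol are just placeholders):

-- copy into a new table, computing the new column on the fly
select  t.KeyCol,
        t.Col1, t.Col2,                                 -- keep everything except the column being dropped
        cast(t.SourceValue as decimal(13,0)) as NewCol  -- the new decimal(13,0) value
into    dbo.Staging_New
from    dbo.Staging as t;

-- once the copy is verified, swap the tables
-- drop table dbo.Staging;
-- exec sp_rename 'dbo.Staging_New', 'Staging';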
October 15, 2018 at 12:42 pm
Is there maybe a persisted computed column based on that column?
October 15, 2018 at 2:41 pm
Grant Fritchey - Monday, October 15, 2018 12:27 PM: That's.... a lot. From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
The updates are being done in a batch. It's VERY possible that you can get "skewed" page splits where many parts of the table drop to nearly trivial page densities (like 10% or less).
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2018 at 2:48 pm
And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
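For reference, that last bit would look something like this (the table name is a placeholder); note it only affects LOB values written after the option is set:

-- push (n)varchar(max)/varbinary(max) values off-row; 'dbo.MyTable' is a placeholder
exec sp_tableoption 'dbo.MyTable', 'large value types out of row', 1;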
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2018 at 2:50 pm
gerben.scherpenzeel - Monday, October 15, 2018 8:25 AM: I have got an issue that is driving me nuts. There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?
Adding a column like this only creates a placeholder in the row definition; the rows themselves don't grow until the "new" column is actually populated, and that's when the growth hits.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2018 at 2:19 am
Grant Fritchey - Monday, October 15, 2018 12:27 PM: That's.... a lot. From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
Hi Grant,
yes, that is a lot 🙂. There are indexes involved, but I separate the indexes and the table when measuring size, so the size is really only the table.
To measure the table size, I use:
-- reserved pages per object, converted to GB (reserved_page_count is in 8 KB pages)
select s.name as schema_name, o.name as object_name, sum(ps.reserved_page_count) * 8.0 / 1024 / 1024 as SizeInGB
from sys.dm_db_partition_stats ps
join sys.objects o on ps.object_id = o.object_id
join sys.schemas s on o.schema_id = s.schema_id
group by s.name, o.name
order by SizeInGB desc
Yes, the table is a heap, but I have not yet looked at index fragmentation; that might be a good thing to check the next time the problem surfaces (did I mention that it does not happen all the time?).
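If it ever helps to leave the nonclustered indexes out of that number completely, a small variation on the query above (just a sketch against the same DMVs) is to filter on index_id:

select s.name as schema_name, o.name as object_name,
       sum(ps.reserved_page_count) * 8.0 / 1024 / 1024 as SizeInGB
from sys.dm_db_partition_stats ps
join sys.objects o on ps.object_id = o.object_id
join sys.schemas s on o.schema_id = s.schema_id
where ps.index_id in (0, 1)   -- 0 = heap, 1 = clustered index; nonclustered indexes excluded
group by s.name, o.name
order by SizeInGB desc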
October 18, 2018 at 2:24 am
frederico_fonseca - Monday, October 15, 2018 12:35 PM: Regardless of what is causing the issue, have you considered not updating the table, but instead inserting into a new table where you add the column on the fly with the correct value, and simply leave out the column that is not needed? Do it in a single transaction, or in batches if required; it will probably perform better and won't grow the table as much.
Hi Frederico,
yes, I have already developed another way of handling the data and suggested this solution to the customer. However, he wants to know the root cause, so he can be certain that the solution I suggested will indeed prevent the sudden growth from happening again....
and to be honest: I am curious as well what causes this behaviour...
October 18, 2018 at 2:30 am
ZZartin - Monday, October 15, 2018 12:42 PM: Is there maybe a persisted computed column based on that column?
Interesting suggestion, but no....
October 18, 2018 at 2:34 am
Jeff Moden - Monday, October 15, 2018 2:48 PM: And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
Hi Jeff,
I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.
And what is puzzling me is that it does not happen all the time....
October 18, 2018 at 11:24 am
gerben.scherpenzeel - Thursday, October 18, 2018 2:34 AM: Jeff Moden - Monday, October 15, 2018 2:48 PM: And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
Hi Jeff,
I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.
And what is puzzling me is that it does not happen all the time....
Still, if it's a newly added column, the space for it isn't actually allocated at the page level until you start to populate it with data, and that population is what causes the massive page splits.
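If low page density after the update is indeed the culprit, one way to get the space back once the batch finishes (a sketch; the table name is a placeholder) is to rebuild the heap, which rewrites the pages at full density so the reserved size drops back:

-- rewrites the heap's pages at full density; 'dbo.MyTable' is a placeholder
alter table dbo.MyTable rebuild;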
--Jeff Moden
Change is inevitable... Change for the better is not.