October 15, 2018 at 8:25 am
I have got an issue that is driving me nuts.
There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?
October 15, 2018 at 12:27 pm
That's.... a lot.
From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
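For what it's worth, a minimal sketch of how that could be checked (the table name here is just a placeholder, not from the post):

-- page density and forwarded records for a heap and its indexes;
-- 'dbo.MyTable' is a placeholder name
select index_id, index_type_desc, alloc_unit_type_desc,
       page_count, avg_page_space_used_in_percent, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.MyTable'), null, null, 'DETAILED');

Very low avg_page_space_used_in_percent or a large forwarded_record_count on the heap would point toward the page-split / low-density scenario.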
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 15, 2018 at 12:35 pm
Regardless of what is causing the issue, have you considered not updating the table, but instead inserting into a new table where you add the column on the fly with the correct value, and simply leave out the column that is not needed? See the sketch below.
Do it in a single transaction, or in batches if required; it will probably perform better and won't grow the table as much.
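A rough sketch of what I mean, with made-up table and column names (Staging, SourceValue and NewCol are just placeholders):

-- copy into a new table, computing the new column on the fly
select  t.KeyCol,
        t.Col1, t.Col2,                                 -- keep everything except the column being dropped
        cast(t.SourceValue as decimal(13,0)) as NewCol  -- the new decimal(13,0) value
into    dbo.Staging_New
from    dbo.Staging as t;

-- once the copy is verified, swap the tables
-- drop table dbo.Staging;
-- exec sp_rename 'dbo.Staging_New', 'Staging';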
October 15, 2018 at 12:42 pm
Is there maybe a persisted computed column based on that column?
October 15, 2018 at 2:41 pm
Grant Fritchey - Monday, October 15, 2018 12:27 PM: That's.... a lot. From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
The updates are being done in a batch. It's VERY possible that you can get "skewed" page splits where many parts of the table drop to nearly trivial page densities (like 10% or less).
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2018 at 2:48 pm
And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
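For reference, that last bit would look something like this (the table name is a placeholder); note it only affects LOB values written after the option is set:

-- push (n)varchar(max)/varbinary(max) values off-row; 'dbo.MyTable' is a placeholder
exec sp_tableoption 'dbo.MyTable', 'large value types out of row', 1;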
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2018 at 2:50 pm
gerben.scherpenzeel - Monday, October 15, 2018 8:25 AM: I have got an issue that is driving me nuts. There is a table in an Azure database containing some 8,800,000 rows. This table is created and loaded in a batch process that loads data from an external source file. Once the loading is done, the table has a size of roughly 3 GB.
Now, the next step in the batch process is dropping a column from the table, then adding a column to the table, and finally doing an update of that column. Sometimes, but not all the time, after this update the table has grown in size to 300 GB.
Now, the column added and updated is not a varchar column: it is a decimal(13,0) column, so I cannot comprehend why the table should grow to 100 times its original size. Has anyone had the same issue, or does anyone have a theory why this might happen?
Adding a column like this only creates a placeholder in the row definition; the rows themselves don't grow until the "new" column is actually populated, and that's when the growth hits.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2018 at 2:19 am
Grant Fritchey - Monday, October 15, 2018 12:27 PM: That's.... a lot. From an update I can imagine a scenario where you get quite a few page splits and end up with the size growing 2-4 times. A 100 times growth suggests something else is up. I only have questions. How are you measuring the data size of the table? Are there indexes involved? Have you looked at the fragmentation of the indexes and/or the table (if it's a heap)?
Hi Grant,
yes, that is a lot 🙂. There are indexes involved, but I separate the indexes and the table when measuring size, so the size is really only the table.
To measure the table size, I use:
-- reserved pages per object, converted to GB (reserved_page_count is in 8 KB pages)
select s.name as schema_name, o.name as object_name, sum(ps.reserved_page_count) * 8.0 / 1024 / 1024 as SizeInGB
from sys.dm_db_partition_stats ps
join sys.objects o on ps.object_id = o.object_id
join sys.schemas s on o.schema_id = s.schema_id
group by s.name, o.name
order by SizeInGB desc
Yes, the table is a heap, but I have not yet looked at index fragmentation; that might be a good thing to check the next time the problem surfaces (did I mention that it does not happen all the time?).
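If it ever helps to leave the nonclustered indexes out of that number completely, a small variation on the query above (just a sketch against the same DMVs) is to filter on index_id:

select s.name as schema_name, o.name as object_name,
       sum(ps.reserved_page_count) * 8.0 / 1024 / 1024 as SizeInGB
from sys.dm_db_partition_stats ps
join sys.objects o on ps.object_id = o.object_id
join sys.schemas s on o.schema_id = s.schema_id
where ps.index_id in (0, 1)   -- 0 = heap, 1 = clustered index; nonclustered indexes excluded
group by s.name, o.name
order by SizeInGB desc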
October 18, 2018 at 2:24 am
frederico_fonseca - Monday, October 15, 2018 12:35 PM: Regardless of what is causing the issue, have you considered not updating the table, but instead inserting into a new table where you add the column on the fly with the correct value, and simply leave out the column that is not needed? Do it in a single transaction, or in batches if required; it will probably perform better and won't grow the table as much.
Hi Frederico,
yes, I have already developed another way of handling the data and suggested this solution to the customer. However, he wants to know the root cause, so he can be certain that the solution I suggested will indeed prevent the sudden growth from happening again....
and to be honest: I am curious as well what causes this behaviour...
October 18, 2018 at 2:30 am
ZZartin - Monday, October 15, 2018 12:42 PM: Is there maybe a persisted computed column based on that column?
Interesting suggestion, but no....
October 18, 2018 at 2:34 am
Jeff Moden - Monday, October 15, 2018 2:48 PM: And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
Hi Jeff,
I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.
And what is puzzling me is that it does not happen all the time....
October 18, 2018 at 11:24 am
gerben.scherpenzeel - Thursday, October 18, 2018 2:34 AM: Jeff Moden - Monday, October 15, 2018 2:48 PM: And stop adding and deleting columns. Populate the table with the "new" column in place and, if it's variable width, make sure that you default it with SPACE(X) where "X" is a number that represents what the max width of 80% of the rows will be once populated. If it's a MAX LOB, get that the hell out of the table using sp_TableOption to force all LOBs out of row.
Hi Jeff,
I agree, deleting and adding the column is nuts, and I suggested as much to the customer. However, it is not a variable-width column: it is a decimal(13,0) column.
And what is puzzling me is that it does not happen all the time....
Still, if it's a newly added column, the space for it isn't actually allocated at the page level until you start to populate it with data, and that population is what causes the massive page splits.
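If low page density after the update is indeed the culprit, one way to get the space back once the batch finishes (a sketch; the table name is a placeholder) is to rebuild the heap, which rewrites the pages at full density so the reserved size drops back:

-- rewrites the heap's pages at full density; 'dbo.MyTable' is a placeholder
alter table dbo.MyTable rebuild;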
--Jeff Moden
Change is inevitable... Change for the better is not.