Adding NOT NULL to a column with existing rows causes page bloat

  • Hi,

    I'd like to know why the page count almost exactly doubles for a table with existing rows when I change a column to NOT NULL. I first update the column in question with values so that none are NULL (this step also doubles the pages, but I understand why; fragmentation reaches 99%, so I REBUILD the clustered index). Then running ALTER TABLE ... ALTER COLUMN ... NOT NULL (without a default) bloats the pages back to double and 99% fragmentation again. Interestingly, I had to do this for two columns, and without a REBUILD after the first NOT NULL modification, the second NOT NULL modification (with the table still bloated) executes fast and doesn't cause additional pages.

    Edit1: Wrong observation: the second modification takes a similar amount of time, just without additional pages, but by then we already have enough pages and free space inside them.

    I don't understand why removing nullability should cause this, since I've already added the required data to the column and the space has been allocated. Is it a modification of the null bitmap?

    I'm running Standard Edition and the fill factor is 0, i.e. full pages.

    Thanks for any insights!

    Edit2: if I rebuild after the update with fill factor 90, fragmentation after the NOT NULL modification no longer skyrockets from 0% to 99% but only reaches 6%, and the page increase is minimal. It still takes a while, though.
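    For reference, that rebuild is just something like this (pk_id being the clustered primary key from the script I post below):

    alter index pk_id on testtable rebuild with (fillfactor = 90)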

     

  • Interesting to see this. If there are no NULLs, I wouldn't expect rows to need to move or any page splits. This ought to be a metadata-only change.
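    One way to check that would be to measure how much transaction log the ALTER generates. A sketch, with placeholder table/column names since we haven't seen the real DDL yet:

    begin transaction
    alter table yourTable alter column yourColumn varchar(32) not null
    -- a metadata-only change should log next to nothing for this session's transaction
    select d.database_transaction_log_bytes_used
    from sys.dm_tran_database_transactions d
    join sys.dm_tran_session_transactions s on s.transaction_id = d.transaction_id
    where s.session_id = @@spid and d.database_id = db_id()
    rollback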

     

  • It would be helpful if you posted the CREATE TABLE statement for the table both before and after each change you make. Also, the devil's in the details, so it would also help to see all the code you used to make this change so we can try to simulate what you did, especially since your first edit seems to indicate that your initial description of your observations was incorrect.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is a script that details what I do step by step so you can repeat my results. I've found at least one clue: the added column is variable-length. If I change it to int or char(32), I get no page increase, far fewer logical reads, and presumably faster execution. Obviously not a solution, but it may offer a lead as to why this happens.

    Feel free to run this, observe the page count and fragmentation increase, and then change the datatype to char(32).

    use tempdb

    drop table if exists testtable
    create table testtable(id int not null identity, dumyColumn int null constraint pk_id primary key (id))
    GO

    insert testtable(dumyColumn) values (42)
    go 10000

    -- ignoring transactions for brevity

    -- allow nulls
    alter table testtable add newColumn varchar(32) -- change to a non-variable type (int or char(32)) to compare
    select index_id, index_level, index_type_desc, avg_fragmentation_in_percent, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')
    go

    -- update all rows first
    update testtable set newColumn = 'newValue'

    select index_id, index_level, index_type_desc, avg_fragmentation_in_percent, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')

    alter index pk_id on testtable rebuild

    select index_id, index_level, index_type_desc, avg_fragmentation_in_percent, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')

    set statistics io on
    -- set not null constraint
    alter table testtable alter column newColumn varchar(32) not null -- change to a non-variable type (int or char(32)) to compare
    set statistics io off

    select index_id, index_level, index_type_desc, avg_fragmentation_in_percent, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')

    (As for my original first edit, it only concerned the execution time of the second NOT NULL modification, i.e. the strikethrough in the original post.)

    Regards,

    Timo

  • I am going to take a swag at why you see what happens with a variable-length nullable column added to an existing table with no default value.  The column is initially created taking no space, since only the NULL marker is needed and that takes no space at the end of each row of data.  You then update that column with a value, which results in expanding updates of all the data rows, causing page splits and fragmentation.  If you use a fixed-length column (int, char(32)), the necessary space is allocated, so when you update the column the space already exists for the new data.

    This is an untested explanation, but it seems logical to me based on what you are seeing.  I currently do not have access to a server where I can test my hypothesis.
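    If anyone wants to test it, the fixed-length variant of the relevant steps would be something like this (untested, reusing the names from the script above):

    -- per the hypothesis above, a fixed-length column gets its row space
    -- allocated for the value, so the UPDATE should not expand the rows
    alter table testtable add newColumn char(32) null
    update testtable set newColumn = 'newValue'
    alter table testtable alter column newColumn char(32) not null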

     

  • Based on what I'm seeing, the index is rebuilt AFTER the data is added to the new column, and the simple act of redefining the column as NOT NULL adds 10 bytes (which is perfectly explainable) to the average row width.  It almost seems like the column is being rebuilt within the index by creating a new column and dropping the old one, which doesn't actually release any space.  To recover the space from a variable-width column drop, you have to do an index rebuild.

    If you rebuild the index after the NOT NULL constraint has been added, the row width goes back to the original 27 bytes.
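    In code, that's just the rebuild from the repro script followed by a size check, something like:

    alter index pk_id on testtable rebuild

    select avg_record_size_in_bytes, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')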

    There's only one way to know for sure and that's going to require a trip to DBCC IND and DBCC PAGE, which I don't have time for tonight.  Still, I'm amazed at this... who would think this type of thing would happen just by adding a NOT NULL to an existing column?  It's news to me.
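    For anyone who gets to it first, that trip would look something like this (both commands are undocumented, so test systems only; the file/page numbers are placeholders you'd take from the DBCC IND output):

    dbcc traceon(3604)                  -- route DBCC PAGE output to the client
    dbcc ind('tempdb', 'testtable', 1)  -- list the pages behind index_id 1
    dbcc page('tempdb', 1, 368, 3)      -- placeholder PageFID/PagePID; printopt 3 dumps per-row details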

    I have seen stranger, though.  For example... https://www.sqlservercentral.com/articles/some-t-sql-inserts-do-follow-the-fill-factor-sql-oolie

     

     

    --Jeff Moden

  • I am not privy to why the "simple act of redefining the column as NOT NULL adds 10 bytes" is "perfectly explainable". Please explain. 🙂 I mean, the column already exists with data added to each row, and thus the space is allocated.

    I have dumyColumn allowing NULL, so I presume I already have a null bitmap allocated in each row. (I was pondering whether the new NOT NULL column required reallocating space for more bits in the null bitmap, but I assume the smallest null bitmap would be a byte, so that should not be the case. I may be completely wrong about how the null bitmap is allocated; it's just a guess.)
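    As an untested way to check how the bitmap is allocated, comparing an 8-column and a 9-column table should show the record growing by an extra bitmap byte (table names nb8/nb9 are made up for this test):

    drop table if exists nb8, nb9
    create table nb8 (c1 tinyint, c2 tinyint, c3 tinyint, c4 tinyint, c5 tinyint, c6 tinyint, c7 tinyint, c8 tinyint)
    create table nb9 (c1 tinyint, c2 tinyint, c3 tinyint, c4 tinyint, c5 tinyint, c6 tinyint, c7 tinyint, c8 tinyint, c9 tinyint)
    insert nb8 default values
    insert nb9 default values

    -- 8 columns fit in one null bitmap byte, the 9th needs a second one,
    -- so the record sizes should differ by 2 bytes (1 data byte + 1 bitmap byte)
    select avg_record_size_in_bytes from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('nb8'), null, null, 'detailed')
    select avg_record_size_in_bytes from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('nb9'), null, null, 'detailed')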

    Regards,

    Timo

  • timoq wrote:

    I am not privy to why the "simple act of redefining the column as NOT NULL adds 10 bytes" is "perfectly explainable". Please explain. 🙂 I mean, the column already exists with data added to each row, and thus the space is allocated.

    I have dumyColumn allowing NULL, so I presume I already have a null bitmap allocated in each row. (I was pondering whether the new NOT NULL column required reallocating space for more bits in the null bitmap, but I assume the smallest null bitmap would be a byte, so that should not be the case. I may be completely wrong about how the null bitmap is allocated; it's just a guess.)

    Regards,

    Timo

    If you re-read my post, I'm not privy as to "why" either.  You can see it if you add "avg_record_size_in_bytes" to your query that uses sys.dm_db_index_physical_stats.  Based on that, I merely formed a hypothesis of what's happening that will need to be proven.
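    That is, something like:

    select index_id, index_level, index_type_desc, avg_fragmentation_in_percent, avg_record_size_in_bytes, page_count
    from sys.dm_db_index_physical_stats(db_id('tempdb'), object_id('testtable'), null, null, 'detailed')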

     

    --Jeff Moden
