Why is the MS calculation for SQL Server table size not reflect Storage in SSMS

  • Hi there

    We have been referring to the following article, in order to work out the size of a table in SQL Server

    https://docs.microsoft.com/en-us/sql/relational-databases/databases/estimate-the-size-of-a-heap?view=sql-server-ver15

    Now summarising the calculation we have the following:

    Null_Bitmap:                               2 + ((Num_Cols + 7) / 8)

    Fixed_Data_Size:                       Real (4 Bytes) + DateTime( 6 Bytes)

    Row Size:                                      Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + 4

    Rows_Per_Page:                        8096 / (Row_Size + 2)

    Num_Pages:                                Num_Rows / Rows_Per_Page

    Heap_Size:                                   8192 X Num_Pages

    Now we used a table that we have created with 10 million records

    I cant post the whole table and data but here is the DDL for it

    CREATE TABLE [dbo].[SIData_222222_88_20211103_1436](

    [ReadingDateTime] [datetime2](2) NULL,

    [ReadingValue] [real] NULL

    ) ON [PRIMARY]

    GO

    Now using the calculation above, we arrived at a table size of 193.52 Meg

    Value (Bytes)

    Num_Columns 2

    Num_Rows 10,000,000

    Null_Bitmap 3.125

    Fixed_Data_Size 10

    Variable_Data_Size 0

    Row Size 17.125

    Rows_Per_Page 423.3202614

    Num_Pages 23,623

    Heap Size (Bytes) 193517786.6

    Heap size (meg) 193.52

    however referecing sp_SpaceUsed we get figures of

    -- Data = 386480 KB 377.421875 MB (in binary)

    -- SSMS (Storage) 377.422 MB

    So SSMS is reporting the size of this table to be 183 Meg more than our calculations.

    Why is this the case?

    Which is the correct figure?

    Attached are out calcualtions

    Attachments:
    You must be logged in to view attached files.
  • no clustered index on that table? or columnstore?

    if not then it is a a heap so see what the resulting size is after you issue "alter table [dbo].[SIData_222222_88_20211103_1436] rebuild"

    if it does have a clustered index (you didn't put it on your create statement) then instead rebuild the clustered index "ALTER INDEX [indexname] ON [dbo].[SIData_222222_88_20211103_1436] REBUILD PARTITION = ALL"

    also note that not having a primary key (and associated clustered index) will add a 8 byte column to serve as unique row id - see https://docs.microsoft.com/en-us/sql/relational-databases/indexes/heaps-tables-without-clustered-indexes?view=sql-server-ver15

  • There is no clustered index on the table or columnstore

    After doing the Rebuild I got a size  318.883 MB     so thats way above my calculation

     

     

  • Would you script out the table to make sure no other column "snuck" in there?

    And please verify the row count:

    SELECT SUM(rows) AS row_count
    FROM sys.partitions
    WHERE object_id = OBJECT_ID('dbo.SIData_222222_88_20211103_1436') AND
    index_id = 0

    I assume it's not 10M exactly.

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Because you didn't follow the instructions in the write-up for when to round down, your final value of 193MB is off by abou 10MB.  It should have been 183.398438 MB.

    That, however, doesn't explain the huge difference of 377.422 MB

    When I used the following command on a 10 Million Row populated copy of your table...

     sp_spaceused 'dbo.SIData_222222_88_20211103_1436'

    ...  it produced the following output with the difference being because the bloody thing is using KB math instead of MB/page math.

    With all that, I'll say that it can only be a PICNIC problem. 😀  What is the exact code that you used to return the size of the table and what is the exact output as above?

    My bet is that you accidentally populated the table twice and that it actually contains 20 Million rows instead of just 10 million.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sure the table DDL is

    CREATE TABLE [dbo].[SIData_222222_88_20211103_1436](

    [ReadingDateTime] [datetime2](2) NULL,

    [ReadingValue] [real] NULL

    ) ON [PRIMARY]

    On running your command i get the following for rowcount

    10000000

     

  • Hi Jeff

    I also used

    sp_spaceused 'dbo.SIData_222222_88_20211103_1436' but got the following stats

    e

    I used that to check against a manual calculation. If you look at the excel attachment to this thread in the opening post,

    i have put the calculation in there.

  • can you run the following - sp_spaceuses sometimes reports wrong sizes

     

    select t.NAME as tablename
    , s.Name as schemaname
    , p.rows
    , sum(a.total_pages) * 8 as totalspacekb
    , cast(round(((sum(a.total_pages) * 8) / 1024.00), 2) as numeric(36, 2)) as totalspacemb
    , sum(a.used_pages) * 8 as usedspacekb
    , cast(round(((sum(a.used_pages) * 8) / 1024.00), 2) as numeric(36, 2)) as usedspacemb
    , (sum(a.total_pages) - sum(a.used_pages)) * 8 as unusedspacekb
    , cast(round(((sum(a.total_pages) - sum(a.used_pages)) * 8) / 1024.00, 2) as numeric(36, 2)) as unusedspacemb
    from sys.tables t
    inner join sys.indexes i
    on t.object_id = i.object_id
    inner join sys.partitions p
    on i.object_id = p.object_id
    and i.index_id = p.index_id
    inner join sys.allocation_units a
    on p.partition_id = a.container_id
    left outer join sys.schemas s
    on t.schema_id = s.schema_id
    where t.name = 'SIData_222222_88_20211103_1436'
    group by t.Name
    , s.Name
    , p.Rows
  • I ran that and got a totalspacemb of 318.95 MB

    Attachments:
    You must be logged in to view attached files.
  • I ran that and got a totalspacemb of 318.95

    Attachments:
    You must be logged in to view attached files.
  • Crud, I went to hit "Reply" on one of the posts above and hit "Report", instead.  I've sent the WebMaster a request to undo that.

    In the mean time, I have two things to run to see what's going on.

    In the first bit of code, verify that what is reported for datatypes is actually what you expected.

     sp_help 'dbo.SIData_222222_88_20211103_1436'

    You should see the following in the second section.  I know it seems obvious but this is a strange problem and so I'm doing things "Strictly by the numbers" and "peeling one potato at a time".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's the next thing that could be a possibility... Run this code in the database where the table is at.

     SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE database_id = DB_ID()
    ;

    That's kind of me grasping at straws because "RCSI" won't usually matter until you update a row, which will cause a (IIRC) 14 byte bit of row versioning to be added to each row, but it's worthwhile checking.

    You might also want to check the average page fullness using sys.dm_db_index_physical_stats.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Excellent idea.  That gets much closer to the given size.

    There's one very minor miscalc above, in that I believe page data can never exceed an absolute max of 8060 bytes, so this:

    Rows_Per_Page:                        8096 / (Row_Size + 2)

    should be:

    Rows_Per_Page:                        8060 / (Row_Size + 2)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jeff Moden wrote:

    Here's the next thing that could be a possibility... Run this code in the database where the table is at.

     SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE database_id = DB_ID()
    ;

    That's kind of me grasping at straws because "RCSI" won't usually matter until you update a row, which will cause a (IIRC) 14 byte bit of row versioning to be added to each row, but it's worthwhile checking.

    You might also want to check the average page fullness using sys.dm_db_index_physical_stats.

    that's it I would say.

    I've done a test db were I created the table with all required rows , then enabled RCSI and then copied from the table onto a new one - new one after RCSI has size significantly increased

    sql_rcsi

  • frederico_fonseca wrote:

    Jeff Moden wrote:

    Here's the next thing that could be a possibility... Run this code in the database where the table is at.

     SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE database_id = DB_ID()
    ;

    That's kind of me grasping at straws because "RCSI" won't usually matter until you update a row, which will cause a (IIRC) 14 byte bit of row versioning to be added to each row, but it's worthwhile checking.

    You might also want to check the average page fullness using sys.dm_db_index_physical_stats.

    that's it I would say.

    I've done a test db were I created the table with all required rows , then enabled RCSI and then copied from the table onto a new one - new one after RCSI has size significantly increased

    sql_rcsi

    Let's hope that we don't have to wait until after Thanksgiving to find out from the OP.   Thanks for doing the test, Frederico.  It looks a bit like an exact match for the problem. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply