August 8, 2017 at 9:31 am
Hi everybody,
I am creating a table that will be the basis of a fact table for a cube. I have 3.1 billion rows ready for it and, if all goes well, it will take 1.6 million entries each day.
As a consequence, I want to make the table as compact as possible. It has 4 columns — the first is a bigint and the primary key, the second and third are ints, and the fourth has the datatype date. None of the columns is nullable. The table is very simple: it has the clustered index created by the primary key and no non-clustered indexes. There are no foreign keys, no constraints (except NOT NULL), etc.
I expected that each row would be in or around 19 bytes long. Instead, each row is 26 bytes long. What is in these extra 7 bytes and can I recover them in some way?
Furthermore, I expected to get 426 rows (8096/19) per full page. Instead, I'm getting 289 and the «avg_page_space_used_in_percent» is 99.92%. Even 311 (8096/26) rows would be better.
I had thought that it might have something to do with the fill factor, but it is 0 (and fill factor only affects non-clustered indexes, doesn't it? and there are none).
I am only looking at the leaf-level pages of the clustered index.
I have recreated my table in a script below and it shows where I am getting my numbers from. Am I missing something obvious?
I would be happy if someone could answer these questions, point out the error of my ways or, at least, point me in the right direction.
Many thanks,
Sean Redmond.
-- If I have a table with 4 columns — bigint, int, int, date — why is the average row size much greater than 19 bytes?
-- 0. create necessary objects
use master
go
create database gosh
on ( name = 'gosh', filename = '«your path here»\gosh.mdf', size = 32MB, filegrowth = 8MB )
log on ( name = 'gosh_log', filename = '«your path here»\gosh_log.ldf', size = 8MB, filegrowth = 8MB );
go
use gosh
go
-- 1. create the table
create table dbo.RowSizes( RowSizesID bigint not null primary key, anInt int not null, anotherInt int not null, aDate date not null );
go
-- 2. populate the table with a million entries
insert into dbo.RowSizes( RowSizesID, anInt, anotherInt, aDate )
select top 1000000
row_number() over(order by a.id)
, (abs(a.id) - row_number() over(order by a.id))
, (abs(a.id) + row_number() over(order by a.id))
, dateadd( minute, row_number() over(order by a.id)*9, '20000101' )
from master.dbo.syscolumns a
cross join master.sys.columns b
;
go
-- 3. how much space is used. The query is from Itzik Ben-Gan.
declare @db varchar(10) = 'gosh';
declare @object varchar(50) = 'dbo.RowSizes';
select index_type_desc
, index_depth
, index_level
, page_count
, record_count
, avg_page_space_used_in_percent
, avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats( db_id(@db), object_id(@object, 'u'), null, null, 'detailed' );
exec dbo.sp_spaceused @objname = @object, @updateusage = true;
go
/*
index_type_desc   index_depth  index_level  page_count  record_count  avg_page_space_used_in_percent
CLUSTERED INDEX   3            0            3461        1000000       99.92
CLUSTERED INDEX   3            1            16          3461          45.40
CLUSTERED INDEX   3            2            1           16            03.33

name          rows     reserved  data      index_size  unused
dbo.RowSizes  1000000  27976 KB  27688 KB  144 KB      144 KB
*/
select '1. Theoretical No. of rows per page' Ting , cast(8096/19 as int) Answer union -- 426. 8096 = available space in bytes on a page; 19 = 8+4+4+3 (theoretical row size)
select '2. Actual avg. No. of rows per page' , cast(1000000/3461 as int) union -- 288
select '3. Theoretical No. of bytes per row' , 19 union -- 8+4+4+3
select '4. Actual No. of bytes per row' , (cast(27688 as int)*1024)/1000000 -- 28. 27688 KB = the 'data' figure from sp_spaceused above
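-- Note: sys.dm_db_index_physical_stats in 'detailed' mode also reports record sizes
-- directly (the min/max/avg columns below belong to the same DMF), saving the arithmetic above:
select index_level
, min_record_size_in_bytes
, max_record_size_in_bytes
, avg_record_size_in_bytes
from sys.dm_db_index_physical_stats( db_id('gosh'), object_id('dbo.RowSizes', 'u'), null, null, 'detailed' );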
-- housekeeping
use master
go
drop database gosh;
go
August 8, 2017 at 9:38 am
The extra bytes are row overhead, for SQL to manage the rows, and, no, you can't recover any of that space. That's actually rather low, so I suspect 7 bytes is the min overhead.
As to fullness, you should explicitly indicate the free space you want, but yes, the fill factor must have been 0 because 99.92% is extremely full already. SQL doesn't split rows across pages (too I/O-inefficient), of course, so only whole rows can fit. Still, 289 does seem low. Perhaps not loaded in order / page splits??
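If you do want the free space pinned down explicitly, a rebuild with a stated fill factor does it; ALTER INDEX ALL is used here only because the PK's index name was left system-generated:
alter index all on dbo.RowSizes rebuild with (fillfactor = 100);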
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 10, 2017 at 6:03 am
Sean Redmond - Tuesday, August 8, 2017 9:31 AM
I expected that each row would be in or around 19 bytes long. Instead, each row is 26 bytes long. What is in these extra 7 bytes and can I recover them in some way?
Furthermore, I expected to get 426 rows (8096/19) per full page. Instead, I'm getting 289 and the «avg_page_space_used_in_percent» is 99.92%. Even 311 (8096/26) rows would be better.
I had thought that it might have something to do with the fill factor, but it is 0 (and fill factor only affects non-clustered indexes, doesn't it? and there are none).
ScottPletcher - Tuesday, August 8, 2017 9:38 AM
The extra bytes are row overhead, for SQL to manage the rows, and, no, you can't recover any of that space. That's actually rather low, so I suspect 7 bytes is the min overhead.
As to fullness, you should explicitly indicate the free space you want, but yes, the fill factor must have been 0 because 99.92% is extremely full already. SQL doesn't split rows across pages (too I/O-inefficient), of course, so only whole rows can fit. Still, 289 does seem low. Perhaps not loaded in order / page splits??
Maybe helpful, maybe not.
;with
aa as ( -- decode %%physloc%% into page / file / slot for every row
select
rowsizesid
,convert (binary (4), reverse (substring (t.%%physloc%%, 1, 4))) page
,convert (binary (2), reverse (substring (t.%%physloc%%, 5, 2))) file_id
,convert (binary (2), reverse (substring (t.%%physloc%%, 7, 2))) slot
from
rowsizes t
),
HH as ( -- highest slot number per page; slots are zero-based, so rows per page = max slot + 1
select page, MAX(slot) max_slot, CONVERT(int, max(slot)) aantal from aa group by page
)
,HH2 as ( -- number of pages for each max-slot value
select aantal, COUNT(*) tel from HH group by aantal
)
-- select top 500 * from AA
-- select top 500 * from HH
select * from HH2 order by tel
Selecting from AA gives the location of each row.
Selecting from HH gives, for each page, the highest slot number (one less than the number of rows on the page).
Selecting from HH2 gives the statistics for the number of rows per page.
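For what it's worth, the undocumented sys.fn_PhysLocCracker function (SQL 2008 and later) does the same decoding, if you'd rather not pull %%physloc%% apart by hand:
select plc.file_id, plc.page_id, count(*) as rows_on_page
from rowsizes t
cross apply sys.fn_PhysLocCracker(t.%%physloc%%) plc
group by plc.file_id, plc.page_id
order by rows_on_page desc;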
I ran your data and all pages but one showed a maximum slot number of 191 (i.e. 192 rows, since slots start at 0); this is a fair indication that this is the maximum number of rows that can be fitted on a page.
Because it is a clustered table, there can be no forwarded rows. (Forwarded records only occur in heaps, I think.)
Page split as an action does not influence the number of pages.
A 'split' caused by a forwarded row does not apply here.
Obsolete here, but for future reference, here is some code which shows the forwarded-record count and some sizes:
select forwarded_record_count
, page_count
, avg_page_space_used_in_percent
, record_count
, min_record_size_in_bytes
, max_record_size_in_bytes
, avg_record_size_in_bytes
, db_name(database_id) as databasename
, object_id
, object_name(object_id, database_id) as table_name
from sys.dm_db_index_physical_stats( db_id(N'testdb'), object_id(N'dbo.rowsizes'), null, null, 'DETAILED' );
And does anybody have code to actually read or dump a page of data?
print convert(int, 0x000166FA) -- 91898, the page number to dump, decoded from %%physloc%%
DBCC TRACEON (3604);
dbcc page (testdb, 1, 91898, 1) -- the page number should be determined for your system
This produces a dump of the data; the fourth parameter can be 0/1/2/3, see the documentation for which to use (1 gives a per-row dump).
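If you don't have a %%physloc%% value handy, the (equally undocumented) DBCC IND lists every page belonging to an object; the rows with PageType = 1 are the data pages:
DBCC IND ('testdb', 'dbo.rowsizes', 1); -- database, object, index id (1 = the clustered index)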
Hope this helps.
Ben
August 10, 2017 at 6:13 am
2 bytes for the row header, 2 bytes for the slot array index, 2 bytes for the null bitmap offset, then one bit per column (minimum 1 byte) for the null bitmap itself.
That is indeed the minimum number of bytes that goes into the row structure itself.
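For what it's worth, the reported numbers line up if the 2-byte slot array entry is counted on top of the 26-byte record that avg_record_size reports; a quick back-of-the-envelope check, not gospel:
select 8096 / (26 + 2) as rows_per_page -- = 289, matching the observed count
, cast(100.0 * 289 * (26 + 2) / 8096 as decimal(5,2)) as pct_full; -- = 99.95, close to the reported 99.92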
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 10, 2017 at 6:35 am
GilaMonster - Thursday, August 10, 2017 6:13 AM
2 bytes for the row header, 2 bytes for the slot array index, 2 bytes for the null bitmap offset, then one bit per column (minimum 1 byte) for the null bitmap itself.
That is indeed the minimum number of bytes that goes into the row structure itself.
The 7 (2 + 2 + 2 + 1) bytes plus:
Slot 1 Column 1 Offset 0x4 Length 8 Length (physical) 8
RowSizesID = 2
Slot 1 Column 2 Offset 0xc Length 4 Length (physical) 4
anInt = 1073624920
Slot 1 Column 3 Offset 0x10 Length 4 Length (physical) 4
anotherInt = 1073624924
Slot 1 Column 4 Offset 0x14 Length 3 Length (physical) 3
aDate = 1900-01-02
Makes 26 bytes.
26 * 192 = 4992
This still leaves around 3 K for header and other information.
(And the slot table is already included, so a large chunk is still not explained.)
(40 bytes for each row were reported: 192 * 40 = 7680, which plus the 96-byte page header matches m_freeData = 7776 in the dump below. But 40 is significantly more than the 26 bytes explained above.)
Ben
August 10, 2017 at 7:45 am
ben.brugman - Thursday, August 10, 2017 6:03 AM
Page split as an action does not influence the number of pages.
Ben
Hmm, I would think it would have to, especially when it first occurs. By definition, a page split is one page becoming two, right?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 10, 2017 at 7:49 am
Does the database have read_committed_snapshot or allow_snapshot_isolation enabled on it?
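For anyone following along, the quickest check is against sys.databases ('gosh' being the demo database from the script at the top of the thread):
select name
, is_read_committed_snapshot_on
, snapshot_isolation_state_desc
from sys.databases
where name = 'gosh';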
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 10, 2017 at 7:51 am
ScottPletcher - Thursday, August 10, 2017 7:45 AM
Hmm, I would think it would have to, especially when it first occurs. By definition, a page split is one page becoming two, right?
If there had been page splits, the avg percent space used would not be in the high 90% range, as there would be pages with half the page unused.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 10, 2017 at 8:00 am
Hi Gail,
In answer to your question, neither 'read_committed_snapshot' nor 'allow_snapshot_isolation' are enabled. It is at 'Read Committed'.
I haven't gotten around to looking at Ben's queries yet.
Thanks very much for the explanation about the 7 bytes overhead per row. It is good to know.
Sean.
August 10, 2017 at 8:00 am
GilaMonster - Thursday, August 10, 2017 7:51 AM
If there had been page splits, the avg percent space used would not be in the high 90% range, as there would be pages with half the page unused.
I said "perhaps" page splits could account for some of the extra pages. Either way, we still have the mystery of the alleged 99.9+% fill factor but so few rows per page. Something is still missing to fully explain what's going on with this table. That would be many more rows than is typically deleted from a table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 10, 2017 at 8:09 am
Sean Redmond - Thursday, August 10, 2017 8:00 AM
Hi Gail,
In answer to your question, neither 'read_committed_snapshot' nor 'allow_snapshot_isolation' are enabled. It is at 'Read Committed'.
I haven't gotten around to looking at Ben's queries yet.
Thanks very much for the explanation about the 7 bytes overhead per row. It is good to know.
Sean.
To be clear, I was asking about the database options, not the session's isolation level. Is that what you checked?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 10, 2017 at 8:20 am
GilaMonster - Thursday, August 10, 2017 8:09 AM
To be clear, I was asking about the database options, not the session's isolation level. Is that what you checked?
For the former two, I right-clicked the DB in question, went to Options » Miscellaneous, and saw that the values for 'read_committed_snapshot' and 'allow_snapshot_isolation' are set to false.
For the isolation level, I took the value from dbcc useroptions.
August 10, 2017 at 8:44 am
ScottPletcher - Thursday, August 10, 2017 7:45 AM
Hmm, I would think it would have to, especially when it first occurs. By definition, a page split is one page becoming two, right?
Sorry, my bad.
My formulation was wrong. Of course a page split (as an action) does increase the number of pages. In the given example, though, it doesn't influence the fill of the pages, so the page split does not account for the limited number of rows that can be stored on a page. I did not express myself correctly, sorry.
Ben
August 10, 2017 at 11:23 am
Sean Redmond - Thursday, August 10, 2017 8:20 AM
For the former two, I right-clicked the DB in question, went to Options » Miscellaneous, and saw that the values for 'read_committed_snapshot' and 'allow_snapshot_isolation' are set to false.
For the isolation level, I took the value from dbcc useroptions.
Ok, so it's not the row version pointers. Very curious.
What did DBCC Page dump out?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 11, 2017 at 6:05 am
GilaMonster - Thursday, August 10, 2017 11:23 AM
Ok, so it's not the row version pointers. Very curious.
What did DBCC Page dump out?
The dump (on my system) shows each row containing 40 bytes.
I did an update to make identification a bit easier.
Update rowsizes set anotherInt = 313249263 -- (= efcdab12)
Update rowsizes set anInt = 286331153 -- (= 11111111)
Update rowsizes set aDate = '1900-01-01'
Update rowsizes set aDate = '1900-01-02' where rowsi
So this makes anotherInt and anInt identifiable.
The date is less clear.
And there is probably a 'change/version' number. (Number of bytes ???)
It is clear that there are 40 bytes in each row; the offset table contains 2 bytes for each row.
What the 40 bytes consist of is not totally clear. (Although the Record Attributes line in the dump shows VERSIONING_INFO, and a 14-byte versioning tag on top of the 26-byte record would make exactly 40 bytes, so row versioning was presumably enabled on my test database.)
For a part of the dump see below.
All code to create the dump is present in this thread. (So for a complete dump use the code supplied).
Ben
The dump: (selected parts)
--
-------------------------------------------------------------------------------------------
PAGE: (1:91898)
BUFFER:
BUF @0x0000000095FBE240
bpage = 0x0000000095506000 bhash = 0x0000000000000000 bpageno = (1:91898)
bdbid = 8 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 44104 bstat = 0xc0010b
blog = 0x79797979 bnext = 0x0000000000000000
PAGE HEADER:
Page @0x0000000095506000
m_pageId = (1:91898) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x2000
m_objId (AllocUnitId.idObj) = 6545 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594466861056
Metadata: PartitionId = 72057594374389760 Metadata: IndexId = 1
Metadata: ObjectId = 1017874793 m_prevPage = (0:0) m_nextPage = (1:91903)
pminlen = 23 m_slotCnt = 192 m_freeCnt = 32
m_freeData = 7776 m_reservedCnt = 0 m_lsn = (6675:15802:2)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = -1285646452
Allocation Status
GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED
PFS (1:88968) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED
Slot 0 Offset 0x60 Length 40
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VERSIONING_INFO
Record Size = 40
Memory Dump @0x000000001124C060
0000000000000000: 50001700 01000000 00000000 11111111 †P...............
0000000000000010: efcdab12 5b950a04 00009048 01000100 †ïë.[•.....H....
0000000000000020: 00008b01 00000000 †††††††††††††††††††..‹.....
Version Information =
Transaction Timestamp: 395
Version Pointer: (file 1 page 84112 currentSlotId 0)
Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8
RowSizesID = 1 (=01000000)
Slot 0 Column 2 Offset 0xc Length 4 Length (physical) 4
anInt = 286331153 (=11111111)
Slot 0 Column 3 Offset 0x10 Length 4 Length (physical) 4
anotherInt = 313249263 (=efcdab12)
Slot 0 Column 4 Offset 0x14 Length 3 Length (physical) 3
aDate = 1900-01-01 (= 5b950a, days since 0001-01-01)
Slot 0 Offset 0x0 Length 0 Length (physical) 0
KeyHashValue = (1b7fe5b8af93)
Slot 1 Offset 0x88 Length 40
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VERSIONING_INFO
Record Size = 40
Memory Dump @0x000000001124C088
0000000000000000: 50001700 02000000 00000000 11111111 †P...............
0000000000000010: efcdab12 5c950a04 0000e07e 01000100 †ïë.\•....à ~....
0000000000000020: 40008c01 00000000 †††††††††††††††††††@.Œ.....
Version Information =
Transaction Timestamp: 396
Version Pointer: (file 1 page 98016 currentSlotId 64)
Slot 1 Column 1 Offset 0x4 Length 8 Length (physical) 8
RowSizesID = 2
-- etc etc.
--------------------------------------------------------------------------
OFFSET TABLE:
Row - Offset
191 (0xbf) - 7736 (0x1e38)
190 (0xbe) - 7696 (0x1e10)
189 (0xbd) - 7656 (0x1de8)
188 (0xbc) - 7616 (0x1dc0)
187 (0xbb) - ......
......
8 (0x8) - 416 (0x1a0)
7 (0x7) - 376 (0x178)
6 (0x6) - 336 (0x150)
5 (0x5) - 296 (0x128)
4 (0x4) - 256 (0x100)
3 (0x3) - 216 (0xd8)
2 (0x2) - 176 (0xb0)
1 (0x1) - 136 (0x88)
0 (0x0) - 96 (0x60)