January 17, 2003 at 1:15 pm
In the nvarchar and nvarbinary data types, storage is 2 * n characters. Does NULL take any storage space? Does it increase the size of an index?
Might seem like a silly question, but the amount of bytes we backup is a constant concern with this project. Adding hardware is not an option.
Steve Miller
Steve Miller
January 17, 2003 at 3:03 pm
Ok when the data is stored and looked at in hex at the page level there are a few things that occurr.
First the row contains a header that tells if any variable length columns exist in the row with data. NULLable variable length columns if left out will produce header mark for no variable length columns in data.
Next all the fixed length data is stored in the fron of the row under the hood and the variable at the back. Even though you define a table as
col1 int
col2 nvarchar(5)
col3 varchar(5)
col4 char(5)
col5 nvarchar(8)
it will be stored as
col1 int
col4 char(5)
col2 nvarchar(5)
col3 varchar(5)
col5 nvarchar(8)
In order in the table, bet some didn't realize that but that makes storage easier.
Now if there is data in the variable length columns of course the header has the marker set to indentify that.
Then this is what transpiers.
Say I had two nvarchar columns in my table.
And in one insert I insert into the value for only the first.
Reading the data you see a column count marker at the point of the variable length columns being added on.
This column counter is 2 extra bytes long not accounted for in the setup of the table.
Next you see the offset in the row of data for the end of the column.
Also this is 2 bytes long and not counted for in your table def.
SO going further, if you populate both variable length columns you get the column counter showing 2 and then the next 2 bytes represent col1s offset and the next 2 bytes after that represent col2s offset.
In other words you have 6bytes extra plus the data.
Finally the really neat part. If say varible length col1 is NULL and col2 is filled in then you get the following situation.
You have column count 2, col1 offset and col2 offset. But col1 offset points to the position right after col2s offset value. In other words the column count lets the DB engine know to expect 4 bytes for 2 columns then the col1 offset right after that returns a zero characters or length or in this case NULL. And the offset for col2 of course combined with col1s offset tells column 2s data.
So if I created a table like so
CREATE TABLE [Table1] (
[col1] [int],
[col2] [nvarchar] (5),
[col3] [int],
[col4] [char] (10),
[col5] [nvarchar] (3)
)
and fill only the fixed lenght items in I use a length of the byte lengths of those combined plus the header bytes which are 4 and an additional 3 bytes I haven't figured out completely yet.
But if you fill out col2 only in the described table you get
total length of fixed items
header
piece i am not sure about
column count bytes
col2 offset bytes
and actual data length of col2 entered
if you fill out both col2 and col5 then you get
total length of fixed items
header
piece i am not sure about
column count bytes
col2 offset bytes
col5 offset bytes
actual data length of col2 entered
and actual data length of col5 entered
then if you fill out col5 but not col2 you get
total length of fixed items
header
piece i am not sure about
column count bytes
col2 offset bytes
col5 offset bytes
col2 length = 0 bytes (no data no bytes)
and actual data length of col5 entered
So keeping that in mind if you really want to pinch space out make sure your column order is most used variable length column first to least used. And as a gotcha say you add a NON-NULL variable length column to the end. Your column count will include any NULLable that have not been field. So NON-NULL should be before nullable if you are byte conscience.
Hopefully that makes sense and sheds some light on under the hood.
January 20, 2003 at 12:37 pm
OK Antares
"Ok when the data is stored and looked at in hex at the page level there are a few things that occurr."
Do you have any special tools used to view the MDF at the physical level ?
Is there any documentation on the physical structure ?
I'm just curious, I used to know IMS structures (yea I'm an old guy).
KlK, MCSE
KlK
January 20, 2003 at 3:36 pm
DBCC PAGE is what I use. Then just a lot of time figuring the data out.
January 20, 2003 at 4:55 pm
If its just backup space that concerns you, you should take a look at both SQLZip and SQL Litespeed, both advertise here on the site. You can really cut the amount of space needed for backup. Steve Jones has a review of Litespeed, I reviewed SQLZip a while back.
Andy
January 21, 2003 at 12:10 pm
Antares, thanks for sharing your research.
>> In other words you have 6 bytes extra plus the data.
If I'm reading you right, 2 of the 6 bytes is in the column header, 2 is is for one column, and 2 is for the other column.
So if I have one nvarchar in the table, I would add 4 bytes to the table, plus whatever is stored in the columns. Right?
>> If its just backup space that concerns you
Andy, backup space does concern us, but it is broader than that. We're using MSDE as a object repository for a C++/C# front end. The software will be distributed primarily on stand-alone machines all over the world. We don't have the ability to control the hardware it runs on, and much of the hardware will be old. We try to get every ounce of performance we can.
Steve
Steve Miller
January 21, 2003 at 3:59 pm
Actually it is an extra 4 bytes for the row header. Plus 2 bytes for the variable length column count in the row. Plus with a single nvarchar 2 bytes for the offset for the end of the data for the column. Plus 1 byte for an end of record or something that seems to always be 00 that I have not figured out.
So if you have 1 nvarchar column and an int identity field (int is 4 bytes) in the table it would be
4+4+2+2+1+(lengthofdatainnvarcharfield*2)
OR
RecordHeader+INTLen+ColCount+Col1Offset+(lengthofdatainnvarcharfield*2)
If NVARCHAR field is null you would have
4+4+1
Or
RecordHeader+INTLen+EndRec
due to no non-null variable length column and the header would note it as such.
Hope that makes sense.
Edited by - antares686 on 01/21/2003 4:04:51 PM
January 21, 2003 at 4:19 pm
I think it makes sense.
Bottom line for me is this: those records that have a null value in the nvarchar column will not eat any storage. That's what I wanted, and I'm happy.
Thanks,
Steve
Steve Miller
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply