Potentially stupid question

  • If a table has a column which is varchar(1000), yet all of the text within that column is only 20 characters long, is space being wasted in the database? 

    In other words, are only the 20 characters, (or # bytes) actually taking up space or does the declaration of varchar(1000) actually add to the space (or # bytes) being used? 

    Also, what utilities would need to be run to shrink any extra space taken up? 

    I need a good 'technical' reply and if possible, a URL to some microsoft explanation.  This is beyond my realm of expertise to even know how to ask this correctly.   

    I will appreciate your compassion with my ignorance. 

    I wasn't born stupid - I had to study.

  • From the SQL Server Books on Line: Read the sentances regarding "Storage size"

    To navigate directly to this subject, open BOL and from the menu select "go" then "url" and enter

    mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\tsqlref.chm::/ts_ca-co_7tpu.htm

    char and varchar

    Fixed-length (char) or variable-length (varchar) character data types.

    char[(n)]

    Fixed-length non-Unicode character data with length of n bytes. n must be a value from 1 through 8,000. Storage size is n bytes. The SQL-92 synonym for char is character.

    varchar[(n)]

    Variable-length non-Unicode character data with length of n bytes. n must be a value from 1 through 8,000. Storage size is the actual length in bytes of the data entered, not n bytes. The data entered can be 0 characters in length. The SQL-92 synonyms for varchar are char varying or character varying.

    SQL = Scarcely Qualifies as a Language

  • Varchar doesn't pad the entries with extra data like spaces. Char datatypes do. You shouldn't have a problem with 'unused' space with varchar columns.

    Search for "Using char and varchar Data" in Books Online for more/better info.

  • I read that in Books Online too.  But its not quite accurate.  In Kalen Delaney's book, Inside SQL Server 2000 (pp231-232), she states -

    "A row with variable-length columns requires special offset entries in order to be internally maintained."

    I didn't find a definition for the offset entry, but I found this in an example she gives -

    "... And a 2-byte overhead exists for each of nine varchar columns..."

    So, if your data is exactly 20 characters, a char(20) would be cheaper.  If the data varies up to 20 characters, your varchar(1000) MIGHT be cheaper, depending on how many blank bytes would be stored in a char(20).

    Steve

  • A Variable length character string uses only enough bytes to actually hold the data. You can use the DataLength fuction to retrun the actual number of bytes used. Len() can be used to determine the number of characters in a string.

    DECLARE @VarChar varchar(1000)

    DECLARE @CharStr(20) --fixed length

    Set @VARChar='123'

    --DATALENGTH RETURNS THE NUMBER OF BYTES USED

    Select DATALENGTH (@VARChar)

    SELECT DATALENGTH(@CharStr) --always 20 Bytes with ASCII code, If UNICODE is used will require 40 bytes

    If you are using an ASCII character set rather than UNICODE TSQL limits you to a string of 8000 characters. VARChar removes all trailing spaces from a string. Leading spaces and embedded spaces are included and count towards you 8000 character limit.

    HTH Mike

    Returns 3

  • the storage requirements of a varchar are the length of the data + some space to store info to work out how long/ where that data is (an offset/length - a number ie)

    a char doesn't need this since it's fixed length - so that length is your offset/length

    ie. if the data isn't fixed length, the db needs to be able to work out how long it is

     

     

     

  • Argh! I lost a really long post explaining this in detail. Anyway, the gist of the post was that like hoo-t says, rows containing varying size columns are stored with an offset array containing 2-bytes for every varying size column, which specifies the offset on the row where the data of each of these columns start. Here is some cleaned up data from DBCC PAGE that shows one row from a table foo (id int, text char(20)) and one from table bar (id int, text varchar(1000)):

    Row from table foo

    001c0010 00000001 61616161 61616161

    61616161 61616161 61616161 000002

    Row from table bar

    00080030 00000001 01000002 61002300

    61616161 61616161 61616161 61616161

    616161

    It is not really necessary to understand what all this means exactly. But you can see that the row from foo requires 31 bytes storage (there are some extra bytes for the row header, including a null bitmap) and the one from bar requires 35 bytes. The extra 4 bytes are the 2 bytes specifying the offset for the varying size column, and 2 bytes specifying where the offset array itself is placed.

    So, the extra storage needed for a table with variying size columns is 2 bytes per column, plus another 2 bytes total.

  • Farrell,  There are no stupid questions, only dumb terminals

    But seriously, knowing where to find the answer to a question is half the battle and this forum always provides the answers, as demonstrated above.

    I'll stop now.....

  • I forgot to mention (in my repost of the lost post) that the rows in each table are inserted as id=1 and text='aaaaaaaaaaaaaaaaaaaa'

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply