Which is better: Char and more pages, or Varchar and high fragmentation?

  • Hi all,

    In our dev environment, it is common practice to use varchar columns instead of char so that less space is required and, accordingly, the index size is smaller.

    While doing some research on this, I created two tables with same-sized columns, one using varchar and one using char.

    I noticed that, as expected, the page count for the table with the varchar column was lower, and the page count for the table with the char column was higher.

    However, after some updates that increased the length of the stored values, the fragmentation on the table with the varchar column was huge.

    So, based on your experience, which is worse:

    to suffer from fragmented indexes,

    or to have extra pages?
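
    Roughly, the test looked something like this (table names, sizes, and row counts here are illustrative, not my exact script):

        -- Two test tables, one fixed-width and one variable-width
        CREATE TABLE dbo.TestChar    (Id INT IDENTITY PRIMARY KEY, Val CHAR(200)    NOT NULL);
        CREATE TABLE dbo.TestVarchar (Id INT IDENTITY PRIMARY KEY, Val VARCHAR(200) NOT NULL);

        -- Fill both with short values; the varchar table ends up on far fewer pages
        INSERT dbo.TestChar    (Val) SELECT TOP (100000) 'x' FROM sys.all_columns a CROSS JOIN sys.all_columns b;
        INSERT dbo.TestVarchar (Val) SELECT TOP (100000) 'x' FROM sys.all_columns a CROSS JOIN sys.all_columns b;

        -- Widen the variable-width values so rows no longer fit on their pages
        UPDATE dbo.TestVarchar SET Val = REPLICATE('x', 200);

        -- Run before and after the UPDATE to compare page counts and fragmentation
        SELECT OBJECT_NAME(ips.object_id) AS table_name,
               ips.page_count,
               ips.avg_fragmentation_in_percent
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
        WHERE ips.object_id IN (OBJECT_ID('dbo.TestChar'), OBJECT_ID('dbo.TestVarchar'));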

  • Pick the correct data type for the data, have a FillFactor that's not 100%, and make sure that you don't have the coding pattern of inserting 'empty' rows and then updating them to contain data.
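
    For reference, a fill factor below 100 is set per index when it is created or rebuilt; a minimal sketch (the index and table names and the value 90 are illustrative, not a recommendation):

        -- Rebuild leaving free space on each leaf page to absorb future row growth
        ALTER INDEX IX_SomeTable_SomeColumn ON dbo.SomeTable
        REBUILD WITH (FILLFACTOR = 90);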

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    By and large, if the data entry is variable in length, I use VARCHAR even though it might be more likely to fragment. That's why we do maintenance on the indexes. Storing lots and lots of empty space in a CHAR field just because I don't want to do index maintenance doesn't seem like the best of choices. If the data is fixed length, use CHAR. If it's variable length, use VARCHAR. Don't get hung up on secondary processes when picking data types. Pick the correct data type and structure for the data being stored. Dates go in DATE, times in TIME, date and time in DATETIME, integer values in INT, fixed-precision decimals in DECIMAL, etc.
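
    A rough illustration of that advice (the table and columns are invented for the example):

        -- Each column typed for the data it actually holds
        CREATE TABLE dbo.OrderExample
        (
            OrderID      INT           NOT NULL,  -- integer values in INT
            OrderDate    DATE          NOT NULL,  -- dates go in DATE
            PickupTime   TIME(0)       NOT NULL,  -- times go in TIME
            CreatedAt    DATETIME      NOT NULL,  -- date and time together
            TotalAmount  DECIMAL(10,2) NOT NULL,  -- fixed-precision decimal
            CountryCode  CHAR(2)       NOT NULL,  -- truly fixed-length text
            CustomerNote VARCHAR(500)  NULL       -- genuinely variable-length text
        );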

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Fair enough,

    I will come up with some scripts to pinpoint the issue.

  • Is it page splits that you're seeing?

    It's understandable that key columns or date/time stamps, like OrderStatusID or InventoryDate, are updated as part of an entity's normal lifecycle, but these types of columns should be fixed width so that updating them doesn't change the row width and cause page splits.

    Maybe drifting a little off topic, but when I see variable-width columns routinely getting updated, it's often symptomatic of a data model that hasn't been normalized correctly. For example, I sometimes see developers using VarChar columns to hold multi-valued ID collections (ex: ProductIDs = "234;1123;4322;"), so adding a new item to the collection requires updating and widening the row. The same goes for XML-typed columns containing a collection. Ideally you want this modeled in a separate table.
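
    A rough sketch of that separate table (the names are invented for illustration); adding an item becomes an insert of a new narrow row rather than an update that widens an existing one:

        -- Instead of Orders.ProductIDs VARCHAR(n) holding '234;1123;4322;'
        CREATE TABLE dbo.OrderProduct
        (
            OrderID   INT NOT NULL,
            ProductID INT NOT NULL,
            CONSTRAINT PK_OrderProduct PRIMARY KEY CLUSTERED (OrderID, ProductID)
        );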

    For event-driven attribute changes, such as changes to a customer's name or address, I use a separate time-stamped table that's joined to the primary Customer table. Not only does it prevent updating Customer rows, but it also retains useful history. For example:

    Customer: CustomerID (PK), CustomerSSN, AcquisitionDate

    CustomerAddress: CustomerID (FK), Address1, Address2, StateCode, ZipCode, EffectiveDate, ExpireDate
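
    In rough DDL form (the data types are assumed for illustration):

        CREATE TABLE dbo.Customer
        (
            CustomerID      INT      NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
            CustomerSSN     CHAR(11) NOT NULL,
            AcquisitionDate DATE     NOT NULL
        );

        CREATE TABLE dbo.CustomerAddress
        (
            CustomerID    INT          NOT NULL
                CONSTRAINT FK_CustomerAddress_Customer REFERENCES dbo.Customer (CustomerID),
            Address1      VARCHAR(100) NOT NULL,
            Address2      VARCHAR(100) NULL,
            StateCode     CHAR(2)      NOT NULL,
            ZipCode       CHAR(10)     NOT NULL,
            EffectiveDate DATE         NOT NULL,
            ExpireDate    DATE         NULL
        );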

    Volatile VarChar columns, perhaps something like comments, can be better managed in a separate table; alternatively, you can define them logically in-row as VarChar(max) but set the table's [large value types out of row] option to ON, so the text data is physically stored in pages separate from the other, static columns. For large tables, you don't want users editing free-form columns that are stored in-row, and this option can be helpful.
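
    That option is set per table with sp_tableoption; roughly like this (the table name is illustrative):

        -- Store (MAX)-typed values off-row so edits to them don't split the pages
        -- holding the narrow, static columns
        EXEC sys.sp_tableoption 'dbo.CustomerComment', 'large value types out of row', 'ON';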

    Another scenario that can cause page splits is when users are updating columns contained in the table's clustered key, and this applies not only to variable width columns but also to fixed width columns, even integers.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/18/2016)


    Another scenario that can cause page splits is when users are updating columns contained in the table's clustered key, and this applies not only to variable width columns but also to fixed width columns, even integers.

    I'm of the opinion that you should pick a clustered index that never needs updating. Remember the criteria for a clustered index:

    1. Narrow

    2. Non-volatile

    3. Ever increasing

    4. Unique
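
    A minimal sketch of a key that meets all four criteria, typically a small, ever-increasing surrogate (the names are illustrative):

        -- INT IDENTITY: narrow (4 bytes), never updated, ever increasing, unique
        CREATE TABLE dbo.OrderHeader
        (
            OrderID     INT IDENTITY(1,1) NOT NULL,
            OrderNumber VARCHAR(20)       NOT NULL,
            OrderDate   DATE              NOT NULL,
            CONSTRAINT PK_OrderHeader PRIMARY KEY CLUSTERED (OrderID)
        );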

  • Ed Wagner (1/18/2016)


    Eric M Russell (1/18/2016)


    Another scenario that can cause page splits is when users are updating columns contained in the table's clustered key, and this applies not only to variable width columns but also to fixed width columns, even integers.

    I'm of the opinion that you should pick a clustered index that never needs updating. Remember the criteria for a clustered index:

    1. Narrow

    2. Non-volatile

    3. Ever increasing

    4. Unique

    Definitely. Updatable clustered keys, multi-valued columns, VarChar dates, and in-row comments will leave your table looking (and running) like a contestant on the TV show 'The Biggest Loser'.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • er.mayankshukla (1/18/2016)


    Hi all,

    In our dev environment, it is common practice to use varchar columns instead of char so that less space is required and, accordingly, the index size is smaller.

    While doing some research on this, I created two tables with same-sized columns, one using varchar and one using char.

    I noticed that, as expected, the page count for the table with the varchar column was lower, and the page count for the table with the char column was higher.

    However, after some updates that increased the length of the stored values, the fragmentation on the table with the varchar column was huge.

    So, based on your experience, which is worse:

    to suffer from fragmented indexes,

    or to have extra pages?

    To answer your first question, yes... One of the uses of VARCHAR() is to reduce the amount of space required, especially if the width of the data varies widely and especially if the maximum width is large. VARCHAR(1) and some of the lower lengths can work in the opposite direction and actually cause more space to be used, because VARCHAR() carries a 2-byte overhead per value to remember the length of the contents. Of course, if you have a column that has no variance in width, CHAR would be the way to go no matter how wide that fixed width actually is.
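
    A quick illustration of the padding difference (the 2-byte length overhead is stored per value in the row and is not included in the DATALENGTH result):

        DECLARE @FixedWidth    CHAR(100)    = 'abc';
        DECLARE @VariableWidth VARCHAR(100) = 'abc';

        SELECT DATALENGTH(@FixedWidth)    AS char_bytes,    -- 100: padded to the declared width
               DATALENGTH(@VariableWidth) AS varchar_bytes; -- 3: only the actual characters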

    Shifting gears a bit, fragmentation means practically nothing to most front end code because it typically is only looking for 1 row. Fragmentation isn't going to affect anything there because the lookup still has to go through the B-Tree of the index and it's still capable of doing a seek. A large batch job might be different. Sure, it would still use the same method to find the first value but then, optimally, it would do a range scan at the leaf level of the index to get the rest of the rows. That's where fragmentation can really play an important part in slowing things down.

    And that leads us to the last part of your post... to suffer a bit from fragmented indexes that need to be rebuilt or to have extra pages. The answer is solely dependent on the type of data, the type of index, the average width of rows, and how often the data changes. For example, a lot of people will use a surrogate key for the clustered index just to control the "shape" of the table and to prevent very costly clustered index splits on tables that are wide or take a lot of inserts/updates. Other tables do better with clustered indexes on natural keys (usually reference tables that are mostly static over their lifetime).

    So, the bottom line is that I absolutely agree with Gail. The first task is to select the correct datatype (and it should always be correct, properly sized, and of primary concern) and then you can worry about indexing. The first part of indexing would be to select the correct key(s) for the clustered index based on the type of table, the expected usage, and the type of data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks all for your replies

    I have written a small article explaining this, leaving the decision to developers based on their type of data and transaction pattern, as Gail and Jeff said.

    https://sqlmayank.wordpress.com/2016/01/22/varchar-and-char_facts-to-consider/

    I would appreciate any corrections/feedback on this.
