August 8, 2007 at 3:35 pm
I am designing a table with a column that's having fixed width where records will be always English. I am thinking of data type to be char or varchar. Apart from space issues, are there any benefits for using one or another ? Or they will be the same ?
The number of records in this table will be in millions, the most of the rest character fields (like notes) will be nvarchar. I have already done some tests for performance and the difference was very sligth.
Are there any issues besides performance that I have to take into account ?
Thanks.
August 8, 2007 at 7:02 pm
Fixed width = CHAR
always english = VARCHAR
This however assumes that those variables will never change, which is sometimes hard to garantee.
August 9, 2007 at 1:54 am
Assuming, as you mention, that your column is of fixed width, and this will not change in future designs, using chars is a good idea.
Internal storage:
Varchars are stored in the same physical data row as chars. Chars are stored in the fixed length section, varchars in the variable length section. So in terms of storage overhead for the same string in a varchar the internal row length will be about 2 bytes longer if you have other variable length columns, a few more bytes if this is the only variable length column. If you do have variable sized data but all of your varchars are shorter than 2-3 bytes it may be worth thinking about chars.
Transaction log:
When you modify chars, rows are not moved around, and the transaction log records are also very small (only the character differences are recorded). For the variable length data if the length of the varchar changes the log records will be longer (it records data and length changes), and in some cases the row would need to be moved around (either on the same page or to a new page). If the length the modified varchar is the same the transaction log record will have the same length as in the case of the char column.
Long term:
If you need to increase the width of the row in the future, then go for varchar (or you would need to rebuild the table). While there is SQL to alter a column, add and drop it with different types, SQL Server rarely reuses space for fixed length column types, so you may end up with wasted space in every physical row record.
Regards,
Andras
August 10, 2007 at 8:18 am
I'm confused. Is the question "char or nchar" as the title states? or "char or varchar" as the answers are running?
Char or Nchar (or varchar vs Nvarchar) is the choice to support double width char data for those non English installs where two bytes per charactor are required. Using Nchar is a total waste of space if you are always storing single width data.
Student of SQL and Golf, Master of Neither
August 10, 2007 at 10:12 am
Yes, that's between char or nchar (or English and International). I've also noticed that some replyes were about (n)char and (n)varchar, but that's not a problem.
If the difference only about space, I don't care about it. I wanted to find other problems that might arise with this.
August 10, 2007 at 11:47 am
I take my hat off you Andras - I thought nobody else knew that ...
This advice should also be coupled with NOT NULL as well because anytime you allow a column to be NULL it is then moved from the fixed portion (NOT NULL) to the variable portion (NULL-able) of the data page.
Additionally, is your row of data contains all fixed length columns and no NULL-able columns then when a row is updated it can be updated 'in place'. If any row has any variable or NULL columns then the engine has to work harder because it must forst perform a space hunt and possibly reorg the rows prior to the update. Really esoteric stuff but valuable in high volume OLTP environments.
RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."
August 10, 2007 at 1:37 pm
Lots of good info here, but might I suggest that you really should care about space. Good database design means using the right datatype for the data that will be used. For example, I find a lot of people use datetime when they really only need smalldatetime - using 8 bytes instead of only 4.
If you're certain it will always be English, then there is no good reason to use a unicode datatype that is two bytes per character vs. one byte per character. With millions of rows, that extra byte can add up to a lot of wasted space.
I know this from experience. Years ago on one of the first databases I set up, I had used several nchar and nvarchar columns for no good reason. The database ended up being about 26GB until I rebuilt it without unicode datatypes and got it down to about 13GB. Which would you rather have to backup - a 13GB or 26GB database? That means longer backup times as well as longer restore times. Sure, you can buy more disk space, but you can't buy more time (other than buying backup tools that speed up the process, but the bigger file is still going to take longer to deal with).
Space alone is a good enough reason to stick with char vs nchar.
August 10, 2007 at 1:46 pm
Good answer Bill. I had almost the same reply half typed up when I went back and saw your answer.
Student of SQL and Golf, Master of Neither
August 10, 2007 at 3:06 pm
The impact of the size of the table is not just about space on the disk or how long the backup takes to run.
If you have to scan a table for a report, half the size means it runs a lot faster, even if the whole table is cached in memory. When you push data through memory and CPU, twice as much data means twice as long to run.
More data also means you will have less of a large database in memory, so you are more likely to have to go to disk to get the data and that slows it down even more.
You must alway consider the impact of the physical size of your data, especially for large databases.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply