November 24, 2010 at 3:30 pm
I have done research and I am not finding any detailed information about what I am looking for regarding char() vs. varchar().
Example:
If you have 2 columns, one defined as char(10) and the other varchar(10)...and you put the word "music" in each of them, my understanding and research says that the char(10) will store it as "music" with 5 trailing spaces. The varchar(10) will store it as "music" without the trailing spaces.
This results in a length of 10 for the char and 5 for the varchar (more or less). My question which I cannot find is: Is this physical or logical storage? (meaning, if it is physical, then the varchar record is really a variable length record and you save disk space).
Any clarification would help and be appreciated.
Thanks in advance!
Don
November 24, 2010 at 3:52 pm
dpulliam3 (11/24/2010)
I have done research and I am not finding any detailed information about what I am looking for regarding char() vs. varchar().Example:
If you have 2 columns, one defined as char(10) and the other varchar(10)...and you put the word "music" in each of them, my understanding and research says that the char(10) will store it as "music" with 5 trailing spaces. The varchar(10) will store it as "music" without the trailing spaces.
This results in a length of 10 for the char and 5 for the varchar (more or less). My question which I cannot find is: Is this physical or logical storage? (meaning, if it is physical, then the varchar record is really a variable length record and you save disk space).
Any clarification would help and be appreciated.
Thanks in advance!
Don
Don,
From BOL:
char [ ( n ) ]
Fixed-length, non-Unicode character data with a length of n bytes. n must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for varchar are char varying or character varying.
The char(10) column will use 10 characters.
The "music" in the varchar(10) column will use (5 + 2) = 7 characters. As you use less and less space, you will save disk/storage space.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 24, 2010 at 3:54 pm
I can't find the blog post that described this so well right now. The storage is physical. There is a tradeoff at either 8 or 9 characters as to storage requirements for char v varchar. After 9 characters, varchar is better but up to 9 char is better for storage savings.
This is physical storage.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
November 24, 2010 at 3:57 pm
Thanks Wayne for the reply. I must have missed the "actual length" part in the BOL.
November 24, 2010 at 4:08 pm
dpulliam3 (11/24/2010)
Thanks Wayne for the reply. I must have missed the "actual length" part in the BOL.
No problem.
@jason: if you find that blog, I'd be interested in seeing it
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 24, 2010 at 4:10 pm
Thank you, Jason.
I, too, would be interested in the blog you referenced, if you find it.
November 24, 2010 at 4:11 pm
I will look some more and post if I find it.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 3, 2011 at 7:00 pm
Does NChar or Char perform better that their var alternatives?
Great question. The simple answer is yes in certain situations. Let's see if this can be explained.
Obviously we all know that if I create a table with a column of varchar(255) (let's call this column myColumn) and insert a million rows but put only a few characters into myColumn for each row, the table will be much smaller (overall number of data pages needed by the storage engine) than if I had created myColumn as char(255). Anytime I do an operation (DML) on that table and request alot of rows, it will be faster when myColumn is varchar because I don't have to move around all those "extra" spaces at the end. Move, as in when SQL Server does internal sorts such as during a distinct or union operation, or if it chooses a merge during it's query plan, etc. Move could also mean the time it takes to get the data from the server to my local pc or to another computer or wherever it is going to be consumed.
But there is some overhead in using varchar. SQL Server has to use a two byte indicator (overhead) to, on each row, to know how many bytes that particular row's myColumn has in it. It's not the extra 2 bytes that presents the problem, it's the having to "decode" the length of the data in myColumn on every row.
In my experiences it makes the most sense to use char instead of varchar on columns that will be joined to in queries. For example the primary key of a table, or some other column that will be indexed. CustomerNumber on a demographic table, or CodeID on a decode table, or perhaps OrderNumber on an order table. By using char, the query engine can more quickly perform the join because it can do straight pointer arithmetic (deterministically) rather than having to move it's pointers a variable amount of bytes as it reads the pages. I know I might have lost you on that last sentence. Joins in SQL Server are based around the idea of "predicates." A predicate is a condition. For example myColumn = 1, or OrderNumber < 500.
So if SQL Server is performing a DML statement, and the predicates, or "keys" being joined on are a fixed length (char), the query engine doesn't have to do as much work to match rows from one table to rows from another table. It won't have to find out how long the data is in the row and then walk down the string to find the end. All that takes time.
Now bear in mind this can easily be poorly implemented. I have seen char used for primary key fields in online systems. The width must be kept small i.e. char(15) or something reasonable. And it works best in online systems because you are usually only retrieving or upserting a small number of rows, so having to "rtrim" those trailing spaces you'll get in the result set is a trivial task as opposed to having to join millions of rows from one table to millions of rows on another table.
Another reason CHAR makes sense over varchar on online systems is that it reduces page splits. By using char, you are essentially "reserving" (and wasting) that space so if a user comes along later and puts more data into that column SQL has already allocated space for it and in it goes.
Another reason to use CHAR is similar to the second reason. If a programmer or user does a "batch" update to millions of rows, adding some sentence to a note field for example, you won't get a call from your DBA in the middle of the night wondering why their drives are full. In other words, it leads to more predictable growth of the size of a database.
So those are 3 ways an online (OLTP) system can benefit from char over varchar. I hardly ever use char in a warehouse/analysis/OLAP scenario because usually you have SO much data that all those char columns can add up to lots of wasted space.
Keep in mind that char can make your database much larger but most backup tools have data compression so your backups tend to be about the same size as if you had used varchar. For example LiteSpeed or RedGate SQL Backup.
Another use is in views created for exporting data to a fixed width file. Let's say I have to export some data to a flat file to be read by a mainframe. It is fixed width (not delimited). I like to store the data in my "staging" table as varchar (thus consuming less space on my database) and then use a view to CAST everything to it's char equivalent, with the length corresponding to the width of the fixed width for that column. For example:
create table tblStagingTable ( pkID BIGINT (IDENTITY,1,1), CustomerFirstName varchar(30), CustomerLastName varchar(30), CustomerCityStateZip varchar(100), CustomerCurrentBalance money )
insert into tblStagingTable (CustomerFirstName,CustomerLastName, CustomerCityStateZip) ('Joe','Blow','123 Main St Washington, MD 12345', 123.45)
create view vwStagingTable AS SELECT CustomerFirstName = CAST(CustomerFirstName as CHAR(30)), CustomerLastName = CAST(CustomerLastName as CHAR(30)), CustomerCityStateZip = CAST(CustomerCityStateZip as CHAR(100)), CustomerCurrentBalance = CAST(CAST(CustomerCurrentBalance as NUMERIC(9,2)) AS CHAR(10))
SELECT * from vwStagingTable
This is cool because internally my data takes up less space because it's using varchar. But when I use DTS or SSIS or even just a cut and paste from SSMS to Notepad, I can use the view and get the right number of trailing spaces. In DTS we used to have a feature called, damn I forget I think it was called "suggest columns" or something. In SSIS you can't do that anymore, you have to tediously define the flat file connection manager. But since you have your view setup, SSIS can know the width of each column and it can save alot of time when building your data flow tasks.
So bottom line... use varchar. There are a very small number of reasons to use char and it's only for performance reasons. If you have a system with hundrends of millions of rows you will see a noticeable difference if the predicates are deterministic (char) but for most systems using char is simply wasting space.
Hope that helps. Jeff
February 14, 2012 at 11:26 pm
It's great question and very useful discussion and worth to read. Thanks a lot for wonderful explanation.:-)
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply