October 23, 2009 at 6:11 am
As far as I know, VARCHAR(MAX) is essentially the replacement for the old TEXT data type.
My question is - does VARCHAR(MAX) perform the same way as VARCHAR(x)? IE, does it allocate only enough space to a row to contain the value of that field + 2 more to handle the size? And, for that matter, am I even interpreting VARCHAR(x) correctly?
To my knowledge, CHAR(x) means that the given item requires x bytes of space, regardless of what is stored, meaning even if the field is empty, it still takes up x bytes of space - though if the field is NULL, it takes no space.
VARCHAR(x) on the other hand, means that the given item takes only as much space as its length, + 2 to indicate what the length is - to a maximum of x + 2. Again, if the field is NULL, it takes no space.
VARCHAR(MAX), then, by my reasoning, also takes up only enough space to fit the item in question, + 2 for the length, and, once again, no space for a NULL value.
If these are all true, then what is the purpose of using VARCHAR(x) instead of VARCHAR(MAX)? The only advantage I can see, is that VARCHAR(MAX) can't be indexed using a normal index, while VARCHAR(x) can, as long as the size of the index is less than 900 bytes.
So, as far as I know, the only reason to use VARCHAR(x) is if the field is intended to be indexed. If not, there's no reason not to use VARCHAR(MAX).
Is this reasoning correct?
October 23, 2009 at 6:33 am
Personally I believe that VARCHAR(MAX) is a datatype of last resort.
To give an extreme example the longest possible telephone number is 15 digits so allowing for formatting I allow VARCHAR(20). This means that there are some boundaries on what can be inserted into the field.
VARCHAR(MAX) effectively moves boundaries to such an extent that they become irrelevant. I take the view that if the database does allow unrestricted data then at some point unrestricted data will get put into it.
The other things that strike me is that the "text in row" option may apply and also the maximum replicated text size (defaults to 64Kb.
As far as indexing VARCHAR(MAX) it is possible to put a column which stores the CHECKSUM for the VARCHAR(MAX) and index that. Obviously this isn't a guaranteed match but is good enough to get within shooting distance of the correct records.
October 23, 2009 at 6:38 am
Not quite. If you can guarantee that you will not need to store values greater than 8000 chars, then go for a varchar with a fixed maximum length (of the maximum size your business will need for the column).
This is for a few reasons:
1) Although Varchar(n) and Varchar(max) can be stored in exactly the same way when the actual length is < 8000, the optimiser will treat them differently to allow for the possibility of out-of-row values. So it can be faster to query varchar(n) columns.
2) It's a good way to validate that input data is sensible - personally, I wouldn't want someone to be able to enter 2GB worth of chars into an address field for example
3) Why would you not want the option of indexing the column in the future if you know the values will always be small enough to do so
October 23, 2009 at 6:50 am
Well, I would agree that for a field like telephone number, or zip code, or maybe even address, a fixed VARCHAR(x) value makes sense.
But what about for a field like, say, description? Or, a field that contains a comma-delimited list of values, the values of which could be an arbitrary length?
Yes, I could provide a fixed length. But it seems to me like that's just asking for problems in the future. Let's say I decided that my field was going to store a comma-delimited list of values, and that right now, it seems to me like the number of values would likely not exceed 10, and the length of each value would likely not exceed 5. So, sure, it might make sense for me, right now, to put VARCHAR(60) as my length, lets say.
But two months down the line, maybe something has changed, and now it's possible to have 10 values, or the length of a given value might be actually 15. Now I have to go and update my column definition to allow for a greater VARCHAR(x) value - but not only that, I also have to update all of my procedures, both the select and update ones, to reflect the new x value, or else it would either truncate or lose data that was necessary.
David, can you elaborate on this?
The other things that strike me is that the "text in row" option may apply and also the maximum replicated text size (defaults to 64Kb).
And, Howard, can you elaborate on this?
Although Varchar(n) and Varchar(max) can be stored in exactly the same way when the actual length is < 8000, the optimiser will treat them differently to allow for the possibility of out-of-row values. So it can be faster to query varchar(n) columns.
These are the kind of things that I'd like to know about. Essentially, reasons why a VARCHAR(MAX) datatype would perform worse than a VARCHAR(x) one.
October 23, 2009 at 6:54 am
It really depends on data integrity. If you're dealing with what is essentially unstructured data, then yes VARCHAR(MAX) is the way to go. If you've got a structure, then no, I wouldn't use it the way you're describing. It really mucks with the data integrity.
There is also performance and storage concerns for VARCHAR(MAX) when the data goes beyond what will fit on a page that make is so that you should only use it when you really actually need it, not just because it makes the possibility of later changes easier.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 23, 2009 at 7:00 am
The maximum size of a row in SQL Server is 8KB, so all the columns added together must not exceed this.
The way that varchar(max) gets around this restriction is that if the data exceeds the max length for a row, the row will hold a pointer to the data and the data will be stored in seperate page/pages. Obviously there is added overhead in retrieving these extra pages. There is also extra overhead in checking each row to see whether the data is 'in-row' or 'out-of-row'.
I have to ask, why would you want to store a list of comma seperated values in a single column? What do they represent? Why would you not store each of these values in a child table and have a row for each value - it'll make life much easier if you want to do any manipulation of the data in SQL.
If it's a description field that can legitimately exceed 8000 chars, then absolutely, varchar(max) is the way to go.
October 23, 2009 at 7:21 am
Well, the reason I'm using a comma-delimited list as opposed to a child table, is that I have no need for the values as individual entities.
As an example, I have a multi-select combo box in my application, which stores a list of the countries that the user has selected.
What I want, is to be able to save what the user selected, so when they re-load it, they have their entries still.
As such, all I would need to do is grab the field with the comma delimited list of countries, then perform a split() command in my application, and populate the combo box accordingly.
Were I to use a child table for this purpose, for every one of the combo boxes in my application, I would need to have another table, for the sole purpose of storing that entry. Just creates more overhead than I would rather have.
October 24, 2009 at 9:31 am
David, can you elaborate on this?
The other things that strike me is that the "text in row" option may apply and also the maximum replicated text size (defaults to 64Kb).
sp_tableoption used to allow you to force text and image data up to a certain size would be stored in the row rather than as a pointer. I think Andy, Brian or Steve used it in the early days of SQLServerCentral to speed up retrieval of forum posts. It doesn't apply to VARCHAR(MAX) http://msdn.microsoft.com/en-us/library/ms173530.aspx
sp_configure lets you set "max text repl" size to something other than 64Kb. If you replicate large blobs this is very useful.
June 20, 2013 at 8:40 am
I'd model that problem differently...
CREATE TABLE Lists(
List_Id INTEGER NOT NULL,
List_Name VARCHAR(30) NOT NULL);
CREATE TABLE List_Options(
List_Id INTEGER NOT NULL,
Option_Id INTEGER NOT NULL,
Option_Value INTEGER,
Option_Name VARCHAR(100) NOT NULL);
CREATE TABLE Users(
User_Id INTEGER NOT NULL,
User_Name VARCHAR(50));
CREATE TABLE User_List_Options(
User_Id INTEGER NOT NULL,
List_Id INTEGER NOT NULL,
Option_Id INTEGER NOT NULL
);
June 20, 2013 at 8:47 am
Please note: 4 year old thread.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 14, 2015 at 7:42 am
Also, if programmers will be interacting with the database, then they can data-bind the User Interface (UI) controls to the database columns. This has several benefits, one of which is that size restrictions will be automatically enforced with a simple database change. It's a great way to make the UI and database work together seamlessly.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply