January 15, 2010 at 8:56 am
I'm a DBA responsible for both operations and engineering development for our databases. I rarely design tables, only approve or disapprove of what coders think should be in a table. I do, however, have the final say on all index implementations which I create myself.
Recently, a developer wanted to create a stored procedure that included a VARCHAR(MAX) column in a WHERE clause (without a LIKE reserved word). All of my knowledge and experience tells me that this is going to be very inefficient and resource intensive query. I believe that indexing a VARCHAR(MAX) column is a big mistake.
Can anyone share explicit experience they have with the implementation of this scenario and the consequences of it? I need a qualified second opinion to give an absolute "NO" to this proposed implementation.
LC
January 15, 2010 at 9:07 am
I wouldn't do it.
As an alternative, maybe you could do a Full-text index on that column.
It will be an expensive Index (Generally speaking) if you do decide to create a NC for that column.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 15, 2010 at 9:46 am
You cannot create an index that has a varchar(max) as a key column.
CREATE TABLE Test (
id INT,
StringCol VARCHAR(MAX)
)
CREATE INDEX idx_wontwork ON Test (StringCol)
Msg 1919, Level 16, State 1, Line 2
Column 'StringCol' in table 'Test' is of a type that is invalid for use as a key column in an index.
You can add a column of type varchar(max) as an include column, but that isn't seekable, plus will duplicate the column completely (twice the storage)
Why would someone be doing a direct string comparison (without like) on a varchar(max)? Unless the column has the incorrect data type, that could be thousands of characters of string.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 15, 2010 at 6:08 pm
Thanks for your reply, Jason.
And thanks for your indisputable reply, Gail. It will end all debate on the subject. In answer to your "Why?" question, it's because this particular programmer has no concept of the need for efficiency.
LC
January 15, 2010 at 7:13 pm
crainlee2 (1/15/2010)
Thanks for your reply, Jason.And thanks for your indisputable reply, Gail. It will end all debate on the subject. In answer to your "Why?" question, it's because this particular programmer has no concept of the need for efficiency.
LC
Heh... apparently no concept of design, either. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
January 15, 2010 at 9:31 pm
crainlee2 (1/15/2010)
Thanks for your reply, Jason.And thanks for your indisputable reply, Gail. It will end all debate on the subject. In answer to your "Why?" question, it's because this particular programmer has no concept of the need for efficiency.
LC
You're welcome
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 16, 2010 at 12:49 am
If you really need to do an exact match lookup on a varchar(max) column, you can add a computed column that contains a checksum of the varchar(max) column, and index that. This is a very fast way to do lookups on large strings, but will only work for exact matches. Note that different values can have the same checksum, so you must still do an equality check on the actual column.
The idea is that a lookup using an index on an integer checksum value will be extremely fast and then the equality check will only have to be made on a small number of duplicate checksums.
You could also implement this using an indexed view with a checksum or by adding an actual checksum column to the table that has a constraint that forces the value to match the checksum of the varchar(max) column.
Sample lookup code:
declare @MyLookupStringvarchar(max)
set @MyLookupString ='My Lookup value'
select
*
from
Mytable
where
-- Find candidate exact matches using checksum index
MyVarcharMaxChecksum = checksum(@MyLookupString)and
-- Verify exact match to lookup value
MyVarcharMax = @MyLookupString
January 16, 2010 at 1:15 am
crainlee2 (1/15/2010)
In answer to your "Why?" question, it's because this particular programmer has no concept of the need for efficiency.
Nor appropriate data types, I would assume.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 16, 2010 at 8:23 am
Michael Valentine Jones (1/16/2010)
If you really need to do an exact match lookup on a varchar(max) column, you can add a computed column that contains a checksum of the varchar(max) column, and index that. This is a very fast way to do lookups on large strings, but will only work for exact matches. Note that different values can have the same checksum, so you must still do an equality check on the actual column.The idea is that a lookup using an index on an integer checksum value will be extremely fast and then the equality check will only have to be made on a small number of duplicate checksums.
You could also implement this using an indexed view with a checksum or by adding an actual checksum column to the table that has a constraint that forces the value to match the checksum of the varchar(max) column.
Michael brings up an excellent method. I just want to emphasize a caution that he also mentioned...
Because the built in CHECKSUM functions are nothing more than a simple character-by-character "Exclusive OR", be [font="Arial Black"]VERY [/font]aware that two different values can, in fact, have the same checksum. Matching checksums (using the built in CHECKSUM functions) are no guarantee of a perfect match. Non matching checksums are, however, a guarantee that they are not a perfect match. If there is a match (again, using the built in CHECKSUM functions), additional checking [font="Arial Black"]must[/font] still be done to guarantee the match. Of course, using CHECKSUM helps you narrow down the field very quickly as Micheal says.
If you're using certain other methods to do the hashing (I forget what they're called... MD5???), then the additional checking will not be necessary.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2010 at 9:40 am
GilaMonster (1/16/2010)
crainlee2 (1/15/2010)
In answer to your "Why?" question, it's because this particular programmer has no concept of the need for efficiency.Nor appropriate data types, I would assume.
Exactly. There is a strong tendency in our Engineering department to use VARCHAR(MAX) datatypes instead of VARCHAR(N) datatypes, which is generally in the category of a long-standing bad habit rather than a design requirement deliberately thought through.
I'm currently involved the on-going process of educating our developers on T-SQL, SQL Server, and database best practices, explaining why different implementations that appear to produce identical results, are, in execution, more or less efficient than others, and why they should care about it. They're a good group of engineers but like many software developers, they specialize in software algorithms, not database technologies. The process of education on a foreign technology takes time. And old habits die hard.
It's taken a while but I've finally been allowed to become involved in certain aspects of database and table design. I'm asking developers to defend their design choices, among other things. It's having the intended effect, of educating them on the need to design minimally and carefully, with a lot of fore-thought on exactly how the data is to be accessed and utilized.
LC
January 16, 2010 at 9:43 am
Michael and Jeff,
Thank you for your contributions on using CHECKSUM as a way of searching for equalities on VARCHAR(MAX) datatypes.
I think there are a few places in our databases where we can put that to use.
LC
January 16, 2010 at 10:25 am
You bet... thanks for the feedback, Lee.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 17, 2010 at 10:38 pm
crainlee2 (1/16/2010)
Michael and Jeff,Thank you for your contributions on using CHECKSUM as a way of searching for equalities on VARCHAR(MAX) datatypes.
I think there are a few places in our databases where we can put that to use.
LC
I didn't mention it before, but this is a good way to handle this for long string columns, say varchar(50), instead of putting an index on the column.
January 19, 2010 at 5:45 pm
I really think that it is appropriate to understand what the developer is trying to acomplish.
Understanding the use-case will point you in the correct direction and allow you to better council the developer and provide better support.
Without fully understanding the use-case you are really just shooting in the dark.
As someone already pointed out, a full-text index may be the appropriate soltuion, but without understanding the application there is no way to determine the answer.
January 20, 2010 at 1:34 am
Jeff Moden (1/16/2010)
If you're using certain other methods to do the hashing (I forget what they're called... MD5???), then the additional checking will not be necessary.
Even with MD5 or other hashing methods there's still a possibility of a checksum clash--it might not be as likely as with a simple XOR checksum but it's still possible. The possibility increases the more characters you have in your string, too.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply