This one is basic but still something that can catch you unawares if you aren’t careful. Most DBAs and developers I know use the function LEN to calculate the length of a string. Little wonder since that is exactly what this function is for. However not everyone realizes the small gotcha with this very basic function. It doesn’t count trailing spaces!
PRINT LEN('123') PRINT LEN('123 ') -- three trailing spaces ----------------------------------- 3 3
Once a developer discovers this, the next most obvious question is “How do I get it to count the trailing space?” Well the easiest way I’ve found is to add a character to the end of the string, then subtract one. Looks a little goofy but it works.
PRINT LEN('123'+'.')-1 PRINT LEN('123 '+'.')-1 -- three trailing spaces ----------------------------------- 3 6
Then comes the function DataLength. This function also returns the length of a string. Or does it? Technically no, it doesn’t. It returns the number of bytes used to represent an expression, or in this particular case a string.
PRINT DATALENGTH('123') PRINT DATALENGTH('123 ') -- three trailing spaces ----------------------------------- 3 6
The results of this are exactly what you probably expected. 3 and 6. So why not use DataLength instead of Len?
PRINT DATALENGTH(N'123') PRINT DATALENGTH(N'123 ') -- three trailing spaces ----------------------------------- 6 12
For those of you who don’t know the N at the beginning of a string means that the string is to be Unicode. And Unicode strings take up 2 bytes for every character. Not 1. So now DataLength is giving us double the length of the string. Not exactly what we were looking for. We could divide by two, but only if we know for sure the string is going to be Unicode.
So to recap. If you are uncertain about the data type of your string then you use LEN. If you care about trailing spaces use LEN(string+’.’)-1, if not just use LEN() on its own. Now if you are certain about the data type of your string then you can use DataLength. Unicode is DataLength(string)/2 and Non-Unicode is just DataLength().
As a side note DataLength works on any datatype. Give it a shot with some numeric, date, datetime etc values. It can be fairly interesting.
Filed under: Microsoft SQL Server, SQLServerPedia Syndication, T-SQL Tagged: code language, language sql, microsoft sql server, sql statements, string code, T-SQL, unicode strings