It wasn’t until recently that i paid attention to truncation during concatenation, but that’s because whenever I used concatenation before it was only using short data lengths so I didn’t see the effects of truncation.
When I first noticed that truncation can occur I was building a Huge piece of dynamic SQL , Don’t worry this was just a setup script designed to be ran just once and then saved off for later use on other servers where required, it’s not like I was going to run this thing every couple of minutes on Production or anything
So there I was building this massive VARCHAR(MAX) string and concatenated at various points in my code were Database names of the datatype NVARCHAR(128).
The interesting part was that I was expecting SQL server to use my largest data type – the VARCHAR(MAX) and just concatenate the NVARCHAR(128) values into it
this was not the case – what actually happened was my string of VARCHAR(MAX) characters being truncated down to an NVARCHAR(4000)!
There is a reason for this and its all to do with Data Type Precedence in this case the NVARCHAR is preceding my VARCHAR unless of course I explicitly convert the NVARCHAR to a VARCHAR.
Normally for me this would go undetected as the length of string that I would normally concatenate is way below 4000 chars but as i said earlier – this was a massive piece of dynamic SQL approx 36000 characters in length.
To illustrate the issue better I have put together some examples which show when truncation can occur.
For my NVARCHAR Value I will be using the results of the below query :
SELECT TOP 1 name FROM sys.databases ORDER BY name ASC
Result: AdventureWorks2014 (18 Characters in Length)
I will be using this query throughout the examples so that we return a real result from a table, then we will be concatenating the results of two REPLICATE functions just so that we can keep things tidy for the demonstration (and to save me having to count out 4000 Characters of text )
The results will be inserted into a table so that we can demonstrate the final Datatype used.
Example 1:
--Concatenate NVARCHAR(128) , NVARCHAR(4000) and NVARCHAR(4000) SELECT TOP 1 name + REPLICATE(N'A',4000) + REPLICATE(N'A',4000) as MyConcatenation INTO Results FROM sys.databases ORDER BY name ASC --Check the Length of the Column we are expecting 8018 Chars - we get 4000 SELECT LEN(MyConcatenation) AS MyConcatenation_ColumnLength FROM Results --Check the DataLength for the Datatype created - NVARCHAR(4000) SELECT Columns.name AS [ColumnName], Columns.column_id AS [ID], Types.name AS [DataType], CAST(CASE WHEN UserTypes.name IN (N'nchar', N'nvarchar') AND Columns.max_length <> -1 THEN Columns.max_length/2 ELSE Columns.max_length END AS int) AS [Length], Columns.is_nullable AS [Nullable] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS Columns ON Columns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS Types ON Types.user_type_id = Columns.user_type_id LEFT OUTER JOIN sys.types AS UserTypes ON (UserTypes.user_type_id = Columns.system_type_id and UserTypes.user_type_id = UserTypes.system_type_id) or ((UserTypes.system_type_id = Columns.system_type_id) and (UserTypes.user_type_id = Columns.user_type_id) and (UserTypes.is_user_defined = 0) and (UserTypes.is_assembly_type = 1)) WHERE tbl.name= N'Results' DROP TABLE Results
So being Naive I wrongly assumed that this would automatically be converted to an NVARCHAR(MAX) , SQL will not do this for us and rightly so – just because I am concatenating two values that together go beyond the 4000 Character Limit for NVARCHAR (non MAX) SQL is going to simply treat this as an NVARCHAR(4000).
Example 2:
--Next up Explicit NVARCHAR(MAX) Conversion SELECT TOP 1 CAST(name AS NVARCHAR(MAX)) + REPLICATE(N'A',4000) + REPLICATE(N'A',4000) as MyConcatenation INTO Results FROM sys.databases ORDER BY name ASC --Check the Length of the Column, expecting 8018 and we have 8018 SELECT LEN(MyConcatenation) AS MyConcatenation_ColumnLength FROM Results --Check the DataLength for the Datatype created - NVARCHAR(MAX) SELECT Columns.name AS [ColumnName], Columns.column_id AS [ID], Types.name AS [DataType], CAST(CASE WHEN UserTypes.name IN (N'nchar', N'nvarchar') AND Columns.max_length <> -1 THEN Columns.max_length/2 ELSE Columns.max_length END AS int) AS [Length], Columns.is_nullable AS [Nullable] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS Columns ON Columns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS Types ON Types.user_type_id = Columns.user_type_id LEFT OUTER JOIN sys.types AS UserTypes ON (UserTypes.user_type_id = Columns.system_type_id and UserTypes.user_type_id = UserTypes.system_type_id) or ((UserTypes.system_type_id = Columns.system_type_id) and (UserTypes.user_type_id = Columns.user_type_id) and (UserTypes.is_user_defined = 0) and (UserTypes.is_assembly_type = 1)) WHERE tbl.name= N'Results' DROP TABLE Results
This time we get the 8018 Characters as expected and we also get our NVARCHAR(MAX) .
So What about VARCHAR? how does that behave?
Onto Example 3:
--IMPLICIT CONVERSION, Truncation to NVARCHAR(4000) will occur SELECT TOP 1 name + REPLICATE('A',4000) + REPLICATE('A',4000) as MyConcatenation INTO Results FROM sys.databases ORDER BY name ASC --Check the Length of the Column SELECT LEN(MyConcatenation) AS MyConcatenation_ColumnLength FROM Results --Check the DataLength for the Datatype created SELECT Columns.name AS [ColumnName], Columns.column_id AS [ID], Types.name AS [DataType], CAST(CASE WHEN UserTypes.name IN (N'nchar', N'nvarchar') AND Columns.max_length <> -1 THEN Columns.max_length/2 ELSE Columns.max_length END AS int) AS [Length], Columns.is_nullable AS [Nullable] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS Columns ON Columns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS Types ON Types.user_type_id = Columns.user_type_id LEFT OUTER JOIN sys.types AS UserTypes ON (UserTypes.user_type_id = Columns.system_type_id and UserTypes.user_type_id = UserTypes.system_type_id) or ((UserTypes.system_type_id = Columns.system_type_id) and (UserTypes.user_type_id = Columns.user_type_id) and (UserTypes.is_user_defined = 0) and (UserTypes.is_assembly_type = 1)) WHERE tbl.name= N'Results' DROP TABLE Results
Interesting….Here is that Data type precedence I mentioned about earlier
So this time using VARCHAR we can see that Data type precedence occurred converting our two VARCHAR(4000) values to NVARCHAR(4000) and as we can see in the results this has truncated our data.
So What happens when we explicity convert the NVARCHAR(128) to VARCHAR(MAX)?
Example 4:
--Explicit VARCHAR(MAX) Conversion SELECT TOP 1 CAST(name AS VARCHAR(MAX)) + REPLICATE('A',4000) + REPLICATE('A',4000) as MyConcatenation INTO Results FROM sys.databases ORDER BY name ASC --Check the Length of the Column SELECT LEN(MyConcatenation) AS MyConcatenation_ColumnLength FROM Results --Check the DataLength for the Datatype created SELECT Columns.name AS [ColumnName], Columns.column_id AS [ID], Types.name AS [DataType], CAST(CASE WHEN UserTypes.name IN (N'nchar', N'nvarchar') AND Columns.max_length <> -1 THEN Columns.max_length/2 ELSE Columns.max_length END AS int) AS [Length], Columns.is_nullable AS [Nullable] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS Columns ON Columns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS Types ON Types.user_type_id = Columns.user_type_id LEFT OUTER JOIN sys.types AS UserTypes ON (UserTypes.user_type_id = Columns.system_type_id and UserTypes.user_type_id = UserTypes.system_type_id) or ((UserTypes.system_type_id = Columns.system_type_id) and (UserTypes.user_type_id = Columns.user_type_id) and (UserTypes.is_user_defined = 0) and (UserTypes.is_assembly_type = 1)) WHERE tbl.name= N'Results' DROP TABLE Results
Total control – 8018 characters and the target datatype and length is correct.
One thing I did find quite frustrating was that I couldn’t see these Implicit conversions in any of the execution plans, the only way I would see them occur was by running the results into a table and then reading out the metadata from the table.
There are also a situation where I find that the above mentioned examples do not occur which I find very odd , here is an example where Data type Precedence occurs BUT truncation does not:
DECLARE @String VARCHAR(MAX) = REPLICATE('A',4000) DECLARE @Name NVARCHAR(128) = 'SQLUNDERCOVER' SELECT @String + @Name AS MyConcatenation INTO Results; SELECT LEN(MyConcatenation) AS MyConcatenation_ColumnLength FROM Results; --Check the DataLength for the Datatype created SELECT Columns.name AS [ColumnName], Columns.column_id AS [ID], Types.name AS [DataType], CAST(CASE WHEN UserTypes.name IN (N'nchar', N'nvarchar') AND Columns.max_length <> -1 THEN Columns.max_length/2 ELSE Columns.max_length END AS int) AS [Length], Columns.is_nullable AS [Nullable] FROM sys.tables AS tbl INNER JOIN sys.all_columns AS Columns ON Columns.object_id=tbl.object_id LEFT OUTER JOIN sys.types AS Types ON Types.user_type_id = Columns.user_type_id LEFT OUTER JOIN sys.types AS UserTypes ON (UserTypes.user_type_id = Columns.system_type_id and UserTypes.user_type_id = UserTypes.system_type_id) or ((UserTypes.system_type_id = Columns.system_type_id) and (UserTypes.user_type_id = Columns.user_type_id) and (UserTypes.is_user_defined = 0) and (UserTypes.is_assembly_type = 1)) WHERE tbl.name= N'Results' DROP TABLE Results
How Bizzare – this Time NVARCHAR has taken Precendence and has a data length of MAX so truncation did not occur! very odd.
Thanks for reading