The other day, I was answering a question on the ASK.SQLSERVERCENTRAL.COM forum from a programmer who had hit a
problem where he thought that LTRIM and RTRIM had failed to work. Although, I wasn’t entirely certain, I felt sure it
was because of the 'Nul' characters that he admitted had gotten into his VARCHAR column values. The ASCII 'Nul' character,
'control At' (^@) is, in SQL, CHAR(0). It should not to be confused with SQL’s NULL, meaning unknown
LTRIM and RTRIM consider a 'Nul' character to be a ‘non-space’ character, and will quit trimming at that point, whereas it
is represented by the result pane of the Query Window of SSMS as a space. The same is true of any other character that
is represented as spaces between words but isn't a space character, such as the non-breaking space Char(160). This can
cause hours of frustration to a developer trying to track down a bug. LTRIM and RTRIM are pretty-well broken as they
don't work like the TRIM functions of more enlightened languages.
Firstly, here are some examples of LTRIM and RTRIM doing things that are right and wrong.
SELECT paddedWord, '{' + LTRIM(RTRIM(paddedWord)) + '}' AS trimmed,
'{' + LTRIM(RTRIM(CHAR(0) + paddedWord + CHAR(0))) + '}' AS with_Nuls
FROM (SELECT CHAR(9) + ' leading-tab ' AS paddedWord
UNION ALL
SELECT ' trimmable '
UNION ALL
SELECT CHAR(160) + ' un-trimmable ' + CHAR(160)) x
/*
paddedWord trimmed with_Nuls
--------------------- ----------------------- -------------------------
leading-tab { leading-tab} { leading-tab }
trimmable {trimmable} { trimmable }
un-trimmable { un-trimmable } { un-trimmable }
So, we can see that LTRIM and RTRIM don't even conform to the marginally useful C tradition of counting space, tab, line feed, and carriage return characters as trimmable whitespace. WhiteSpace should be language-specific and will include many non-word characters. (some languages have a very different ideas as to which of the Unicode characters represent part of a word, and what represents whitespace). One of the first things a grey-muzzle programmer does when creating a database is to create a ‘trim’ function that works reasonably. So what shoud this be? There is quite a problem with TRIM, especially if one wants a function that will work for a variety of languages in Unicode, simply because the
TSQL string functions can't reliably handle the NCHAR(0) character. However, if we just stick to the European languages and the VARCHARwe're fairly safe
IF OBJECT_ID(N'Trim') IS NOT NULL
DROP FUNCTION Trim
GO
CREATE FUNCTION Trim
/**
summary: >
This procedure returns a string with all leading and trailing blank space removed. It is similar to the TRIM functions in most current computer languages. You can change the value of the string assigned to @BlankRange, which is then used by the PATINDEX function. The string can be a rangee.g. a-g or a list of characters such as abcdefg.
Author: Phil Factor
Revision: 1.1 changed list of control character to neater range.
date: 28 Jan 2011
example:
- code: dbo.Trim(' 678ABC ')
- code: dbo.Trim(' This has leading and trailing spaces ')
- code: dbo.Trim(' left-Trim This')
- code: dbo.Trim('Right-Trim This ')
returns: >
Input string without trailing or leading blank characters, however these characters are defined in @BlankRange
**/ (@String VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @BlankRange CHAR(255),
@FirstNonBlank INT,
@LastNonBlank INT
IF @String IS NULL
RETURN NULL--filter out null strings
SELECT @BlankRange = CHAR(0) + '- ' + CHAR(160)
/* here is where you set your definition of what constitutes a blank character. We've just chosen every 'control' character, the space character and the non-breaking space. Your requirements could be different!*/
SELECT @FirstNonBlank = PATINDEX('%[^' + @BlankRange + ']%', @String collate SQL_Latin1_General_CP850_Bin)
SELECT @lastNonBlank = 1 + LEN(@String + '|') - (PATINDEX('%[^' + @BlankRange + ']%',
REVERSE(@String) collate SQL_Latin1_General_CP850_Bin))
IF @FirstNonBlank > 0
RETURN SUBSTRING(@String,@FirstNonBlank, @LastNonBlank-@firstNonBlank)
RETURN '' --nothing would be left
END
GO