REPLACE using CHAR vs NCHAR

Question

REPLACE using CHAR vs NCHAR

Matthew Lehn

Ten Centuries

Points: 1263
More actions
October 3, 2011 at 10:47 am

#248880

I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:
SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),
[ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),
[UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),
[ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),
[UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),
[ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')
And receive the following results:
UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed
ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc
Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.
[font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

bkubicek SSChampion Points: 10948 More actions · Answer 1

Matthew Lehn (10/3/2011)
I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:
SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),
[ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),
[UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),
[ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),
[UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),
[ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')
And receive the following results:
UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed
ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc
Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.

I could be off here, but I am pretty sure that the database collation has to be turned on to case sensitive for you to see it in the replace, otherwise sql server is case insensitive.

Matthew Lehn Ten Centuries Points: 1263 More actions · Answer 2

You can actually put a collation into the replace function, i.e.

SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')

which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.

The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.

[font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

bkubicek SSChampion Points: 10948 More actions · Answer 3

Matthew Lehn (10/3/2011)
You can actually put a collation into the replace function, i.e.
SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')
which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.
The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.

It appears to me to be perhaps a bug in sql server. I tried this test:

declare @varchar varchar(100)

set @varchar = 'aerostar Inc '+char(230)

select Charindex(char(230),@varchar)

charindex returns postion 14 as expected

declare @nvarchar nvarchar(100)

set @nvarchar = 'aerostar Inc '+nchar(230)

select Charindex(char(230),@nvarchar)

charindex returns position 1, which is not expected.

Seems like a bug to me.

Matthew Lehn Ten Centuries Points: 1263 More actions · Answer 4

Sure does seem like it's a bug. "æ" is NOT the same as "ae" nor is "Æ" the same as "AE". Why SQL Server thinks so when using Unicode is a mystery.

[font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

jhutchens SSC Enthusiast Points: 197 More actions · Answer 5

I cannot tell you what is going on here but I ran into the same problem and was able to fix it using this collation: Danish_Norwegian_CS_AS. As soon as I adjusted the column to that collation and the column was set to nvarchar(50), SQL Server was differentiating between ascii 230 and "ae". Hope it helps.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 6

Something else you need to be aware of that some of us found out while testing code in one of my articles... certain collations will make your server run horribly slow compared to other collations. IIRC, there's a CONNECT item on the problem. I've not actually looked for that CONNECT item because I use the current default collation that SQL Server ships with and that collation doesn't have the problem. To the best of my knowledge, there isn't a publicly available list of collations that cause the performance problem.

Bottom line is, setup at least one machine with the default collation in place so you can test any machine that you change the system-wide collation on against.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)