October 3, 2011 at 10:47 am
I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:
SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),
[ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),
[UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),
[ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),
[UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),
[ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')
And receive the following results:
UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed
ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc
Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.
October 3, 2011 at 11:53 am
Matthew Lehn (10/3/2011)
I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:
SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),
[ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),
[UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),
[ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),
[UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),
[ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')
And receive the following results:
UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed
ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc
Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.
I could be off here, but I am pretty sure that the database collation has to be turned on to case sensitive for you to see it in the replace, otherwise sql server is case insensitive.
October 3, 2011 at 12:11 pm
You can actually put a collation into the replace function, i.e.
SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')
which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.
The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.
October 3, 2011 at 12:30 pm
Matthew Lehn (10/3/2011)
You can actually put a collation into the replace function, i.e.
SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')
which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.
The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.
It appears to me to be perhaps a bug in sql server. I tried this test:
declare @varchar varchar(100)
set @varchar = 'aerostar Inc '+char(230)
select Charindex(char(230),@varchar)
charindex returns postion 14 as expected
declare @nvarchar nvarchar(100)
set @nvarchar = 'aerostar Inc '+nchar(230)
select Charindex(char(230),@nvarchar)
charindex returns position 1, which is not expected.
Seems like a bug to me.
October 4, 2011 at 9:01 am
Sure does seem like it's a bug. "æ" is NOT the same as "ae" nor is "Æ" the same as "AE". Why SQL Server thinks so when using Unicode is a mystery.
October 9, 2011 at 2:05 pm
I cannot tell you what is going on here but I ran into the same problem and was able to fix it using this collation: Danish_Norwegian_CS_AS. As soon as I adjusted the column to that collation and the column was set to nvarchar(50), SQL Server was differentiating between ascii 230 and "ae". Hope it helps.
October 9, 2011 at 6:03 pm
Something else you need to be aware of that some of us found out while testing code in one of my articles... certain collations will make your server run horribly slow compared to other collations. IIRC, there's a CONNECT item on the problem. I've not actually looked for that CONNECT item because I use the current default collation that SQL Server ships with and that collation doesn't have the problem. To the best of my knowledge, there isn't a publicly available list of collations that cause the performance problem.
Bottom line is, setup at least one machine with the default collation in place so you can test any machine that you change the system-wide collation on against.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply