Why does 'Æ' equal 'AE'?

Carl B.

SSCertifiable

Points: 7931
May 11, 2005 at 12:37 pm

Hello everyone,
Every day we learn something new...
We just noticed that both of these statements return the same record:
select dary_id from dary where dary_id = 'SAET'
select dary_id from dary where dary_id = 'SÆT'
Both return the record that contains 'SAET'.
Our database collation is Latin1_General_CS_AS.
To me this is a little surprising. Why does this happen, given that Æ is a distinct character in code page 1252?
Also, is this behavior avoidable?
Thanks a lot,
Carl

bersileus sacamay-126226 SSChasing Mays Points: 642 · Answer 1

Hi Carl,

>> Also is this behavior avoidable?

Yes. It all comes down to collation: you need to know your data and decide which distinctions you want enforced when the SQL statement compares strings.

Here is an example:

--DROP TABLE dary
CREATE TABLE [dary] (
    [dary_id] [char] (10) COLLATE Latin1_General_CS_AS NULL
) ON [PRIMARY]
GO

INSERT INTO [dary] VALUES('SAET')
INSERT INTO [dary] VALUES('SÆT')
GO

-- Without overriding the collation (the column's Latin1_General_CS_AS applies)
select dary_id from dary where dary_id = 'SAET'
select dary_id from dary where dary_id = 'SÆT'
-- Each query returns 2 rows

-- Overriding the collation in the comparison
-- (Albanian_CS_AS_KS is kanatype-sensitive; Latin1_General_BIN is a binary collation)
select dary_id from dary where dary_id COLLATE Albanian_CS_AS_KS = 'SAET'
select dary_id from dary where dary_id COLLATE Albanian_CS_AS_KS = 'SÆT'
select dary_id from dary where dary_id COLLATE Latin1_General_BIN = 'SAET'
select dary_id from dary where dary_id COLLATE Latin1_General_BIN = 'SÆT'
-- Each query returns only 1 row

Hope this helps.

Below are the collation names, with a description of how each one treats data when performing SQL comparisons:

--------------------------------

/*
SELECT *
FROM ::fn_helpcollations()
WHERE NAME = 'Albanian_CS_AS_KS'

--Albanian_CS_AS_KS
--Albanian,
--case-sensitive,
--accent-sensitive,
--kanatype-sensitive,
--width-insensitive

SELECT *
FROM ::fn_helpcollations()
WHERE NAME = 'Latin1_General_CS_AS'

--Latin1_General_CS_AS
--Latin1-General,
--case-sensitive,
--accent-sensitive,
--kanatype-insensitive,
--width-insensitive
*/

-------------------------------------------------

Carl B. SSCertifiable Points: 7931 · Answer 2

Hello Bersileus,

Thanks for that info.

I never thought kanatype sensitivity would have anything to do with this behavior.

Look at what BOL says about it:

"Specifies that SQL Server distinguish between the two types of Japanese kana characters: Hiragana and Katakana.

If not selected, SQL Server considers Hiragana and Katakana characters to be equal."

That is far from my problem here...

OK, starting from that: why does using the collation Latin1_General_CS_AS_KS not work (it does not distinguish 'Æ' from 'AE')?

Regards,

Carl

Carl B. SSCertifiable Points: 7931 · Answer 3

Hello Bersileus,

I think I am missing something. There is something I don't understand.

How do you know which collations are "focused" on kanatype sensitivity?

At first I thought it just depended on whether KS appears in the collation designator, but that is clearly not the case.

Danish_Norwegian_CS_AS distinguishes 'Æ' from 'AE' the same way Danish_Norwegian_CS_AS_KS does.
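
One way to check this directly is to compare the same two literals under several collations in a single statement. The following is only a sketch: it assumes a SQL Server instance on which the named collations are available, and the column aliases are made up for illustration. Going by the posts in this thread, the Latin1_General columns should report the strings as equal while the Danish_Norwegian and BIN columns should not, but that is worth confirming on your own server.

```sql
-- 1 = the two strings compare as equal, 0 = they compare as different.
-- The N prefix makes the literals Unicode so that Æ does not depend on
-- the code page of the current database.
SELECT
    CASE WHEN N'SÆT' = N'SAET' COLLATE Latin1_General_CS_AS
         THEN 1 ELSE 0 END AS latin1_cs_as,
    CASE WHEN N'SÆT' = N'SAET' COLLATE Latin1_General_CS_AS_KS
         THEN 1 ELSE 0 END AS latin1_cs_as_ks,
    CASE WHEN N'SÆT' = N'SAET' COLLATE Danish_Norwegian_CS_AS
         THEN 1 ELSE 0 END AS danish_cs_as,
    CASE WHEN N'SÆT' = N'SAET' COLLATE Danish_Norwegian_CS_AS_KS
         THEN 1 ELSE 0 END AS danish_cs_as_ks,
    CASE WHEN N'SÆT' = N'SAET' COLLATE Latin1_General_BIN
         THEN 1 ELSE 0 END AS latin1_bin;
```

If the KS and non-KS columns agree within each language, the difference comes from the language part of the collation name rather than from the kanatype flag.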

Can you shed some light on this for me?

Thanks a lot,

Carl

bersileus sacamay-126226 SSChasing Mays Points: 642 · Answer 4

hi carl,

references:

(nice reference)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqldev/html/sqldev_06112004.asp

(examples using collation, thanks to Umachandar, an MVP)

http://www.umachandar.com/technical/SQL2000Scripts/Main10.htm

Yes, you are right about the Danish collation. At a high level, a collation is a kind of language vocabulary with its own comparison rules for each region or country. I have never been to Denmark, but I am guessing that Æ is a common letter in Danish. Each region treats the alphabet and its characters differently (sounds the same but spelled differently, etc.), hence the case, accent, kanatype and width options; using collations you can "Write Language-Portable Transact-SQL".

Try the French collations: whichever one you pick, the query always returns 2 rows, except with French_BIN. (As a footnote, the <country>_BIN collations always compare the weight of each individual character.) It also helps to know how a given country treats its characters to really understand this, since SQL Server handles it for us transparently; character weighting is a really cool mechanism.

A collation is a rule for how characters are weighted against each other. See the definition below, taken from BOL (via the reference above).

NOTE:

Whether SQL Server 2000 considers a character less than or greater than another character is collation-dependent: a collation defines the bit patterns that represent each character in a character string and the rules controlling how characters are sorted and compared.
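
To see the "less than or greater than" part of that note in action, a small sketch like the one below sorts the same values under two collations. It assumes the named collations are available on the server; the derived-table alias t and the column v are illustrative names only. In the Danish alphabet Æ comes after Z, so the two result sets would be expected to place Æ differently, though the exact order is best checked on your own instance.

```sql
-- Same values, sorted with Latin1_General rules.
SELECT v AS latin1_order
FROM (SELECT N'AD' AS v
      UNION ALL SELECT N'Æ'
      UNION ALL SELECT N'AF'
      UNION ALL SELECT N'Z') AS t
ORDER BY v COLLATE Latin1_General_CS_AS;

-- Same values, sorted with Danish/Norwegian rules.
SELECT v AS danish_order
FROM (SELECT N'AD' AS v
      UNION ALL SELECT N'Æ'
      UNION ALL SELECT N'AF'
      UNION ALL SELECT N'Z') AS t
ORDER BY v COLLATE Danish_Norwegian_CS_AS;
```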

hope that sheds some light on the matter

Carl B. SSCertifiable Points: 7931 · Answer 5

Thanks Bersileus,

The concept of collation is not the problem here. The problem is that, ever since SQL Server 2000 was delivered, BOL has said the following.

About kana-sensitivity:

"Specifies that SQL Server distinguish between the two types of Japanese kana characters: Hiragana and Katakana.

If not selected, SQL Server considers Hiragana and Katakana characters to be equal."

And nothing in this definition relates to the problem we just ran into.

Still, what you said helped me learn a little more about this "strange" behavior.

Best regards,

Carl

sara karasik SSCommitted Points: 1593 · Answer 6

The collation determines which characters are treated as equal.

If you must absolutely distinguish between characters that SQL Server considers equal, compare their binary values. For example, under a case-insensitive collation,

where 'A' = 'a' is true,

but

where convert(varbinary,'A') = convert(varbinary,'a') is false.

select convert(varbinary,'A')

select convert(varbinary,'a')

These return different results.
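
The same idea can be applied to the two strings from the original question. This is a minimal sketch, not taken from the thread: the column aliases are made up, and Unicode (N-prefixed) literals are assumed so that Æ does not depend on the database code page.

```sql
-- Show the raw bytes of each string, then compare them byte for byte.
-- The strings have different lengths (3 vs 4 characters), so their
-- binary values cannot match.
SELECT
    CONVERT(varbinary(20), N'SÆT')  AS bytes_with_ligature,
    CONVERT(varbinary(20), N'SAET') AS bytes_with_two_letters,
    CASE WHEN CONVERT(varbinary(20), N'SÆT') = CONVERT(varbinary(20), N'SAET')
         THEN 1 ELSE 0 END AS same_bytes;
```

When comparing against a char(10) column such as dary_id, keep in mind that the stored value is padded with trailing spaces, and those spaces are part of the binary value.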