May 25, 2010 at 8:24 am
malleswarareddy_m (5/25/2010)
...select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...
I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).
May 25, 2010 at 8:48 am
Michael Poppers (5/25/2010)
malleswarareddy_m (5/25/2010)
...select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...
I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).
Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.
May 25, 2010 at 8:55 am
Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:
So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).
May 25, 2010 at 9:03 am
Vladan (5/25/2010)
Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).
I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.
May 25, 2010 at 9:15 am
Thanks for the question. I bombed this one, I should of looked at the category or went with my gut, everything always "depends".
---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum
May 25, 2010 at 10:36 am
Great question
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 25, 2010 at 11:05 am
I think of instead of 'It Depends' the last choice should have been collation dependent. By itself 'It depends' is not much of an answer.
-- Mark D Powell --
May 25, 2010 at 1:07 pm
honza.mf (5/25/2010)
Michael Poppers (5/25/2010)
malleswarareddy_m (5/25/2010)
...select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...
I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).
Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.
If all characters were inserted into DB as upper case how come query is returning lower case?
May 25, 2010 at 2:23 pm
jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?
It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!
You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.
If someone uses case sensitive collation, he will have all characters in uppercase.
May 25, 2010 at 2:40 pm
honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.
Sorry, ignore this reply and see my next reply :-). Thanks.
May 25, 2010 at 2:43 pm
honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.
Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!
May 25, 2010 at 2:47 pm
honza.mf (5/25/2010)
jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!
You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.
If someone uses case sensitive collation, he will have all characters in uppercase.
Thank you for the answer it make sense, now. You won't get A-Z but whatever the Check equivqlent of their ascii number is.
Jerry
May 25, 2010 at 2:59 pm
honza.mf (5/25/2010)
...I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.
First of all: very nice question!! I just "lost" one point, but learned something new. I was aware of the collation issue but I thought 'A' to 'Z' didn't make room for it.
Regarding our orthography changes (I'm German): at least it allows me to write the way I'd like to: most probably there was/is/will_be a time and place where the spelling I use was/is/will_be valid 🙂
But it definitely makes it really hard to help your kids through (school) homework... Ooops, getting off topic. Sorry. Again: Nice QOTD!
May 26, 2010 at 12:03 am
Michael Poppers (5/25/2010)
honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!
Sorry, it was my mistake as I haven't checked it. Case sensitivity of a collation is used probably in equality comparisons only. Loops in example are controled by less or equal operator, and it says 'a' <= 'Z'
May 26, 2010 at 10:46 am
honza.mf (5/25/2010)
jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!
You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.
If someone uses case sensitive collation, he will have all characters in uppercase.
Quite correct. The actual number of rows in #t with a default installation is 20449. To get the 26 by 26 result requireswhile ASCII(@a1) <= ASCII('Z') begin
, etc.
This QotD turned out to be much more interesting than I initially thought it would.
Thanks for the lesson!
Viewing 15 posts - 31 through 45 (of 55 total)
You must be logged in to reply to this topic. Login to reply