September 12, 2012 at 6:02 am
Alex Fekken (9/12/2012)
I don't get it either: so 'feet' would come after 'foot'?
That puzzled me as well, so I did some more testing and found that
"final" comes before "initial"
"eleven" comes before "ten"
"lots and lots" comes before "not many at all"
and
"bags" comes before "nowt"
Should this be reported to Microsft as a bug?
September 12, 2012 at 6:10 am
vk-kirov (9/12/2012)
Well, I should note that nouns ending with the letter S are considered plural in English. But here we deal with a Latin collation, and the Romans had their Latin plural nouns ending with 'ae', 'i', 'ia', 'es' and other stuff :hehe:
If we substitute 'peppers' by 'pepperl', that 'pepperl' still will be in the end of the list... So I don't think that plural is the explanation.
Latin1_General_CS_AS is Windows collation so it is follows the same word sort rules as a normal Windows string compare.
Check the property of this collation by executing the below sql
SELECT
description
,COLLATIONPROPERTY(name, 'CodePage') AS CodePage
,COLLATIONPROPERTY(name, 'LCID') AS LCID
FROM fn_helpcollations()
WHERE name = N'Latin1_General_CS_AS';
This query shows that the collation is case-sensitive, accent-sensitive, and the code page 1252 and LCID 1033, if you have locale as 1033, then default Windows string comparision rules will provide the same behavior as the database engine like any other in plain english.
The interpretation of the word "latin" does not indicated the real latin language here. 🙂 and the code page 1252 is like most used common code page in general everywhere. So I also added the soundex code to show how the value changes when you add "s" at the end where the consideration of the sort will change.
ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.
September 12, 2012 at 6:21 am
Toreador (9/12/2012)
Alex Fekken (9/12/2012)
I don't get it either: so 'feet' would come after 'foot'?That puzzled me as well, so I did some more testing and found that
"final" comes before "initial"
"eleven" comes before "ten"
"lots and lots" comes before "not many at all"
and
"bags" comes before "nowt"
Should this be reported to Microsft as a bug?
SELECT SOUNDEX ('feet')
RESULTS: F300
SELECT SOUNDEX ('foot')
RESULTS: F300
SELECT SOUNDEX ('ball')
RESULTS: B400
SELECT SOUNDEX ('balls')
RESULTS: B420
for the foot and feet even though it is plural the soundex is same, but check on the ball and balls, where the soundex differs so the the plural takes the position after ball.
Like I earlier said "to my knowledge...." , sorting mainly happens on the word and its weight, so may be it takes other stuff also in to the consideration.
(hang on, some expert will come and will answer us with correctly )
EDIT: my saying refers to the char "s" which was added in the end, but not sure of the sentence sorting like what you have mentioned.
ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.
September 12, 2012 at 6:32 am
SQL Kiwi (9/11/2012)
Avocado 😀
Doh! Don't tell my kids, I've been tormenting them with the word for months. (Thank you!)
Toreador (9/12/2012)
None of the options gave the expected order, so it was obviously a question about the Latin1_General collation rather than case sensitivity.
The family of Latin1_General collation tend to be the ones I see as default. (Yes, a US bias, and I didn't play with the collations from other languages because the languages I know beyond English tend only to be used for programming.) Honestly I started playing with them a little after reading "Stairway to T-SQL DML Level 6: The Basics of Sorting Data Using the ORDER BY Clause[/url]". There were a couple surprises I found, apparently most critical to me was that what I always assumed was a 'correct' sort order was what SQL calls the binary sort order. After poking this bear for a while, I thought, N'I wonder if everyone else already knows this. I could imagine someone writing this up as a QotD, I wonder if I could.' 😉
vk-kirov (9/12/2012)
I don't understand why peppers are in the end of the list. Shouldn't they be between pepper and Pepper? 🙂
My thought exactly. My interpretation was that it considers peppers as a different word for purposes of dictionary sort order. I wish I could have found a better reference to include in the explanation. I think Raghavendra Mudugal is now providing a better explanation than I could possibly hope for. THANKS!
September 12, 2012 at 6:54 am
Thanks for the question... I never understand these collation quirks.
According to the documentation:
_CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive
If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??
September 12, 2012 at 6:56 am
For my understanding:
1. It sorts the words regardless of their case one letter at a time. A shorter word will come before a longer one because spaces come before letters and numbers (and numbers before letters). eg.
p = p
e = e
p = p
p = p
e = e
r = r
s = ' '
2. It then sorts same words by case, sorting lower case first and then upper case.
September 12, 2012 at 6:58 am
sestell1 (9/12/2012)
Thanks for the question... I never understand these collation quirks.According to the documentation:
_CS_AS = Case-sensitive, accent-sensitive, kana-insensitive, width-insensitive
If it's case-sensitive but NOT width-sensitive, why is "peppers" at the end??
Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.
It's not width sensitive of the strings but for the characters.
From BOL:
Width-sensitive (_WS)
Distinguishes between a single-byte character and the same character when represented as a double-byte character.
September 12, 2012 at 7:19 am
Luis Cazares (9/12/2012)Width-sensitive will compare characters that are "the same" but will use one or two bytes depending on their definition.
It's not width sensitive of the strings but for the characters.
Doh! Thanks Luis!!
September 12, 2012 at 7:31 am
September 12, 2012 at 7:45 am
Guacamole
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
September 12, 2012 at 9:21 am
Nice question. I almost never deal with case sensitivity but this is good to know anyways.
This talk of avocados and guacamole is making me hungry. 🙂
September 12, 2012 at 12:30 pm
Really good one - had me scratch my head. Thanks!
September 12, 2012 at 2:21 pm
Taking a step back, why should the sort order depend on either SOUNDEX (especially when case sensitive) or gramnatical interpretation at all?
That sounds like total stupidity to me.
// Corrected "atupid" typos
September 12, 2012 at 3:04 pm
I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.
try this:
VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');
which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.
September 12, 2012 at 3:26 pm
emiddlebrooks (9/12/2012)
I'm also a bit confused why peppers is at the end. If lowercase p is before uppercase p, which I knew it would be, why was it at the end? And the, I hope joking, about soundex and plurals is not the reason.try this:
VALUES ('Asdfgh'),('ASDFGH'),('asdfgh'),('a3df3r'),('asdfghi');
which essentially takes the meaning of the word out of the picture. "asdfghi" still sorts at the end... even after the uppercase "a" words.
It is as if a case-insensitive sort is carried out first and then the collation order is used only to break the ties in the case-insensitive sort order.
Certainly not what I would have expected. Brilliant question....
Viewing 15 posts - 16 through 30 (of 45 total)
You must be logged in to reply to this topic. Login to reply