August 9, 2010 at 5:01 pm
I'm trying to use the Latin1_General_BIN2 collation on a varchar column to ensure results are sorted via their ASCII value, and not by some dictionary value, but it doesn't seem to be working as I expected:create table a (b varchar(10) COLLATE Latin1_General_BIN2)
insert into a values (char(1) + char(124))
insert into a values (char(1) + char(124) + char(1)+ char(124))
insert into a values (char(1) + char(124) + char(2)+ char(124))
insert into a values (char(70) + char(124))
insert into a values (char(70) + char(124) + char(70)+ char(124))
insert into a values (char(70) + char(124) + char(71)+ char(124))
select b from a order by b
Pardon the bad table names...
I would expect the values to be sorted by value and length and the order to end up the same as they were inserted (i.e. I thought I inserted them in order), but it doesn't seem to work that way - the beginning set of 3 is sorted in the opposite direction that I anticipated - longest before shortest. Does anyone have insight into why this might be?
Thanks,
Chad
August 10, 2010 at 4:35 pm
If no one has any ideas, does the result at least look as odd to you as it did to me? Or am I just crazy and missing something simple?
Thanks,
Chad
August 10, 2010 at 5:17 pm
Just to add to the confusion, this works the way I think you want:
select b from a order by cast(b as varbinary)
but I don't have a good explanation as to why.
August 11, 2010 at 12:11 pm
A good observation. I thought about going with a varbinary field, but hesitated due to concerns about readability (to check what was in the field) and sorting, but maybe that is the way to go. I won't have much readability with a ton of non-printable characters anyway.
Thanks,
Chad
August 12, 2010 at 2:21 am
The reason for the strange sort is that SQL server pads all strings with spaces (ASCII 32) to a common length before comparing.
For normal printable strings this means that shorter strings will sort before longer, but for strings with unprintable characters (below 32) the longer strings will sort first.
Once you know about this it is usually not a big problem, but if you really need a true binary sort you should use varbinary as already suggested.
August 16, 2010 at 8:40 am
Thanks Stefan, I didn't know that it padded the strings out before comparing. I don't understand how that changes the way the sort works though - wouldn't it still sort from left to right based on length with the spaces at the end? I can tell I missing something simple, and I just can't seem to catch what it is. I know when I open a file in a hex editor, sometimes the bytes are flipped in pairs, is it something like that where even though the space follows a character, it is stored in front of it?
Thanks,
Chad
August 16, 2010 at 2:55 pm
Well lets look at this in more detail then.
Your original data looks like this:
1:char(1) + char(124)
2:char(1) + char(124) + char(1)+ char(124)
3:char(1) + char(124) + char(2)+ char(124)
4:char(70) + char(124)
5:char(70) + char(124) + char(70)+ char(124)
6:char(70) + char(124) + char(71)+ char(124)
Before sorting SQL server pads all strings with spaces to a common length, like this:
1:char(1) + char(124) + char(32) + char(32)
2:char(1) + char(124) + char(1) + char(124)
3:char(1) + char(124) + char(2) + char(124)
4:char(70) + char(124) + char(32) + char(32)
5:char(70) + char(124) + char(70)+ char(124)
6:char(70) + char(124) + char(71)+ char(124)
It is now obvious that the final sorted order will be:
2:char(1) + char(124) + char(1) + char(124)
3:char(1) + char(124) + char(2) + char(124)
1:char(1) + char(124) + char(32) + char(32)
4:char(70) + char(124) + char(32) + char(32)
5:char(70) + char(124) + char(70)+ char(124)
6:char(70) + char(124) + char(71)+ char(124)
/SG
August 18, 2010 at 12:17 am
Ahhhhhhhh. Thank you Stefan. I wasn't thinking about the ASCII value of a space, even though you put that in your note. It makes perfect sense, and now I can sleep at night. Ok - it wasn't really keeping me up, but it did have me confused. Thanks again for the excellent explanation.
Chad
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply