December 30, 2008 at 5:27 pm
I have a varchar column using the Latin1_General_BIN collation so that it sorts in ASCII order. When I use non-printable characters in the string and sort, it sorts as I would expect on a character by character basis (lower ASCII values before higher ASCII values), but it isn't sorting by string length as I would expect. Here's an example that illustrates:CREATE TABLE A (pk int, columnA varchar(20) COLLATE Latin1_General_BIN)
INSERT INTO A VALUES (1, char(1))
INSERT INTO A VALUES (2, char(1) + char(1))
INSERT INTO A VALUES (3, char(1) + char(2))
INSERT INTO A VALUES (4, char(1) + char(2) + char(1))
INSERT INTO A VALUES (5, char(2))
INSERT INTO A VALUES (6, char(3))
INSERT INTO A VALUES (7, 'a')
INSERT INTO A VALUES (8, 'ab')
SELECT * FROM A ORDER BY ColumnA ASC
I expected the result to be in numerical order, but it was not. The result I got (by pk) was 2,4,3,1,5,6,7,8. It did sort the first 4 together (all started with char(1), but within that group, the longer strings were placed before the shorter strings, with the exception of the printable characters at the end. So it seems to be just for the non-printable characters.
Here's maybe an easier way to see the result:char(1) + char(1) --Expected this to come before the next two, but not before char(1) by itself
char(1) + char(2) + char(1) --Expected this to come after the next value, not before
char(1) + char(2)
char(1)
char(2)
char(3)
a
ab
I remember reading somewhere about how timestamp values were sorted funny, and I wonder if this is related. Since I haven't found that article again, can anyone help me understand why the sorting is different?
BTW - I've tried with a couple other collations and they end up with the same result, so I don't think it is collation specific...
Thanks,
Chad
December 30, 2008 at 6:23 pm
Very interesting. I found out that if you change data type to nvarchar or use newer collation like Latin1_General_CI_AS, everything works OK.
The BOL says in Using Binary Collations: "Older SQL Server binary collations compared the first character as WCHAR, followed by a byte-by-byte comparison. " It is not clear to me if this sentence pertains to sorting of unicode data or any data and what impact could it have. In any case, I would expect order 1, 2, 3, 4, 5, 6, 7, 8
...and your only reply is slàinte mhath
December 31, 2008 at 10:08 am
Maybe that's what I need to do - keep looking at different collations until I find one that works how I want it to. Latin1_General_CI_AS does take care of the long/short string issue, but it doesn't sort characters in their ascii value order. It goes from 1-8 then jumps to 14-31, and 9-13 comes after 160 (I'm speaking of the ascii values for the characters here).
Thanks,
Chad
December 31, 2008 at 1:38 pm
I've found that converting the varchar to a varbinary in the orderby works. Since I don't need the order very often, I think that will be a satisfactory (but not optimal) solution.
Chad
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply