Demystifying the use of CASE in an ORDER BY statement

  • Comments posted to this topic are about the item Demystifying the use of CASE in an ORDER BY statement

    Chuck Hoffman
    ------------
    I'm not sure why we're here, but I am sure that while
    we're here we're supposed to help each other
    ------------

  • Hello,

    Firstly, nice article. I have seen this technique a few times and always wondered why it was employed because to my eyes it appeared brittle and sub-optimal.

    Can I ask is the reason the failure occurs for different data types because the first time the query is executed a query plan is created and cached. Then the next time, with the different data type, the query is executed using the cached plan but the CASE statement now returns a different type to what the plan was expecting, hence the error?

    Secondly, my first statement about brittleness and sub-optimal is related to plan caching. How does this work with plans that contain order by statements for one column but then the case statement says we want the order by a different column? Is the plan re-used? How does it work?

    Thanks again, I enjoyed reading your post.

  • Wait a minute! What about that "ASC," after the END of the first CASE statement, above? If that CASE statement does not produce output, then wouldn't the four characters "ASC," simply be extra characters hanging out in the breeze? Wouldn't this cause a syntax error?

    Answer: Yes they are extra characters. No they do not cause a syntax error because somehow CASE forgives this.

    The reason that works is that the case evaluates to null, so you could write

    select *

    from Table

    order by (select null) ASC

    and that is valid SQL

  • Interesting. I've allways taken the approach of creating a proc for each order, then inserting the appropriate proc result into a (temp table with an auto increment) depending on which order I need. Need to do some testing to see which consistenly works better. Each proc will have its own plan and the overhead of one table variable should not be huge.

  • *facepalm*

    Let me see if I can explain this much more simply:

    ORDER BY takes as its arguments a set of expressions, each optionally followed by a direction indicator, separated by commas.

    Essentially, any expression you put in there either is or becomes a column in the intermediate result set, which is then sorted, and the final result set is returned from that.*

    So when you say

    ORDER BY

    CASE WHEN @Flag = 1 THEN 'FIELD DESC' END

    you're creating an expression which is equal to 'FIELD DESC' when @Flag = 1, or NULL otherwise, and telling SQL Server to sort the data by that expression (ASCENDING, since that's the default.) Since that expression is constant in either case, the data appears to be "not sorted."

    and when you say

    ORDER BY

    CASE WHEN @Flag = 1 THEN FIELD END DESC

    you're creating an expression which is equal to FIELD when @Flag is 1, or NULL otherwise, and telling SQL Server to sort by that expression, descending.

    The only way in which the first example would work is if SQL Server treated variables as macros and expanded them into the SQL query string before execution, which would make them

    a) no longer variable, and

    b) no longer useful to prevent SQL injection exploits.

    So if the former seems intuitively correct to you, then you need retake some basic programming logic classes.

    * This is the logical process; actual procedure used may vary.

  • Nice article!

    Edit: Please ignore this post. One of those days...

    ---------------------------------------------------------------------------------------------

    It's also interesting to note that ordinal positions returned by a CASE statement are not implicitly resolved to field objects. This results in the CASE statement being completely ignored by the ORDER BY clause in the same ways as (SELECT NULL) DESC if you specify the sort column using its ordinal position rather than its name.

    eg:

    SELECT 2 col

    INTO dbo.ConditionalSortTest

    UNION ALL

    SELECT 1

    UNION ALL

    SELECT 3;

    SELECT col FROM dbo.ConditionalSortTest; -- numbers are out of sequence

    DECLARE @s-2 bit;

    SET @s-2 = 0; -- SORT DESCENDING

    /*

    SET @s-2 = 1; -- SORT ASCENDING

    */

    SELECT col FROM dbo.ConditionalSortTest

    ORDER BY

    CASE WHEN @s-2 = 0 THEN 1 END DESC,

    CASE WHEN @s-2 = 1 THEN 1 END ASC;

    -- numbers still out of sequence indicating no sort was performed.

    DROP TABLE dbo.ConditionalSortTest;

  • andrew.diniz (4/26/2012)


    Nice article!

    It's also interesting to note that ordinal positions returned by a CASE statement are not implicitly resolved to field objects.

    That's because a CASE statement CAN'T return ordinal positions. The case statement you wrote returned an integer value, which is by no means the same thing.

    The key here is that the CASE statement is an expression to sort by, not a macro to generate the expression to sort by.

  • Quirks are interesting things. Like the various versions of SQL reporting errors differently in your example, what concerns exist that the next version of SQL will 'break' this solution?

  • sknox (4/26/2012)


    *facepalm*

    Let me see if I can explain this much more simply:

    ORDER BY takes as its arguments a set of expressions, each optionally followed by a direction indicator, separated by commas.

    Essentially, any expression you put in there either is or becomes a column in the intermediate result set, which is then sorted, and the final result set is returned from that.*

    So when you say

    ORDER BY

    CASE WHEN @Flag = 1 THEN 'FIELD DESC' END

    you're creating an expression which is equal to 'FIELD DESC' when @Flag = 1, or NULL otherwise, and telling SQL Server to sort the data by that expression (ASCENDING, since that's the default.) Since that expression is constant in either case, the data appears to be "not sorted."

    and when you say

    ORDER BY

    CASE WHEN @Flag = 1 THEN FIELD END DESC

    you're creating an expression which is equal to FIELD when @Flag is 1, or NULL otherwise, and telling SQL Server to sort by that expression, descending.

    The only way in which the first example would work is if SQL Server treated variables as macros and expanded them into the SQL query string before execution, which would make them

    a) no longer variable, and

    b) no longer useful to prevent SQL injection exploits.

    So if the former seems intuitively correct to you, then you need retake some basic programming logic classes.

    * This is the logical process; actual procedure used may vary.

    What he said!

    The observed effects described in the article have absolutely nothing to do with misbehaving case expressions and everything to do with a misunderstanding of what order by code actually means, and not realizing that it gets this meaning at complile time.

    Whatever constants or computed values you bring to it at runtime by means of an expression ("case expression" in the article), they will not be functional the same as the same text expressed in your code at compile time. Code and data are distinct and for very good reasons! With static SQL you can not alter the sorting behavior dynamically at runtime just by manipulating the order by clause.

    You could however build a SQL statement dynamically and then execute that, then your input texts become code and is executed as such. This however poses a security risk as you remove your build in protection against SQL injection (assuming you used parameterised queries in the first place).

    If you are not using parameterised queries, meaning you have user data input that is made part of your SQL statement as code, then you are in trouble and need to get that fixed as soon as you can. With paramaterised queries the actial SQL statement (code) contains placeholders (?), where the actual typed paramaters are placed into as data by SQL Server upon execution. The data is send together with the query to the server for execution, but never mix.

  • Can dynamic sql help?

  • ukarjee (4/26/2012)


    Can dynamic sql help?

    Functional speaking, ... it can, ... but see my earlier response as to why you should NOT do that! What you can do, and problablly should, is to create a SQL statement in your application dynamically. Provided you do not just take user input and inject that into the SQL code directly either.

    As a rule of thumb, never make untrusted data part of your SQL code as it allows users to execute code you never intended! Therefore, always use paramaterised queries as then untrusted user data is never part of the code and always treated as data....as it should!

  • If you're confused by any of this, first read sknox's first post above. Understanding that makes all of the mysteries go away.

    Here's a tip if it's still not clear. CASE works just the same in an ORDER BY as it does anywhere else. So if your SQL is not sorting as you think it should, try putting the same CASE statement in the query as a column so you can see what's going on. For example, take the first problem query from the article:

    SELECT

    Employee.EmpID,

    Employee.EmpName,

    Employee.EmpDOB

    FROM Employee

    ORDER BY

    CASE

    WHEN (@sortType = 0)

    THEN @sortKey_A

    ELSE @sortKey_D

    END

    You could modify the query , like this:

    SELECT

    Employee.EmpID,

    Employee.EmpName,

    Employee.EmpDOB,

    CASE

    WHEN (@sortType = 0)

    THEN @sortKey_A

    ELSE @sortKey_D

    END AS SortCol

    FROM Employee

    ORDER BY

    CASE

    WHEN (@sortType = 0)

    THEN @sortKey_A

    ELSE @sortKey_D

    END

    Notice I didn't change the ORDER BY, I just copied the same expression to the SELECT so we can see what we're really sorting by. This gives us this:

    EmpID EmpName EmpDOB SortCol

    ----- --------- ----------------------- ----------

    1 Chandler 1953-05-03 00:00:00.000 EmpDOB ASC

    2 Enos 1992-09-02 00:00:00.000 EmpDOB ASC

    3 Baker 1946-08-10 00:00:00.000 EmpDOB ASC

    4 Davis 1980-04-01 00:00:00.000 EmpDOB ASC

    5 Franklin 1976-07-04 00:00:00.000 EmpDOB ASC

    We're effectively sorting by SortCol, which has the same value (the string 'EmpDOB ASC') on every row. If effectively sorting by SortCol isn't good enough, try this:

    SELECT

    Employee.EmpID,

    Employee.EmpName,

    Employee.EmpDOB,

    CASE

    WHEN (@sortType = 0)

    THEN @sortKey_A

    ELSE @sortKey_D

    END AS SortCol

    FROM Employee

    ORDER BY SortCol

    This is equivalent. It will give the same results. Now we're actually sorting by SortCol. As far as the sorting goes, this is the same as the original problem query, but now we have turned the "column in the intermediate result set" into an actual column in the final result set so we can see behind the curtain.

  • Peter-757102 : whilst I agree you that the observed effects are down to misunderstanding the order by clause, and not some sort of misbehaving case statement, I don't understand your quote

    peter-757102 (4/26/2012)


    With static SQL you can not alter the sorting behavior dynamically at runtime just by maniumlating the order by clause.

    The final version of Chuck's code is exactly that - he is manipulating the order by clause through case evaluation - no dynamic sql in sight.

  • Seems a good example of how tired and overloaded SQL syntax has become.

    SQL syntax has become too obtuse for what we need it to do these days. Need a more powerful way to talk to the database. I'm ready for native C#/Linq queries on SQL without installing .NET assemblies and this kludgy 'Write it in C#/Linq, which is then translated to T-SQL'. Yuck.

  • kevriley (4/26/2012)


    Peter-757102 : whilst I agree you that the observed effects are down to misunderstanding the order by clause, and not some sort of misbehaving case statement, I don't understand your quote

    peter-757102 (4/26/2012)


    With static SQL you can not alter the sorting behavior dynamically at runtime just by manipulating the order by clause.

    The final version of Chuck's code is exactly that - he is manipulating the order by clause through case evaluation - no dynamic sql in sight.

    In that final statement he is not manipulating the order by clause as such, but manipulating the value that comes out of the 4 case expressions to get the effect desired. He is still statically ordering on 4 columns, that is what gets compiled. At runtime some columns returns null when that column is not applicable for the planned sort, by means of the default unhandled return value for case expressions.

    Just demonstrating different runtime behavior does not make it different SQL statements! And the resulting query plan is obviously horrible as there is ALWAYS a sort on 4 calculated expressions. No indexes or statistics of any kind can be used to speed up the execution of the query!

    Dynamic SQL is something different and is visible to the query optimizer when it compiles the SQL code. With dynamic SQL you would get different query plans that can natively tap into available statistics and indexes as no expressions have to be executed at runtime.

    ADDED:

    Re-reading your comment I realize I might not have answer it to your satisfaction.

    It's behavior is static in that only a fixed number of pre-coded "order by" behaviors are supported. It is not dynamically altering the order by, which is fixed and always on the outcome of the 4 computed expressions (some NULL, others not). You could say it is a simulation, limited by what ordering is initially concieved, not something you can extend later by sending trough other ordering instructions.

Viewing 15 posts - 1 through 15 (of 63 total)

You must be logged in to reply to this topic. Login to reply