How to Make Scalar UDFs Run Faster (SQL Spackle)

  • GSquared (8/6/2012)


    And the key to the article was the bit about inline 1X1 UDFs as "scalar" functions with better performance, and the point about Set Stats. Regardless of ICaps functionality, those are both very good points.

    Can't agree more. The prime focus should be on these very informative key points.

  • Usman Butt (8/7/2012)


    Jeff Moden (8/6/2012)


    Usman Butt (8/6/2012)


    Can I safely assume that you have approved my Anti-RBAR alliance membership? 😉

    Heh... make no "Butts" about it. 🙂 I know, I know... you've had "Mo-den" enough of those kinds of jokes. :w00t:

    :laugh: Since living in this "Mo-den" era we had no other option but to enjoy such creativity 😉 Keep them coming. I wonder haven't you got any request to write a super joke book yet :hehe:

    Seriously, though. You did a really nice job. Thanks for joining this discussion.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/7/2012)


    Seriously, though. You did a really nice job. Thanks for joining this discussion.

    The pleasure is always mine 😎 It is a great honor to receive such a generous compliment from you :Wow:

  • Jeff -

    I read this article last week and it was definitely an eye-opener. Great stuff!

    It is always exceedingly important to know your tools and most especially their limitations.

    Sorry I didn't get around to saying this until now. Thought I had but was busy last week and it must have slipped my mind.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Great article. I will have to remember the content, because it could make a big difference in many things.

    But you worried me a little in one of your comments, because some people may interpret as saying that performnce is the most important thing (which I'm sure you didn't mean at all:

    Jeff Moden (8/5/2012)


    Shifting gears, I'm one of the first to say that performance is always a prime factor ....

    For me, correctness comes before performance; I don't care how fast I can produce the wrong answer; yes, performance is important - but it is only important once I can produce the right answer. And next after that might be reusability (or adaptability), for the cases where I'm going to have to do the same thing (or pretty much the same thing) in dozens of places, or it might be flexibility (ease of changing - for when I know the requirements are a heap of wet shifting quicksand invented by sales consultants who haven't a clue what is needed), or (rarely, when the requirement is rock solid and the code is only to be used in one context) performance. So performance is never first and rarely second.

    Tom

  • rragno (8/1/2012)


    It is irritating how big the gap is between the performance here and high-performance code, though. Why should we have to be impressed that multiplying a million numbers takes 800 ms? A simplistic implementation of the same process, applying a function call, in C# clocks in at 0.8 ms on my old laptop. That is 1000x faster. Obviously, there are reasons for some slowdown, but this magnitude is hard to accept.

    It would be interesting to see the C# so that we could see what it is that's being optimised. Clearly either the C# optimiser is really awful, or the C# is deliberately written in such a way as to avoid the most insane optimisations. For example, it would be nice to know how many multiplications the optimised code does, it would be nice to know how many values it reads from an array, and so on.

    Decades ago I saw a Fortran system which would have run this, if naively written so that the optimiser could be really clever, in about 10 microseconds. Of course it would have done exactly one multiplication.

    Tom

  • GSquared (8/1/2012)


    Post-script: Name-case coding will be wrong much of the time. Bill O'Malley will likely end up as Bill O'malley, or Bob's will end up as Bob'S, unless you code some seriously smart algorithms into it. Any code that handles MacDonald correctly will probably mangle Macy or Macey, and vice-versa. The more "correct" you make it, the more complex it will be, and the slower it will run, and (likely) the buggier it will get.

    It's worse than that. Often the "correct" capitalisation is different from one person to the next; with surnames like MacinTosh/Macintosh/MacIntosh, MaclEan/MacLean, Mackay/MacKay, and Macintyre/MacinTyre/MacIntyre there's no imaginable way you can determine the "correct" capitalisation in any sort of scaler function, you just have to hope it was entered right in the first place. So the task is generally impossible for surnames of Gaelic (Scottish, Manx, or Irish) origin (the capitalisation in the Gaelic forms is of course determinate but that doesn't help with the English forms - for example in Macintyre only the m and the S are capitalised, which doesn't carry over very well into the English transliteration since the S isn't there. Names with other non-English origins display the same sort of problem.

    Tom

  • L' Eomot Inversé (9/23/2012)


    It would be interesting to see the C# so that we could see what it is that's being optimised. Clearly either the C# optimiser is really awful, or the C# is deliberately written in such a way as to avoid the most insane optimisations. For example, it would be nice to know how many multiplications the optimised code does, it would be nice to know how many values it reads from an array, and so on. Decades ago I saw a Fortran system which would have run this, if naively written so that the optimiser could be really clever, in about 10 microseconds. Of course it would have done exactly one multiplication.

    Please be assured that the .NET compiler and optimizer is state-of-the-art. The overhead is caused by CLR being in hosted in SQL Server. The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute) but it takes time to connect things up and switch between SQL Server and the hosted environment. Nevertheless, as Books Online says, "CLR functions benefit from a quicker invocation path than that of Transact-SQL user-defined functions. Additionally, managed code has a decisive performance advantage over Transact-SQL in terms of procedural code, computation, and string manipulation.".

  • rragno (8/5/2012)


    That is the C# code! The cruft needed to deploy might be difficult, but they made some things elegant. That method I posted is the whole method. I can send over a little package... Once you compile that into a DLL, it is a one-liner to add the assembly into your DB, and then another to declare a function that calls this method. Otherwise, the testing is identical to the TSQL function.

    This is a test I have performed many times myself. T-SQL scalar user-defined functions are implemented so poorly that even returning an unmodified value is faster in a CLR scalar function that a (non-inline) T-SQL one. Another highlight is that CLR scalar functions can use parallelism whereas the T-SQL equivalent forces the whole execution plan to run serially. Nastily, a query that references a table that incorporates a T-SQL scalar function will always run serially even if the column that uses the T-SQL function is not referenced anywhere in the query!

    I don't have a multiply-by-two assembly and function to hand, but I did post a CLR scalar function to sum the digits of an integer a couple of days ago that beats even in-line T-SQL (no functions at all):

    http://www.sqlservercentral.com/Forums/FindPost1362532.aspx

    T-SQL is an interpreted language, and intrinsic functions are generally slow. It doesn't take much at all to make a CLR scalar function perform faster (though in many cases it is not worth the effort). If and when SQL Server includes the ability to in-line CLR code within T-SQL, that might change. I think this feature is part of the draft SQL standard.

  • L' Eomot Inversé (9/23/2012)


    Great article. I will have to remember the content, because it could make a big difference in many things.

    Thanks for the feedback, Tom. I appreciate it.

    But you worried me a little in one of your comments, because some people may interpret as saying that performnce is the most important thing (which I'm sure you didn't mean at all:

    Jeff Moden (8/5/2012)


    Shifting gears, I'm one of the first to say that performance is always a prime factor ....

    For me, correctness comes before performance;

    Oh, for me to. That's why I said performance is only "a" prime factor. Perhaps it's wrong to do so but I take it for granted that everyone knows the answer has to be correct before any other factor comes into play. My personal mantra on the subject of code development is "Make it work, make it fast, make it pretty and it ain't done 'til it's pretty." 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Removed. (I quoted Jeff partially quoting Barry without reading all the way back to see that Barry had already referenced the document I wanted to mention).

  • SQL Kiwi (9/23/2012)


    L' Eomot Inversé (9/23/2012)


    It would be interesting to see the C# so that we could see what it is that's being optimised. Clearly either the C# optimiser is really awful, or the C# is deliberately written in such a way as to avoid the most insane optimisations. For example, it would be nice to know how many multiplications the optimised code does, it would be nice to know how many values it reads from an array, and so on. Decades ago I saw a Fortran system which would have run this, if naively written so that the optimiser could be really clever, in about 10 microseconds. Of course it would have done exactly one multiplication.

    Please be assured that the .NET compiler and optimizer is state-of-the-art. The overhead is caused by CLR being in hosted in SQL Server. The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute) but it takes time to connect things up and switch between SQL Server and the hosted environment.

    I think you misunderstood me Paul - my fault for not expressing myself more clearly.

    rragnmo seemed to me to be talking about two distinct thigs - a chunk of pure C# measured on its own (his first paragraph) and a CLR function embedded into SQL (his second paragraph). I was concerned only with his second paragraph. When an attempt is made to do the whole thing in c# (or Fortran, or any other procedural language) there's a risk that the optimiser notices that the first one million minus one function calls are irrelevant so it leaves them out. This doesn't happen with SQL, currently (but it might in the furture) so generally there';s an attempt to prevent the optimiser from doing this to avoid meaningless comparisons. My "one multiplication" was for the whole process, which needed only one multiplication (probably actually a left shift, depending on the target machine, as it's a multiplication by 2) instead of a million multiplications. If that sort of optimisation was happening, we would be comparing a forest with a single tree.

    Having seen people make utterly wrong decisions based on run times of tasks which could be optimied out just about completely, I just wondered how well protected rragnmo's measurements were aginst that sort of thing. One of the things he said in his comment seemed to indicate that he did have had it in mind, but was concerned that he might not have done enough to avoid it.

    Tom

  • SQL Kiwi (9/23/2012)


    The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...

    Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • G Bryant McClellan (8/1/2012)


    While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?

    Having STATISTICS TIME ON causes SQL Server to measure things every time a T-SQL frame is entered and exited. The way T-SQL scalar functions are implemented now (with BEGIN...END syntax) an (expensive!) new T-SQL frame is created on every call.

    http://blogs.msdn.com/b/sqlclrperf/archive/2007/03/08/scalar-functions.aspx

  • RichB (8/1/2012)


    Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.

    Yes a server-side trace has very similar effects compared with STATISTICS TIME for similar reasons. Your question reminded me of a real issue here on SSC from several years ago: http://www.sqlservercentral.com/Forums/FindPost886822.aspx

Viewing 15 posts - 61 through 75 (of 102 total)

You must be logged in to reply to this topic. Login to reply