''Is TSQL really declarative?'' or ''How do I get SQL Server to do obvious optimizations?''

  • I read an article recently which commented that TSQL is declarative, i.e. you are supposed to tell the server what you want, not how to do it.

    The problem with this is that it doesn't seem to be true (unless I'm missing something). I have a table which has about 7000 records - relatively small, but too big to post here. The following query takes about 1 minute 25 secs.

    select queryid, closeddatetime, process
    from dwd_qtest
    where closeddatetime > dbo.fnMonthEnd('20070428', process, 'UK')
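    (For reference, fnMonthEnd is a scalar UDF roughly along the lines of the sketch below - reconstructed here from the inline expansion shown in the last query further down, so treat the parameter types and exact body as approximate. The important point is that it runs a query of its own every time it is called:)

    create function dbo.fnMonthEnd (@date datetime, @process varchar(50), @country varchar(10))
    returns datetime
    as
    begin
        -- earliest working-day month end for this process after the given date
        declare @monthEnd datetime;

        select @monthEnd = min(wd.datevalue)
        from ProcessMonthEnd pme
        join WorkingDays wd on pme.MonthEndWD = wd.WorkingDay
        where pme.process = @process
          and wd.datevalue > @date
          and wd.Country = @country;

        return @monthEnd;
    end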

    If I rewrite it as 3 queries:

    select distinct process into #p from dwd_qtest

    select process, dbo.fnMonthEnd('20070428', process, 'UK') as MonthEnd into #m from #p

    select queryid, closeddatetime, dwd_qtest.process
    from dwd_qtest join #m on dwd_qtest.process = #m.process
    where closeddatetime > #m.MonthEnd

    drop table #p
    drop table #m

    It takes 110 milliseconds. In fact the best performance (about 15 milliseconds) was achieved by inlining the expansion of the user-defined function as a join in the main query.

    select queryid, closeddatetime, qt.process
    from dwd_qtest qt
    join (
        select min(datevalue) as 'MonthEnd', process
        from ProcessMonthEnd pme
        join WorkingDays wd on pme.MonthEndWD = wd.WorkingDay
        where wd.datevalue > '20070428' and wd.Country = 'UK'
        group by process
    ) p on qt.process = p.process
    where closeddatetime > p.monthend

    Hence the questions in the subject. Either TSQL isn't truly declarative, or I'm missing something about how to tell SQL Server to identify common subexpressions and hence optimize queries, since I would have expected it to spot that the UDF call depends only on [process] and hence precalculate it. Since [process] only has 3 possible values, the number of UDF calls is reduced from about 7000 to 3, hence the time saving. I can understand it missing the expansion of the UDF, but I'm disappointed in its apparent lack of any optimization at all.
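    For what it's worth, the same hoisting idea can also be written as a single statement with a derived table (untested sketch using the same names as above). Whether SQL Server actually evaluates the UDF only three times this way depends on the plan it picks, whereas the temp-table version above forces the issue:

    select qt.queryid, qt.closeddatetime, qt.process
    from dwd_qtest qt
    join (
        -- evaluate the UDF once per distinct process, not once per row
        select d.process, dbo.fnMonthEnd('20070428', d.process, 'UK') as MonthEnd
        from (select distinct process from dwd_qtest) d
    ) m on qt.process = m.process
    where qt.closeddatetime > m.MonthEnd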

    Derek

  • It's been my experience that user-defined functions in where clauses kill performance.

    Unlike, say, getdate(), a UDF is evaluated against each record, whereas getdate() is evaluated only once at the start of the query.

  • I remember reading that somewhere.

    The frustrating thing is that SQL Server won't let you write a 'non-deterministic' UDF, so the implication is that it assumes a UDF called with fixed arguments will always return the same result. Yet instead of making use of that fact, it still calls a UDF with fixed arguments for every record of a select (at least, that's what I deduce from the timings).

    So the conclusion is that T-SQL is *NOT* declarative: you also have to tell SQL Server the best way to execute a query, unless you want to risk a severe performance hit! And although UDFs make the logic clearer, they are best avoided in production applications, or at least used very carefully.
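    Where the arguments really are fixed for the whole query, the only reliable workaround I know of is to evaluate the UDF once into a variable yourself, something like this (untested sketch; 'SomeProcess' just stands in for a real process value):

    -- evaluate the scalar UDF exactly once, up front
    declare @monthEnd datetime;
    set @monthEnd = dbo.fnMonthEnd('20070428', 'SomeProcess', 'UK');

    select queryid, closeddatetime, process
    from dwd_qtest
    where process = 'SomeProcess'
      and closeddatetime > @monthEnd;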

     

    Derek

  • Your deduction is correct.  By using the UDF you force a cursor-like solution (pop open a trace and you can see it).  If you stick with standard SQL you will see better results, but if you write queries in such a way that the optimizer cannot use the indexes (e.g. non-sargable restriction clauses), or the statistics are out of date, you will find it less so.
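    To illustrate the sargability point with the table from the original post (a sketch, assuming an index on closeddatetime):

    -- Non-sargable: wrapping the column in a function hides it from the index
    select queryid from dwd_qtest
    where year(closeddatetime) = 2007;

    -- Sargable: a plain range predicate on the bare column can use an index seek
    select queryid from dwd_qtest
    where closeddatetime >= '20070101' and closeddatetime < '20080101';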

    Of course, you could just write every join in CLR as a pair of "select * from" queries and iterate through them yourself, but that would be just stupid.  Wouldn't it?

     

     

  • I think what is meant by "T-SQL is declarative" is that in most cases, all you specify is what you want, and the optimizer will figure out, via cost-based algorithms, the cheapest query plan that will fetch that data.  

    Certain queries cannot be made efficient by the optimizer, and you have found one of them.  Looping constructs, such as cursors and while loops, or UDFs in the where clause, will kill performance.

    But for other types of set-based queries that don't involve UDFs in the where clause or join condition, the optimizer usually does a very good job of finding the optimal plan.  Imagine if you had to specify a query plan yourself for each query.  That would be extremely tedious and error-prone, but that's the situation a procedural programmer is in.  If you are writing C++ or C# code to process a raw file, you have to describe every little step in painful detail.  That's not the case in SQL - all you specify is what data you want, not how it is to be fetched.  See the difference?
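    As a toy illustration of the difference, using the dwd_qtest table from the original post:

    -- Procedural style: spell out every step, one row at a time
    declare @qid int;
    declare @total int;
    set @total = 0;

    declare c cursor for select queryid from dwd_qtest;
    open c;
    fetch next from c into @qid;
    while @@fetch_status = 0
    begin
        set @total = @total + 1;
        fetch next from c into @qid;
    end
    close c;
    deallocate c;

    -- Declarative style: say what you want, let the optimizer choose how
    select count(*) as total from dwd_qtest;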

    I suppose the trick in SQL is to know which types of queries cannot generate efficient plans.  You have found one.  But the same is true in procedural programming.  Certain algorithms are just not efficient.  Bubble sort on 100m elements for example.  Knowing what algorithms and coding constructs are efficient is the reason programmers are paid more than fry cooks.

    btw here's a nice article on applying methods of procedural programming to set-based queries: http://weblogs.sqlteam.com/jeffs/archive/2007/04/30/60192.aspx

    ---------------------------------------
    elsasoft.org
