Query cost

  • mike.anderson 52709 (1/28/2010)


    mark.ross (1/28/2010)


    One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.

    Really? Why?

    When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.

    Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.

    Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.

    So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.

  • mark.ross (1/28/2010)


    Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.

    Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.

    Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.

    So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.

    Nice point/counter-point 😀

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (1/28/2010)


    mike.anderson 52709 (1/28/2010)


    mark.ross (1/28/2010)


    One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.

    Really? Why?

    When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.

    Thanks Jason.

    I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?

  • Gail Shaw actually covered this very topic in a blog post. In my first reply in this thread I included the link to that Post. It is a worthwhile read.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • mark.ross (1/28/2010)


    Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.

    Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.

    Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.

    So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.

    Actually, the optimizer is that advanced already. To be honest, I'd fully expect it to do that kind of rewrite in 2000.

  • mike.anderson 52709 (1/28/2010)


    CirquedeSQLeil (1/28/2010)


    mike.anderson 52709 (1/28/2010)


    mark.ross (1/28/2010)


    One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.

    Really? Why?

    When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.

    Thanks Jason.

    I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?

    Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.

  • thanks CirquedeSQLeil for the link... even with and without the indexes, query plan was exactly same... :w00t:

    CirquedeSQLeil (1/26/2010)


    Based on your provided reference, I would agree. However, if one examines execution plans for these two queries and run them together, the query optimizer treats them as the same execution plan and equates both queries to the same cost.

    Here is a nice resource on the topic:

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

  • mark.ross (1/28/2010)


    mike.anderson 52709 (1/28/2010)


    I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?

    Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.

    I'd agree if there were selection criteria, but there aren't, so a scan will be more efficient than seeks.

  • mike.anderson 52709 (1/29/2010)


    mark.ross (1/28/2010)


    mike.anderson 52709 (1/28/2010)


    I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?

    Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.

    I'd agree if there were selection criteria, but there aren't, so a scan will be more efficient than seeks.

    Sorry - I'll correct myself - I was assuming that all (or most) rows will be joined in both tables. If there is a large number of teachers without any students, then seeks will start to be more efficient.

  • Assuming that:

    * the inner select returns a reasonable amount of rows - so it fits in memory

    shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?

  • sistemas 95572 (1/29/2010)


    Assuming that:

    * the inner select returns a reasonable amount of rows - so it fits in memory

    shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?

    The plan is more likely to be like this:

    Both the teachers table and the students table are sorted by teacher_id (if there is an index, this isn't necessary);

    Duplicates are eliminated from the students list;

    The server 'walks' down each list, emitting rows when the teacher_ids match.

  • I think the answer options were corrupted somehow. Right now the question shows three answer possibilities, namely "Wrong Answer", "Right Answer" and "Wrong Answer".

  • An administrator changed the possible answers because of the question's subjectivity.

  • sistemas 95572 (1/29/2010)


    Assuming that:

    * the inner select returns a reasonable amount of rows - so it fits in memory

    shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?

    Well, that assumes that the inner query would only execute once. I'm not sure we can guarantee that. In fact, I'm such a pessimist that I would expect the worst case, that the inner query would execute once for each row returned by the outer query. Now, I'm quite sure that the optimizer would handle a good bit of this overhead and give me a pleasant surprise, but maybe not.

    You may be right in this case. I simply can't tell.

Viewing 15 posts - 46 through 60 (of 83 total)

You must be logged in to reply to this topic. Login to reply