January 28, 2010 at 10:31 am
mike.anderson 52709 (1/28/2010)
mark.ross (1/28/2010)
One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.Really? Why?
When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 28, 2010 at 10:39 am
Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.
Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.
Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.
So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.
January 28, 2010 at 10:44 am
mark.ross (1/28/2010)
Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.
Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.
So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.
Nice point/counter-point 😀
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 28, 2010 at 10:47 am
CirquedeSQLeil (1/28/2010)
mike.anderson 52709 (1/28/2010)
mark.ross (1/28/2010)
One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.Really? Why?
When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.
Thanks Jason.
I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?
January 28, 2010 at 10:55 am
Gail Shaw actually covered this very topic in a blog post. In my first reply in this thread I included the link to that Post. It is a worthwhile read.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 28, 2010 at 11:01 am
mark.ross (1/28/2010)
Because the 'IN' clause, when run against a large number of rows could, potentially, return a large number of values to check against, whereas the 'EXISTS' clause should still only return a boolean indicating that a particular value exists or not. Also, the 'EXISTS' clause should be able to halt as soon as it finds 1 matching value.Hmm, technically, there might be nothing to stop the 'IN' clause from working that way, too. I suppose the optimizer could be set to 'rewrite' any simple 'IN' clause as an 'EXISTS' clause in order to simplify things. The question is, does it work that way? I suspect not, but advances in the query optimizer are unclear to me, so I really have no answer to that.
Even if the optimizer DOESN'T work that way, it could in the future. All you really are supposed to expect is a logical and correct result. The 'how' is largely reserved to the optimizer.
So, why, indeed? Gee, and I really did mean to defend my position there. Ahh, well.
Actually, the optimizer is that advanced already. To be honest, I'd fully expect it to do that kind of rewrite in 2000.
January 28, 2010 at 11:32 am
mike.anderson 52709 (1/28/2010)
CirquedeSQLeil (1/28/2010)
mike.anderson 52709 (1/28/2010)
mark.ross (1/28/2010)
One of the reasons that the query plans are identical could be that the table being selected from in the 'IN' and 'EXISTS' clause is small. Try running them against a multi-million row table and the plans might not be quite the same.Really? Why?
When running on a million row table, the execution plans still come up the same. That was a part of my initial testing. One thing that does make a difference is indexing. However, the two will still use the same exec plan.
Thanks Jason.
I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?
Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.
January 29, 2010 at 12:50 am
thanks CirquedeSQLeil for the link... even with and without the indexes, query plan was exactly same... :w00t:
CirquedeSQLeil (1/26/2010)
Based on your provided reference, I would agree. However, if one examines execution plans for these two queries and run them together, the query optimizer treats them as the same execution plan and equates both queries to the same cost.Here is a nice resource on the topic:
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
January 29, 2010 at 2:58 am
mark.ross (1/28/2010)
mike.anderson 52709 (1/28/2010)
I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.
I'd agree if there were selection criteria, but there aren't, so a scan will be more efficient than seeks.
January 29, 2010 at 5:11 am
mike.anderson 52709 (1/29/2010)
mark.ross (1/28/2010)
mike.anderson 52709 (1/28/2010)
I'm curious to know what difference indexing makes to the plan, aside from transforming table scans to index scans?Well, depending on the index, the table scan COULD be transformed all the way down to an index seek, which I would expect to perform much better than a table scan.
I'd agree if there were selection criteria, but there aren't, so a scan will be more efficient than seeks.
Sorry - I'll correct myself - I was assuming that all (or most) rows will be joined in both tables. If there is a large number of teachers without any students, then seeks will start to be more efficient.
January 29, 2010 at 6:00 am
Assuming that:
* the inner select returns a reasonable amount of rows - so it fits in memory
shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?
January 29, 2010 at 6:13 am
sistemas 95572 (1/29/2010)
Assuming that:* the inner select returns a reasonable amount of rows - so it fits in memory
shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?
The plan is more likely to be like this:
Both the teachers table and the students table are sorted by teacher_id (if there is an index, this isn't necessary);
Duplicates are eliminated from the students list;
The server 'walks' down each list, emitting rows when the teacher_ids match.
January 29, 2010 at 7:43 am
I think the answer options were corrupted somehow. Right now the question shows three answer possibilities, namely "Wrong Answer", "Right Answer" and "Wrong Answer".
January 29, 2010 at 8:12 am
An administrator changed the possible answers because of the question's subjectivity.
January 29, 2010 at 9:02 am
sistemas 95572 (1/29/2010)
Assuming that:* the inner select returns a reasonable amount of rows - so it fits in memory
shouldn't the IN statement be better since the inner query executes only once, storing in memory all those returned int values (using some kind of ordered array), and then perform a bin search for each value returned by the outer select?
Well, that assumes that the inner query would only execute once. I'm not sure we can guarantee that. In fact, I'm such a pessimist that I would expect the worst case, that the inner query would execute once for each row returned by the outer query. Now, I'm quite sure that the optimizer would handle a good bit of this overhead and give me a pleasant surprise, but maybe not.
You may be right in this case. I simply can't tell.
Viewing 15 posts - 46 through 60 (of 83 total)
You must be logged in to reply to this topic. Login to reply