August 15, 2005 at 3:50 am
hi all,
how can we minus two sets in SQL Server? e.g.
select * from table1
minus
select * from table2
August 15, 2005 at 4:58 am
You are right, there is no built-in relational minus in SQL Server (SQL Server 2005 will have EXCEPT though). This code should do what you want:
SELECT *
FROM Table1
WHERE NOT EXISTS (
    SELECT *
    FROM Table2
    WHERE Col1 = Table1.Col1
      AND Col2 = Table1.Col2
)
August 16, 2005 at 1:58 am
Right, no minus. I would recommend using a JOIN like this:
SELECT *
FROM Table1 a
LEFT JOIN Table2 b
    ON a.Col1 = b.Col1
   AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL
I think the previous example would cause SQL Server to run multiple queries against the data, while this one simply joins the tables and extracts the rows that do not exist in Table2.
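As a quick sanity check (not from the thread itself), here is a sketch using Python's built-in sqlite3 module to show that the correlated-subquery and LEFT JOIN formulations return the same rows. The table and column names follow the examples above, but the sample data is made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT)")
cur.execute("CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT)")
cur.executemany("INSERT INTO Table1 VALUES (?, ?)",
                [(1, 'a'), (2, 'b'), (3, 'c')])
cur.executemany("INSERT INTO Table2 VALUES (?, ?)",
                [(2, 'b'), (3, 'c'), (4, 'd')])

# Correlated-subquery formulation (NOT EXISTS)
not_exists = cur.execute("""
    SELECT Col1, Col2 FROM Table1
    WHERE NOT EXISTS (
        SELECT * FROM Table2
        WHERE Table2.Col1 = Table1.Col1
          AND Table2.Col2 = Table1.Col2
    )""").fetchall()

# LEFT JOIN formulation, filtering for NULL
left_join = cur.execute("""
    SELECT a.Col1, a.Col2 FROM Table1 a
    LEFT JOIN Table2 b
        ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
    WHERE b.Col1 IS NULL""").fetchall()

print(not_exists)  # [(1, 'a')]
print(left_join)   # [(1, 'a')]
```

Both queries return only the row in Table1 that has no match in Table2, so the choice between them is purely a performance question.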
August 16, 2005 at 2:41 am
I think you have misunderstood how the query engine in SQL Server works. One query is always one query; what is interesting is to look at the execution plan that SQL Server chooses for executing the query. Both of our queries will probably result in similar plans (with the same physical join operation in most cases) and therefore perform more or less the same. However, your query will always need a filter operator before outputting the results, which will cause it to perform slightly worse when there are a lot of rows returned.
August 16, 2005 at 11:22 am
If you check Kalen Delaney's Inside SQL Server 2000, Chapter 7, Correlated Subqueries, page 354, she explains that using the LEFT JOIN and testing for NULL uses less than half the logical I/Os of the correlated subquery. Usually a join is more efficient than the alternative.
You may have indexes that could affect the outcome.
If you use a lot of correlated subqueries you may have to work on thinking more set-based and less row-by-row.
Jeff Lomax
August 16, 2005 at 3:59 pm
I have my copy of Inside SQL 2000 at my client's office, where I won't be until Thursday, so I can't check exactly what it says there. However, like I said, if you compare these queries you will see that they use almost the same execution plan.
Usually a join is more efficient than the alternative.
I usually look at what is most efficient in a specific situation.
If you use a lot of correlated subqueries you may have to work on thinking more set-based and less row-by-row.
Why is a correlated subquery row-by-row based?
August 16, 2005 at 10:39 pm
Well, it does recheck the condition for every row... but that doesn't mean the server doesn't use some sort of left join internally to do the check quickly.
August 19, 2005 at 3:56 am
But it is still not row-by-row, if by that you mean "as opposed to being set-based". You are talking about the physical implementation. In fact, take a look at the execution plan details. If the physical operator for the outer join that both of the queries above result in is a Nested Loop (depending on the data and indexes it might choose another, such as Hash Match, instead), take a look at the lower of the two inputs to the right of it. Note the Number of Executes for both cases...
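In SQL Server you would inspect this in the graphical execution plan or via SET STATISTICS PROFILE. As a rough, engine-neutral sketch of the idea of comparing plans rather than query text, here SQLite's EXPLAIN QUERY PLAN (via Python's sqlite3) is used as a stand-in; the plan text varies by engine and version, so no exact output is shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT)")
cur.execute("CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT)")

# Plan for the correlated-subquery formulation
plan_subquery = cur.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM Table1 t1
    WHERE NOT EXISTS (SELECT * FROM Table2 t2
                      WHERE t2.Col1 = t1.Col1 AND t2.Col2 = t1.Col2)
""").fetchall()

# Plan for the LEFT JOIN formulation
plan_join = cur.execute("""
    EXPLAIN QUERY PLAN
    SELECT a.* FROM Table1 a
    LEFT JOIN Table2 b ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
    WHERE b.Col1 IS NULL
""").fetchall()

for row in plan_subquery:
    print("subquery plan:", row)
for row in plan_join:
    print("join plan:", row)
```

The point stands regardless of engine: it is the chosen plan, not the surface syntax, that determines whether the work is done in a row-by-row or set-oriented fashion.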
One of the problems with SQL and its implementations, as opposed to how a true RDBMS should function, is that you can write different queries that return the same result but perform differently. Some will be faster and some slower. That is the direct result of mixing the logical and the physical level. As a programmer (the user of the DBMS) I should not have to tell the DBMS how it should actually produce my result; I should just tell it what I want. The DBMS would then find the best possible way to produce that result for me.
Edit: Not really meant to Remi specifically, but to the discussion as a whole.
August 19, 2005 at 6:18 am
Ok, I have now checked what it says in Inside SQL Server 2000, and I do not make the same interpretation as you do. I cannot find anywhere where she says that "Usually a join is more efficient than the alternative". The exact words are these:
Depending on your data and indexes, the outer-join formulation might be faster or slower than a correlated subquery. But before deciding to write your query one way or the other, you might want to come up with a couple of alternative formulations and then choose the one that's fastest in your situation.
As I said in my previous post, it is unfortunate that different queries can produce the same results but not perform equally well, but since that is the case we should always, like I (and Kalen) said, test our queries and use the best one. In the case presented in this thread I consistently get better results using a correlated subquery than when using a left join filtering for NULL, even when changing the amount of data and/or indexes. Both queries use very similar execution plans, with exactly the same number of I/Os, but the left join uses a filter condition that adds a little CPU activity. Hardly noticeable, but still clearly readable in the execution plan and/or STATISTICS TIME output.
What Kalen says regarding "fewer than half the number of logical I/Os" is that in the example she provides in the book, that is the result. However, what is interesting is that a) her example is very similar to the one here and b) when I try her exact example I do not at all get "fewer than half the number of logical I/Os". In fact I get the exact same result as I do with the example in this thread, that is, the exact same number of logical I/Os. I cannot be really sure what she used for testing though, but it looks like the pubs database, so I cannot see where she got those numbers from. Perhaps it was a flaw in the RTM engine that has later been fixed in a service pack.
Finally, another interesting thing to try is to use the EXCEPT keyword in SQL Server 2005. Like I said above, EXCEPT performs a relational minus. Now, the product is still in beta, but when I execute the two statements above as well as one using EXCEPT, two of the three execution plans are the same. Guess which one differs, and is also the slower one? As expected, EXCEPT uses the fastest execution plan to perform a relational minus on the two sets, and that execution plan happens to be the same as the one used by the query with the correlated subquery.
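For readers without a SQL Server 2005 beta at hand, a hedged sketch: SQLite implements the same EXCEPT set operator, so Python's sqlite3 can illustrate its semantics. One subtlety worth noting, which the thread does not mention: EXCEPT has set semantics and removes duplicate rows, while the NOT EXISTS formulation keeps duplicates from Table1. The sample data here is invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT)")
cur.execute("CREATE TABLE Table2 (Col1 INTEGER, Col2 TEXT)")
cur.executemany("INSERT INTO Table1 VALUES (?, ?)",
                [(1, 'a'), (1, 'a'), (2, 'b')])   # note the duplicate row
cur.executemany("INSERT INTO Table2 VALUES (?, ?)", [(2, 'b')])

# Relational minus via EXCEPT -- set semantics, duplicates collapsed
except_rows = cur.execute(
    "SELECT * FROM Table1 EXCEPT SELECT * FROM Table2").fetchall()
print(except_rows)  # [(1, 'a')]

# Same logical minus via NOT EXISTS -- duplicates from Table1 are kept
not_exists = cur.execute("""
    SELECT * FROM Table1 t1
    WHERE NOT EXISTS (SELECT * FROM Table2 t2
                      WHERE t2.Col1 = t1.Col1 AND t2.Col2 = t1.Col2)
""").fetchall()
print(not_exists)  # [(1, 'a'), (1, 'a')]
```

So the two formulations are only interchangeable when Table1 has no duplicate rows (or when duplicates don't matter); otherwise EXCEPT behaves like a true set difference.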
August 19, 2005 at 7:18 am
Wow, can't wait for Yukon. Thanks for the info.