February 14, 2009 at 12:24 pm
can DISTINCT keyword be replaced with " IN" for better performance in some scenarios?
February 14, 2009 at 3:41 pm
your question is a bit vague. can you please provide a full explanation?
if you question is this:
field in (select ... from ...)
is functionally identical to
field in (select distinct ... from ...)
i'm a "less is better" person, so i'd omit the distinct since it causes a bit more work for the query engine to handle.
February 14, 2009 at 3:56 pm
Yeah, they are from totally different parts of a query. DISTINCT is part of the select clause and IN part of the where clause.
Some performance points though could be...
If you can use your where clause or joins so that duplicate results are not returned in the first place then not needing to use distinct will be a performance boost.
Also EXISTS is generally much better performance wise than IN
E.G.
SELECT a.col1,a.col2
FROM tableA a
WHERE a.col1 IN (SELECT col3 FROM tableB)
Would be more efficient as
SELECT a.col1,a.col2
FROM tableA a
WHERE EXISTS (SELECT 1 FROM tableB where col3 = a.col1)
Basically in the first query all rows are queried from tableB then compared to tableA. In the second only matching columns are returned from tableB. Also note using SELECT 1 in this instance instead of select *. Since there is no need to reference the values returned there is a small saving in not having to retrieve a value from the table (especially not retrieving every single column)
[font="Arial"]_______________________________________________________
Change is inevitable... Except from a vending machine.[/font]
February 15, 2009 at 11:33 am
can i write your second query like this...
SELECT a.col1,a.col2
FROM tableA a
WHERE a.col1 IN (SELECT 1 FROM tableB b inner join tableA a
ON a.col1=b.col1
and a.col1=b.col1)
February 15, 2009 at 1:07 pm
[font="Verdana"]Actually, if you replace DISTINCT with IN, because they are completely different parts of the query, the query will fail. However, it will fail really quickly. Hence, performance is improved. :P[/font]
February 16, 2009 at 6:18 am
I think you're referring back to another set of posts where someone suggested that using the IN clause in the query eliminated the need for the DISTINCT clause. Am I right?
What they meant was that the use of the IN clause filtered your records such that you had a distinct list, based on the key values included in the IN clause. That made trying to filter the records through the use of the DISTINCT clause unnecessary.
DISTINCT is, more often than not, a crutch to cover up a bad query, bad structure, or bad data. In the vast majority of cases, you should be able to appropriately filter your data through the proper use of JOINS and information in the WHERE clause, such as the IN statement, to get a unique set of rows without resorting to the use of DISTINCT. DISTINCT generally uses some method of aggregation to eliminate duplicate rows and frequently is unable to do this using existing indexes on the tables, so it's very slow.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 17, 2009 at 6:38 am
Mike Levan (2/15/2009)
can i write your second query like this...SELECT a.col1,a.col2
FROM tableA a
WHERE a.col1 IN (SELECT 1 FROM tableB b inner join tableA a
ON a.col1=b.col1
and a.col1=b.col1)
Note that aside from Grant's comments, which seem to put a different spin on your question, rewriting the query above is not achieving the result you want for several reasons.
1 - the "second query" you are referring to was using EXISTS, and therefore used a JOIN to filter the results and used a generic 'SELECT 1', since it was just checking for existence. IN and EXISTS are not equivalent, but you can achieve the same results in the overall query by using EXISTS.
2 - Because IN and EXISTS are not equivalent, the 'SELECT 1' in the subquery will return '1' if it exists, and therefore using a.col1 IN (SELECT 1 . . .blah blah blah) will only work if a.col1 = 1
Hope that helps to explain the difference a little more.
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply