WHERE IN(sub-query) vs. INNER JOIN

Question

WHERE IN(sub-query) vs. INNER JOIN

jason_langston

Mr or Mrs. 500

Points: 553
More actions
December 15, 2004 at 11:47 am

#74816

Hey all,
Maybe I should've known better, but for clarity sake we were using WHERE IN (sub-query) in many of our procedures, as opposed to using an INNER JOIN. In most of our empirical tests, SQL was using similiar execution plans for the WHERE IN (sub-query) and the INNER JOIN, so we opted for WHERE IN since it made the intent of the statement clearer (there were many joins).
Today while tuning one of the procedures we noticed a huge performance increase using the INNER JOIN over the WHERE IN(sub-query). I have never seen this before, and wondered if I'd stepped into an anamoly or if INNER JOINs over WHERE IN(sub-query) should just be a best practice?
Any thoughts? comments?
Thanks,
JasonL

JasonL

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Earl Pancrazio SSC Rookie Points: 41 More actions · Answer 1

Hey Jason,

I don't remember all of the specific details, but my understanding is that you should always use INNER JOIN over WHERE IN(subquery) when writing queries. The reason has something to do with the way SQL treats the WHERE clause. It is seen as an expression, so the columns you specify in it are not used in the execution plan optimization. Does this information jog anyone elses memory on this topic??

Earl

Goldy-192585 SSC Enthusiast Points: 156 More actions · Answer 2

I had read an article about using "exists" instead of "In". I think it mentioned that exists would just check for a condition to find out if its true or false; but "In" clause actually retrieves a result set and hence is much slower.

Frank Kalis SSC Guru Points: 111183 More actions · Answer 3

Here is a really nice explanation from Itzik Ben-Gan on the difference between (NOT) IN and EXISTS:

http://groups.google.de/groups?hl=de&lr=&selm=uFffAFPaBHA.1900%40tkmsftngp04&rnum=2

IMHO, when you can restate your query to using JOINs, you should do so. JOINs are usually process efficiently. But to really find out, as always, you need to test your alternatives in your environment.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Antares686 SSC Guru Points: 125444 More actions · Answer 4

A big key is the Join can actually utilize any indexes that exist for that column when joining. The in is generating a simple const list to compare against or might generate a small internal table of the values before processing actually begins which is where the IN will generally cost you the most.

herb walles Say Hey Kid Points: 692 More actions · Answer 5

you should always try to use simple JOIN over WHERE IN , if it is possible.

The same goes for LEFT JOIN over Where NOT IN

David A. Long SSCrazy Eights Points: 8421 More actions · Answer 6

The trouble with IN clause is when the subquery returns NULL you then get all records. Not critical when you use it in a SELECT, but trouble in a DELETE WHERE clause. Can you say Got Backup...

Andy

jason_langston Mr or Mrs. 500 Points: 553 More actions · Answer 7

All ... thanks for the replies!

For the most part, the warnings everyone has given are things I am well aware of (Andy's reminder about returning NULL, etc.), and we honestly did do as Frank suggested - test both options in our environment. (The first thing we tried when we realized the WHERE IN() was taking too long was to try it as a JOIN).

I've heard the very generic "use Joins when you can", but having seen evidence to the contrary in *some* cases have not always followed that advice.

What I'm really wondering is if anyone has some good links that describe the trade-offs, particulary anything that would give solid evidence to back up claims like "joins can use indexes, WHERE IN cannot", "joins are processed more efficiently". I've dug through "Inside SQL Server" and googled like crazy and can't find anything concrete.

Appreciate all the help.

JasonL

Frank Kalis SSC Guru Points: 111183 More actions · Answer 8

Google smart and google in the Groups, not the web and you'll find

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

herb walles Say Hey Kid Points: 692 More actions · Answer 9

Besides performance issues, I think that typical beginners usually prefers WHERE IN or WHERE NOT IN to include or exclude something, because this logic is kind of easier to comprehend. When person starts to think SQL in term of “joins” to include or exclude something to me it is indication of more advanced level

Fred Williams-134192 SSCrazy Points: 2532 More actions · Answer 10

Too bad I cannot find the old white paper on the SQL optimizer. It discussed techniques like reducing the query tree or parse tree; the point being that the SQL language offers many ways to express what is essentially the same relational algebra. The white paper stated that subqueries are translated to joins internally. Identical execution plans would be strong evidence for this.

Yoda-193403 SSC Eights! Points: 988 More actions · Answer 11

Yes!

And worse: IN and Not IN() require single-column primary keys. This leads beginners to deduce: every table must have an single-column (and thus, often, an identity) PK, otherwise, how can I express my WHERE clauses?

Frank Kalis SSC Guru Points: 111183 More actions · Answer 12

What, there are other PK's possible which have not defined the IDENTITY property on it???

Amazing!

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

jason_langston Mr or Mrs. 500 Points: 553 More actions · Answer 13

Yeah, that was the kind of thing that I was looking for.

I understand the comments regarding WHERE IN() being a crutch for beginners. On the other hand I think an argument could be made for the self-documenting nature of WHERE IN() over JOINs for the very reason that it is easier to read/understand.

The project I ran into this on had some rather complicated logic, which translated into many joins. The WHERE IN() clearly documented which joins were necessary to get to the data, and what was the "logic" for when a particular modification ran. Naturally, we could use the joins instead (in some cases we have to for performance); and use good comments to describe what is going on. I'm just always a fan of self-documenting code over comments when ever possible.

I was just caught by surprise by the gross discrepancy in performance, having thought that both were evaluated similiarly internally. But that idea may be a holdover from older platforms (Sybase/6.5). If I recall there was even something in the Sybase documentation describing under what conditions the WHERE IN(sub-query) would be translated to joins internally.

In any case, thanks to all for the input. I'll be a little more leary of WHERE IN() in the future.

JasonL