In, Exists and join – a roundup

Over the last several months I’ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there’s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other.

Previous parts of this series can be found:

In this roundup post, I’m going to do multiple tests on the 6 query forms, with different numbers of rows, indexes, no indexes and, for the negative forms (NOT IN, NOT EXISTS), nullable and non-nullable join columns.

In the individual tests, I used 250000 rows in the first table and around 3000 rows in the secondary table. In this roundup, I’m going to use 3 different row counts, 1000000 rows, 100000 rows and 5000 rows. That should give a reasonable idea for performance at various table sizes. (Not much point in going smaller than 5000 rows. Everything’s fast on 100 rows)

Some notes on the tests.

The version of SQL is SQL Server 2008 SP1 x64 Developer Edition.
The tests were run on a laptop. Core-2 Duo, 3 GB memory. SQL limited to 1 processor, so no parallelism possible.
Each query will be run 10 times, reads, cpu and duration measured by profiler and averaged.
Each query will be run once before the tests start to ensure that the data is in cache and the execution plans are generated and cached.
Reproduction scripts will be available for download.

Exists vs. In vs. Inner Join

First, no indexes on the join columns

Table Size	Operator	CPU	Reads	Duration
Large	IN	1293	14585	9649
	Exists	1260	14585	9573
	Inner Join	1302	14585	9716
Medium	IN	59	747	538
	Exists	78	747	574
	Inner Join	69	747	523
Small	IN	7	41	65
	Exists	3	41	91
	Inner Join	4	41	65

Now with indexes on the join columns

Table Size	Operator	CPU	Reads	Duration
Large	IN	973	1760	9707
	Exists	956	1760	9483
	Inner Join	1173	1760	9539
Medium	IN	43	100	516
	Exists	53	100	548
	Inner Join	59	100	498
Small	IN	3	9	64
	Exists	1	9	80
	Inner Join	4	9	67

Not Exists vs. Not In vs. Left Outer Join … Is Null

First test with the columns join columns nullable, no indexes

Table Size	Operator	CPU	Reads	Duration
Large	NOT IN	3194	2014622	3251
	NOT Exists	820	14585	837
	Outer Join	962	14585	1025
Medium	NOT IN	174	100765	217
	NOT Exists	54	747	121
	Outer Join	53	747	79
Small	NOT IN	12	5043	13
	NOT Exists	4	41	6
	Outer Join	3	41	5

Then with join columns nullable with indexes

Table Size	Operator	CPU	Reads	Duration
Large	NOT IN	2677	2001762	2726
	NOT Exists	569	1760	586
	Outer Join	949	1760	1029
Medium	NOT IN	137	100102	164
	NOT Exists	40	100	104
	Outer Join	48	100	69
Small	NOT IN	11	5011	12
	NOT Exists	3	9	4
	Outer Join	6	9	6

Now, let’s make the join columns not nullable. Again, no indexes to start with.

Table Size	Operator	CPU	Reads	Duration
Large	NOT IN	741	14585	753
	NOT Exists	784	14585	790
	Outer Join	884	14585	937
Medium	NOT IN	43	747	103
	NOT Exists	49	747	120
	Outer Join	53	747	74
Small	NOT IN	4	41	4
	NOT Exists	1	41	5
	Outer Join	1	41	5

and finally, join columns not nullable, with indexes

Table Size	Operator	CPU	Reads	Duration
Large	NOT IN	578	1382	588
	NOT Exists	585	1382	597
	Outer Join	953	1382	1006
Medium	NOT IN	37	80	79
	NOT Exists	34	80	79
	Outer Join	39	80	84
Small	NOT IN	3	8	4
	NOT Exists	1	8	5
	Outer Join	4	8	5

These results seem to pretty much confirm the earlier conclusions.

Exists and IN perform much the same, whether there are indexes on the join column or not. When there are indexes on the join columns, the INNER JOIN is slightly (very slightly) slower, which is more noticeable on the large tables, much less on the medium or small ones. (Note I’m mostly looking at CPU time, as the duration is also affected by sending of results to client, in this case, lots and lots of results)

When it comes to NOT In and NOT Exists they perform much the same when the columns involved are not nullable. If the columns are nullable, Not In is significantly slower because it has a different behaviour when nulls are present.

The join is slightly slower than Not Exists (or Not In on non-nullable columns), again only noticeable on the large table, probably because the optimiser has to do a full join with a secondary filter rather than the anti-semi join that it can use for Not Exists and Not In.

My conclusion from earlier posts stands. If all you are doing is looking for matching or non-matching rows and you don’t need any columns from the second table, use IN or Exists (or their negations), as appropriate for the situation. Only when you need columns from the second table should Join be used.

I think (and hope) that this adequately concludes the discussion on the Exists and In and joins, both behaviour and performance.

Reproduction scripts

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

In, Exists and join – a roundup

Exists vs. In vs. Inner Join

Not Exists vs. Not In vs. Left Outer Join … Is Null

Rate

Share

Share

Rate

In, Exists and join – a roundup

Exists vs. In vs. Inner Join

Not Exists vs. Not In vs. Left Outer Join … Is Null

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts