Introduction
Are you confused by all this "inner join" style
syntax that is becoming more and more prominent lately?
It’s happening because Microsoft is moving towards using
ANSI syntax. The old syntax still works, but it is strongly recommended you
learn the new syntax, Microsoft have stated that they will not support the old
syntax indefinitely.
It’s fairly easy to translate from the old to the new
syntax, and correctly translated results will usually produce identical results,
but there are exceptions that you should be aware of.
Inner or natural joins
Inner joins return rows where data matching exists in the
tables you are joining. This is the simplest type of join, and moving between
the old and new syntax should present no problems. In general, this:
select o.name, i.name
from sysobjects o, sysindexes i
where o.id = i.id
Is equivalent to this:
select o.name, i.name
from sysobjects o inner join sysindexes i
on o.id = i.id
Note that the components of the where clause that
define the joins between tables are now defined in the from clause
instead. The where is now only used for specifying selection criteria. To
illustrate this:
select o.name, i.name
from sysobjects o, sysindexes i
where o.id = i.id and
indid = 1
Becomes:
select o.name, i.name
from sysobjects o inner join sysindexes i on o.id = i.id
where indid = 1
Incidentally, what the query above does is to display table
names and index names in the current database where the table has a Clustered
Index. The sysindexes.indid column is always set to one for clustered indexes.
I will use similar examples later on, and if you are not too
clear on these system tables, then now would be a good time for you to review
the documentation on sysobjects and sysindexes in Books Online.
Update and Delete
These statements also support the ANSI join syntax, although
it’s not very well documented in Books Online for some versions of SQL Server
update t1
set t1.col = …………from t1 inner join t2 on t1.col = t2.col
delete t1
from t1 inner join t2 on t1.col = t2.col
Outer joins
When two tables are joined with an inner join, data will only
be returned if matching data exists in both tables. An outer join is like saying
"and also include the rows from one table if there are no matching rows in
the other one."
With an outer join the columns from the table where data is
"missing" are returned as NULL values.
Outer joins come in two basic flavours, called Left and Right.
Left outer joins mean that the data must be contained in the table defined to
the left side of the equivalence, but not necessarily the right hand side. Right
outer joins, of course, work the other way around.
To illustrate this, cut and paste the code below into a Query
Analyser window and try running it. I have used the newer ANSI syntax here, and
the older equivalents are included but commented out using the "—"
comment notation. Comment them back in if you want to try them.
set nocount on
go
create table #left (leftI int)
create table #right (rightI int)
insert #left select 1 union select 2 union select 3
insert #right select 2 union select 3 union select 4 union select 5 union select 6
print "*** Inner join ***"
select * from #left inner join #right on leftI = rightI
-- select * from #left, #right where leftI = rightI
print " "
print "*** left join ***"
select * from #left left outer join #right on leftI = rightI
-- select * from #left, #right where leftI *= rightI
print " "
print "*** right join ***"
select * from #left right outer join #right on leftI = rightI
-- select * from #left, #right where leftI =* rightI
drop table #left
drop table #right
set nocount off
go
Outer Join Gotchas
Unlike natural joins, outer joins have a few traps you can
fall into if you translate your query from the old syntax to the new and expect
it to produce the same results in both formats.
Returning to our sysobjects and sysindexes examples, the
following two queries should produce the same results
select o.name, i.name
from sysobjects o, sysindexes i
where o.id *= i.id and
o.type = "S"
select o.name, i.name
from sysobjects o left outer join sysindexes i on o.id = i.id
where o.type = "S"
However, if we qualify the select criteria on sysindexes as
follows, in order to look for details on clustered indexes, the two queries
should return different results.
select o.name, i.name
from sysobjects o, sysindexes i
where o.id *= i.id and
o.type = "S" and
i.indid = 1
select o.name, i.name
from sysobjects o left outer join sysindexes i on o.id = i.id
where o.type = "S" and
i.indid = 1
The reason this happens is that the SQL Server reacts
differently to the "Where" clause when the outer joined table does not
have any data.
When no data is found in the sysindexes table that matches the
sysobjects table, indid will be returned as NULL. In this case:
- With the old syntax, SQL Server will ignore the i.indid
= 1 selection criteria and return a row in which indid IS NULL,
even though, strictly speaking, the where i.indid =1 clause has
failed
- With the new syntax SQL Server takes the view that the test
i.indid = 1 fails because i.inid is NULL, so no value is returned.
Full outer joins
Full outer joins effectively combine the left and right outer
joins so that data will be returned if it matches in both tables, or if it exist
in either one.
The old join syntax has no direct equivalent of the full outer
join.
Try out the SQL below to illustrate the full outer join.
set nocount on
go
create table #left (leftI int)
create table #right (rightI int)
insert #left select 1 union select 2 union select 3
insert #right select 2 union select 3 union select 4 union select 5 union select 6
print "*** FULL join ***"
select * from #left full outer join #right on leftI = rightI
drop table #left
drop table #right
set nocount off
go
Cross Joins
Cross Joins create a Cartesian Product, rather like when you forget to
include a "where" clause to join two tables. There are not many cases
where you would want to use a cross join, but this article
discusses one possible use for them.
Orphan hunting
Prior to ANSI SQL, if you wanted to find records in one table
that did not match a record in another, the recommended solution was to use a
"NOT IN" or "NOT EXISTS" based query. It is now recommended
that you use outer joins instead. Here is a simple example using all three
possibilities – they should all produce the same results:
set nocount on
go
create table #left (leftI int)
create table #right (rightI int)
insert #left select 1 union select 2 union select 3
insert #right select 2 union select 3 union select 4 union select 5 union select 6
select *
from #right
where rightI not in (
select leftI from #left
)
select *
from #right
where not exists (
select * from #left where leftI = rightI
)
select #right.*
from #left right outer join #right on leftI = rightI
where leftI IS NULL
drop table #left
drop table #right
set nocount off
go
Beware of falling into the trap we discussed earlier. Looking
at our previous "Clustered Index" example, either of these two
following old-style queries would produce a list of system tables that do not
have clustered indices:
select *
from sysobjects o
where type = "S" and
id not in (
select id
from sysindexes
where indid = 1
)
select *
from sysobjects o
where type = "S" and
not exists (
select id
from sysindexes i
where indid = 1 and
o.id = i.id
)
Here is a simple translation of the above queries into the new
"left outer join" style. Try it out and see what happens.
select *
from sysobjects o left outer join sysindexes i on o.id = i.id
where type = "S" and
i.indid = 1 AND i.id IS NULL
No rows are returned.
The query now fails because the where i.indid = 1
clause makes no sense when we are looking for rows that do not exist in
sysindexes.
To get around this, the following example uses a "derived
query" which forces a select on sysindexes to return a subset of rows with indid
=1, which is subsequently used in the outer join.
select *
from sysobjects o left outer join
(select * from sysindexes i where indid = 1) as i on o.id = i.id
where type = "S" and
i.id IS NULL
In effect, the derived query -
(select * from sysindexes i where indid =
1) as i
creates a notional table called "i", which contains
a copy of all the sysindexes rows with an indid of 1. It is this notional table
that is then fed to the rest of the query. The final results should now match
the NOT EXISTS and NOT IN examples shown above.
About the author
Neil Boyle is an independant SQL Server consultant working out of London, England. Neil's free SQL Server guide is available on-line at http://www.impetus-sql.co.uk