Most of the time developers that need to get data will use a SELECT statement to retrieve the data. The query that a developer will use often will use a join between tables. In this article, we will look at how an outer join is different from an inner join and what this means for your queries.
The Difference Between Inner and Outer Joins
If your data is in multiple tables, you will often will use a join between the tables in a form like this:
SELECT u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u INNER JOIN dbo.Badges AS b ON u.Id = b.Id; GO
In this case, we are looking at the StackOverflow database (which I downloaded from links that Brent Ozar has on his site), and getting a list of users and their badges. This means that for each user that has earned a badge, I get a row back with the user DisplayName and the BadgeName. This is an inner join, and is an intersection of the data in the tables. This is shown in the image below.
In this diagram, we see the area where the circles overlap, with the values of John, Mary, Teacher, and Stellar Question, is an intersection. This is the inner join, where items in both sets are included in the results.
An outer join will include all data from one set. For example, if I recolor the diagram slightly, we see that a left outer join would be the area below in yellow. All the values from the Users circle, plus the values from Badges circle the overlap, are included.
The left in this case refers to the actual item on the left. A right outer join would include the circle on the right, with the overlapped area in the middle. In T-SQL code, we would see the left outer join expressed in our code like this:
SELECT u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u LEFT OUTER JOIN dbo.Badges AS b ON u.Id = b.Id; GO
In this case, we get all the users with their matching badges. However, if there is a user without a badge, we get their user DisplayName and a NULL for the badge.
How To Write These Joins
One of the problems many developers have with this construct is they confuse the way to build these joins. They often confuse the structure and order of execution of the outer join. This isn't that different from an inner join, but we often think about it differently.
A normal, or inner join, is the default. If you just include the JOIN keyword, the default is an inner join. For example, this produces just the same results as the query at the top of this article:
SELECT u.Id, u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u JOIN dbo.Badges AS b ON u.Id = b.Id GO
When we add the OUTER keyword, we need to add a LEFT, RIGHT, or FULL before it. This determines the type of join. In the case of a RIGHT or LEFT OUTER JOIN, the table on the right or left (literally) of the clause will return all its rows. For example, this query returns all rows from the Users table.
SELECT u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u LEFT OUTER JOIN dbo.Badges AS b ON u.Id = b.Id; GO
This is because the Users table is on the left. If I write the code differently, this is easy to see.
SELECT u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u LEFT OUTER JOIN dbo.Badges AS b ON u.Id = b.Id; GO
If we change to a RIGHT OUTER JOIN, then we would write this, and get all rows in the Badges table.
SELECT u.DisplayName , b.Name AS BadgeName FROM dbo.Users AS u RIGHT OUTER JOIN dbo.Badges AS b ON u.Id = b.Id; GO
There isn't much of a difference between these two, other than you must look to the right or left of the clause to determine which table is returning all its rows. For the values that don't have a match in the ON clause, NULL is returned for the other table. An example is shown here, where some of the users do not have a badge.
If we scroll further down, you will see there are rows with matching values between the tables that include a BadgeName, and rows that don't, with NULL.
A FULL OUTER JOIN includes all rows from both tables, but with the NULL Values for each side when the rows don't match. A quick repro of this is shown below with this code:
CREATE TABLE dbo.LeftTable (Id INT NOT NULL, UserName VARCHAR(100) NOT NULL) GO INSERT dbo.LeftTable (Id, UserName) VALUES (1, 'Bob'), (2, 'Bill'), (3, 'Susan') GO CREATE TABLE dbo.RightTable (Id INT NOT NULL, BadgeName VARCHAR(100) NOT NULL) GO INSERT dbo.RightTable (Id, BadgeName) VALUES (2, 'Teacher'), (3, 'Scholar'), (4, 'Master') GO SELECT * FROM dbo.LeftTable AS lt FULL OUTER JOIN dbo.RightTable AS rt ON lt.Id = rt.Id
And the results of the query are shown here:
There are NULL Values from both tables where there are not matching rows.
A Gotcha
One of the common problems people have with outer joins is that they expect to qualify or filter rows in the ON clause. This isn't the place for filtering. Instead, this is the place where we determine the matches and whether or not to show nulls for columns being selected. I have often found junior developers writing a query like this, with the expectation that they will find values that have nulls.
SELECT * FROM dbo.LeftTable AS lt LEFT OUTER JOIN dbo.RightTable AS rt ON lt.Id = rt.Id AND rt.Id IS NULL
I have used my smaller example, as the data is easier to understand. What results would you expect here? Just row 1? Instead, here is what is returned:
This is what we get with the regular left join. What most people expect with the above query is just the row with Id = 1.
Why is this? It's a quirk that we get all the rows from the left table, but no matches on the right table. After all, no rows in the Badges table have NULL. As a result, we still get all the rows from the left table, with NULLs for the non-matching rows on the right, which is all the rows.
What we want to do is filter our results, which is done in the WHERE clause. We should write the code like this:
SELECT * FROM dbo.LeftTable AS lt LEFT OUTER JOIN dbo.RightTable AS rt ON lt.Id = rt.Id WHERE rt.Id IS NULL
This gives me the results I expect, which is the Users that do not have any badges.
Summary
Outer joins are a way of returning all data from a table along with matching rows from another table. This is often used to see all customers, along with their orders, but ensuring I see customers without orders. This is also common when we want to see all classes with students, but also the classes without any students.
A common mistake is putting filtering in the ON clause instead of the WHERE clause. These are not interchangeable, and execute at different times, so be sure that if you are trying to filter rows from a result set, you do so in the proper place, the WHERE clause.
One final note on performance. While outer joins can be very helpful and handy, they do come with a performance penalty of sorts. Since all rows must be read from one table, this means a complete scan of the table. While reducing the columns returned to those in a non-clustered index can help, this is often an expensive operation, especially on large tables. Use these queries judiciously in your application.