Most people are very aware of the fact that having a foreign key constraint in place on your tables adds overhead to inserts and deletes. And many people believe that there is no benefit to foreign keys beyond referential integrity (which, the application can handle perfectly well, right?). But is that an accurate statement? Here’s the basis for our investigation, a query in AdventureWorks2008R2:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName' FROM Person.Address AS a JOIN Person.BusinessEntityAddress AS bea ON a.AddressID = bea.AddressID JOIN Person.BusinessEntity AS be ON bea.BusinessEntityID = be.BusinessEntityID JOIN Person.Person AS p ON be.BusinessEntityID = p.BusinessEntityID;
This query results in the following execution plan:
I know that is an ugly query and an ugly query plan, but bear with me for a moment. Do you notice anything about the query and the plan at this point? Count the tables and the access operators. That’s four tables and two access operators. That means, that despite the fact that I listed four tables in the query, the optimizer was smart enough to figure out that it only needed to pull data from two of the tables and completely ignored the others because, the key values in the Person.Person table and the Person.BusinessEntityAddress table were the same. It didn’t even bother with the Address table since nothing from that table is in the WHERE or SELECT and, it skipped the BusinessEntity table because, the foriegn key relationships ensure that the data in the other tables can be trusted.
Don’t believe me? For those trying this at home, run this script in AdventureWorks2008R2:
SELECT * INTO dbo.MyAddress FROM Person.Address; SELECT * INTO dbo.MyBusinessEntityAddress FROM Person.BusinessEntityAddress; SELECT * INTO dbo.MyBusinessEntity FROM Person.BusinessEntity; SELECT * INTO dbo.MyPerson FROM Person.Person;
Then modify the original query so that it looks like this:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName' FROM dbo.MyAddress AS a JOIN dbo.MyBusinessEntityAddress AS bea ON a.AddressID = bea.AddressID JOIN dbo.MyBusinessEntity AS be ON bea.BusinessEntityID = be.BusinessEntityID JOIN dbo.MyPerson AS p ON be.BusinessEntityID = p.BusinessEntityID
This query will produce the following execution plan:
That is the exact same query running against exact copies of the tables from the original, yet, now, instead of two operators and one join operation, we’re looking at four operators and three join operations. That my friends is the optimizer taking advantage of the fact that there are foreign keys in place that ensure trust in the data which enables the optimzer to eliminate unnecessary tables from the plan. Not only do the execution plans differ, but the execution time was about 1/6 slower, consistently, in the second query as it did all kinds of reads against the two tables that were eliminated in the original query plan.
I know some of you are saying, “Well, we can just put indexes on the tables to fix that problem, we still don’t need constraints.” OK. Let’s check it out. Here’s a script to put indexes in place, which will surely fix the heinous execution plan above:
ALTER TABLE dbo.MyAddress ADD CONSTRAINT PK_MyAddress_AddressID PRIMARY KEY CLUSTERED ( AddressID ASC ) CREATE NONCLUSTERED INDEX IX_MyBusinessEntityAddress_AddressID ON dbo.MyBusinessEntityAddress ( AddressID ASC ) ALTER TABLE dbo.MyBusinessEntityAddress ADD CONSTRAINT PK_MyBusinessEntityAddress_BusinessEntityID_AddressID_AddressTypeID PRIMARY KEY CLUSTERED ( BusinessEntityID ASC, AddressID ASC, AddressTypeID ASC ) ALTER TABLE dbo.MyBusinessEntity ADD CONSTRAINT PK_MyBusinessEntity_BusinessEntityID PRIMARY KEY CLUSTERED ( BusinessEntityID ASC ) USE [AdventureWorks2008R2] GO ALTER TABLE dbo.MyPerson ADD CONSTRAINT PK_Person_BusinessEntityID PRIMARY KEY CLUSTERED ( BusinessEntityID ASC )
After building out the indexes, we get this execution plan:
Oops. Except for the fact that we have Clustered Index Scan instead of Table Scan for most operators, the only real change to this is the inclusion of the Index Scan operation against the new index on the MyBusinessEntityAddress table. In short, no improvement at all.
OK, you might be thinking to yourself, what if this was a proper query and there was a WHERE clause. Let’s modify the queries:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName' FROM Person.Address AS a JOIN Person.BusinessEntityAddress AS bea ON a.AddressID = bea.AddressID JOIN Person.BusinessEntity AS be ON bea.BusinessEntityID = be.BusinessEntityID JOIN Person.Person AS p ON be.BusinessEntityID = p.BusinessEntityID WHERE p.LastName LIKE 'Ran%' SELECT p.LastName + ', ' + p.FirstName AS 'PersonName' FROM dbo.MyAddress AS a JOIN dbo.MyBusinessEntityAddress AS bea ON a.AddressID = bea.AddressID JOIN dbo.MyBusinessEntity AS be ON bea.BusinessEntityID = be.BusinessEntityID JOIN dbo.MyPerson AS p ON be.BusinessEntityID = p.BusinessEntityID WHERE p.LastName LIKE 'Ran%'
The first query resulted in this execution plan:
Now clearly, more tuning is probably possible here since we’re still looking at an Index Scan, but the point is not whether or not the query is tuned, the point is, that the optimizer can eliminate tables because of the trust created by the foreign key constraints. And what did the other query produce?
Okay… The less said the better. Once again, tuning opportunities do exist, but we’re still accessing tables where it is absolutely not necessary, as demonstrated by the original query and it’s execution plan. The data returned has been the same, each and every time.
Back to the question, do foreign key constraints help performance? Let me ask you one in return. Have you ever looked to see which was done more in your OLTP, reads or writes? If you haven’t, take a look. I’ll be the answer surprises you. My answer to the question, yes, it can. Not that it will, but that it can.