Find Missing FK Constraints and Fixing them

Question

Post reply

Find Missing FK Constraints and Fixing them

Steve Jones - SSC Editor

SSC Guru

Points: 741460
More actions
January 9, 2015 at 6:00 pm

#293629

I am looking for a technique to try and find missing FK constraints in a database. I know you cannot necessarily detect these, but using something like this: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/missing-foreign-key-constraints/
How would you determine if the FK should exist and then create it. Show 2-3 examples of where you may, or may not, create an FK.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

David.Poole SSC Guru Points: 76360 More actions · Answer 1

I'll give this one a go Steve. It dovetails nicely with some work I'm going to be doing on RedShift and Teradata.

LinkedIn Profile

Eric M Russell SSC Guru Points: 125623 More actions · Answer 2

My initial thought is using information_schema views or sys tables to enumerate what table/column combinations constitute primary keys, join to any other tables that contains the same column combination, while excluding any tables which already contain a foreign key on those same columns. That would be more comprehensive than the referenced example which seems to assume that primary keys have "ID" in the name and contain only that one column.

I'm in a new job, so I don't have bandwidth to complete the actual code, but here is some starter code for anyone who wants to mix and bake it:

SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME

FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc

INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku

ON tc.CONSTRAINT_TYPE = 'PRIMARY KEY'

AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME

ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;

SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME

FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc

INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku

ON tc.CONSTRAINT_TYPE = 'FOREIGN KEY'

AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME

ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Steve Jones - SSC Editor SSC Guru Points: 741460 More actions · Answer 3

Steve Jones - SSC Editor

SSC Guru

Points: 741460

September 22, 2015 at 10:19 am

#1828491

Take it, David.

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 4

If (as I think you do?) you re-use column names in other tables then if you have CustomerID in your Customer Table, as the PKey or as a Unique constraint/index, then if there are columns on other tables also called CustomerID they would be potential candidates for a FKey.

My naming, which we've debated and you are not a fan ;-), would be:

EDIT: Sorry, it was Sean I debated this with, not you, apologies.

cus_id on the customer table

and

inv_cus_id on the invoice table.

We always use the full column name of the column from "the other table", so I do reliably use this to detect any column usage within another table and report where FKeys are missing.

But I don't think my cus_id / inv_cus_id naming is any different to your CustomerID naming (in this regard) - provided that you use it reliably, and "everywhere"?

But maybe I've misunderstood the question and you want to go beyond this?

It did cross my mind whether it was worth checking SQL statements (e.g. from Profiler or some DMV) and parsing the JOIN conditions out of them for anything that didn't match an FKey. I suppose the query planner only considers indexes (and not FKeys?) and therefore you cannot get a "suggestion" for an FKey out of the DMVs as a side effect?

Eric M Russell SSC Guru Points: 125623 More actions · Answer 5

Using the contents of the query plan cache, you can also parse the JOIN conditions of SELECT statement, to infer foreign key relationships. That would compensate for situations where foreign key columns are given different names. I don't think I've ever seen that attempted before, but it's theoretically sound. It would require at least a full uninterrupted day to code up.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

David.Poole SSC Guru Points: 76360 More actions · Answer 6

I've been playing with this sort of thing for a while.

Identify missing primary keys and unique constraints
Identify the number of fields in the key
Identify naming conventions for primary key fields and the equivalent in other tables

[Identify candidate FK where a table contains the equivalent fields as those participating in aa PK[/li]

Discount those where there is already an FK relationship
Test the data to precheck violations
Generate PK & FK scripts through automated means

Text parsing SYS.comments for joins in views should also help Identify candidates but is significantly more involved.

I Have a working box of bits for Vertica,, Teradata, MySql, SQL server and I'm looking to experiment with RedShift in the near future. I think working out how to do this for Postgres would be handy.

LinkedIn Profile

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 7

We have a filter index including WHERE MyKeyColumn IS NOT NULL on some of our FKey-supporting indexes - e.g. where we have an FKey on a User Defined Column which may not be used / may not be used much.

Might be an idea to consider if the FKey candidate column(s) is Nullable or not?

Lowell SSC Guru Points: 323755 More actions · Answer 8

I've got a script i call missing and implied foreign keys that i built over the years.

it's basically a peel the onion kind of attack, which kind of assumes some naming conventions are in place(the FK column exactly equals or contains the PK column name, maybe with a prefix/suffix.

find all PK's and UQ's, and then any columns that equal or contain, but are not featured in the current FK's

--current PK/UQ columns

;WITH CurrFKS

AS (SELECT Object_name(constid) AS constraintname,

OBJECT_SCHEMA_NAME(rkeyid) AS refschema,

Object_name(rkeyid) AS reftable,

COL_NAME(rkeyid, rkey) AS refcolumn,

OBJECT_SCHEMA_NAME(rkeyid) AS fkschema,

Object_name(fkeyid) AS fktable,

COL_NAME(fkeyid, fkey) AS fkcolumn,

' ALTER TABLE '

+ Object_name(fkeyid)

+ ' ADD CONSTRAINT [PLACEHOLDER]'

+ ' FOREIGN KEY ('

+ COL_NAME(fkeyid, fkey)

+ ') REFERENCES '

+ Object_name(rkeyid) + '('

+ COL_NAME(rkeyid, rkey) + ')' AS fksql

FROM sysforeignkeys

),

MyCTE

AS (SELECT Object_schema_name(colz.object_id) AS FKSchemaName,

Object_name(colz.object_id) AS FKTableName,

colz.NAME AS FKColumnName,

colz.column_id AS FKcolumn_id,

CurrentReferenceCandidates.*

FROM sys.columns colz

INNER JOIN (SELECT

idxz.object_id,

object_schema_name(idxz.object_id) As PKSchemaName,

object_name(idxz.object_id) As PKTableName,

pcolz.name As PKColumnName,

pcolz.column_id As PKColumnID,

idxz.is_primary_key ,

idxz.is_unique

from sys.indexes idxz

inner join sys.index_columns icolz

on idxz.object_id = icolz.object_id

inner join sys.columns pcolz

ON icolz.object_id = pcolz.object_id

AND icolz.column_id = pcolz.column_id

WHERE (idxz.is_primary_key = 1 OR idxz.is_unique = 1)

and object_schema_name(idxz.object_id) <> 'sys') CurrentReferenceCandidates

ON colz.NAME = CurrentReferenceCandidates.PKColumnName

----ON colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'

AND colz.object_id != CurrentReferenceCandidates.object_id

WHERE Object_Schema_name(colz.object_id) != 'sys'

AND colz.object_id != CurrentReferenceCandidates.object_id

AND Object_schema_name(colz.object_id) = pkschemaname

AND colz.name = CurrentReferenceCandidates.PKColumnName --toggle below for second version

--AND colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'

) SELECT ' ALTER TABLE ' + FKSchemaName + '.'

+ FKTableName + ' ADD CONSTRAINT [PLACEHOLDER]'

-- + FKTableName + '_' + FKColumnName + ']'

+ ' FOREIGN KEY (' + FKColumnName

+ ') REFERENCES ' + pkschemaname + '.'

+ PKTableName + '(' + PKColumnName + ')' AS fksql

FROM MYCTE

WHERE FKTableName NOT IN(SELECT NAME

FROM sys.views)

EXCEPT

SELECT FKSQL

FROM CurrFKS;

then i have variations of it, which assume someone didn't create primary keys in the first place.

two versions, one which assumes any identity columns should be a PK but the PK is missing, and the last version where i assume the first column in a table is the potential PK, and compare other tables based on the same naming convention assumption.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

David.Poole SSC Guru Points: 76360 More actions · Answer 9

Nearly finished my solution to this. Turns out to be far more involved and have loads of gotchas than a casual approach would reveal.

LinkedIn Profile

Steve Jones - SSC Editor SSC Guru Points: 741460 More actions · Answer 10

Steve Jones - SSC Editor

SSC Guru

Points: 741460

January 4, 2016 at 1:50 pm

#1849349

Thanks, looking forward to it.