January 9, 2015 at 6:00 pm
I am looking for a technique to try and find missing FK constraints in a database. I know you cannot necessarily detect these, but using something like this: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/missing-foreign-key-constraints/
How would you determine if the FK should exist and then create it. Show 2-3 examples of where you may, or may not, create an FK.
September 22, 2015 at 8:52 am
I'll give this one a go Steve. It dovetails nicely with some work I'm going to be doing on RedShift and Teradata.
September 22, 2015 at 9:36 am
My initial thought is using information_schema views or sys tables to enumerate what table/column combinations constitute primary keys, join to any other tables that contains the same column combination, while excluding any tables which already contain a foreign key on those same columns. That would be more comprehensive than the referenced example which seems to assume that primary keys have "ID" in the name and contain only that one column.
I'm in a new job, so I don't have bandwidth to complete the actual code, but here is some starter code for anyone who wants to mix and bake it:
SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku
ON tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME
ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;
SELECT ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku
ON tc.CONSTRAINT_TYPE = 'FOREIGN KEY'
AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME
ORDER BY ku.TABLE_SCHEMA,ku.TABLE_NAME,ku.COLUMN_NAME;
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
September 22, 2015 at 10:19 am
Take it, David.
September 22, 2015 at 12:22 pm
If (as I think you do?) you re-use column names in other tables then if you have CustomerID in your Customer Table, as the PKey or as a Unique constraint/index, then if there are columns on other tables also called CustomerID they would be potential candidates for a FKey.
My naming, which we've debated and you are not a fan ;-), would be:
EDIT: Sorry, it was Sean I debated this with, not you, apologies.
cus_id on the customer table
and
inv_cus_id on the invoice table.
We always use the full column name of the column from "the other table", so I do reliably use this to detect any column usage within another table and report where FKeys are missing.
But I don't think my cus_id / inv_cus_id naming is any different to your CustomerID naming (in this regard) - provided that you use it reliably, and "everywhere"?
But maybe I've misunderstood the question and you want to go beyond this?
It did cross my mind whether it was worth checking SQL statements (e.g. from Profiler or some DMV) and parsing the JOIN conditions out of them for anything that didn't match an FKey. I suppose the query planner only considers indexes (and not FKeys?) and therefore you cannot get a "suggestion" for an FKey out of the DMVs as a side effect?
September 22, 2015 at 1:53 pm
Using the contents of the query plan cache, you can also parse the JOIN conditions of SELECT statement, to infer foreign key relationships. That would compensate for situations where foreign key columns are given different names. I don't think I've ever seen that attempted before, but it's theoretically sound. It would require at least a full uninterrupted day to code up.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
September 22, 2015 at 3:03 pm
I've been playing with this sort of thing for a while.
[Identify candidate FK where a table contains the equivalent fields as those participating in aa PK[/li]
Text parsing SYS.comments for joins in views should also help Identify candidates but is significantly more involved.
I Have a working box of bits for Vertica,, Teradata, MySql, SQL server and I'm looking to experiment with RedShift in the near future. I think working out how to do this for Postgres would be handy.
September 23, 2015 at 3:03 am
We have a filter index including WHERE MyKeyColumn IS NOT NULL on some of our FKey-supporting indexes - e.g. where we have an FKey on a User Defined Column which may not be used / may not be used much.
Might be an idea to consider if the FKey candidate column(s) is Nullable or not?
September 23, 2015 at 5:38 am
I've got a script i call missing and implied foreign keys that i built over the years.
it's basically a peel the onion kind of attack, which kind of assumes some naming conventions are in place(the FK column exactly equals or contains the PK column name, maybe with a prefix/suffix.
find all PK's and UQ's, and then any columns that equal or contain, but are not featured in the current FK's
--current PK/UQ columns
;WITH CurrFKS
AS (SELECT Object_name(constid) AS constraintname,
OBJECT_SCHEMA_NAME(rkeyid) AS refschema,
Object_name(rkeyid) AS reftable,
COL_NAME(rkeyid, rkey) AS refcolumn,
OBJECT_SCHEMA_NAME(rkeyid) AS fkschema,
Object_name(fkeyid) AS fktable,
COL_NAME(fkeyid, fkey) AS fkcolumn,
' ALTER TABLE '
+ Object_name(fkeyid)
+ ' ADD CONSTRAINT [PLACEHOLDER]'
+ ' FOREIGN KEY ('
+ COL_NAME(fkeyid, fkey)
+ ') REFERENCES '
+ Object_name(rkeyid) + '('
+ COL_NAME(rkeyid, rkey) + ')' AS fksql
FROM sysforeignkeys
),
MyCTE
AS (SELECT Object_schema_name(colz.object_id) AS FKSchemaName,
Object_name(colz.object_id) AS FKTableName,
colz.NAME AS FKColumnName,
colz.column_id AS FKcolumn_id,
CurrentReferenceCandidates.*
FROM sys.columns colz
INNER JOIN (SELECT
idxz.object_id,
object_schema_name(idxz.object_id) As PKSchemaName,
object_name(idxz.object_id) As PKTableName,
pcolz.name As PKColumnName,
pcolz.column_id As PKColumnID,
idxz.is_primary_key ,
idxz.is_unique
from sys.indexes idxz
inner join sys.index_columns icolz
on idxz.object_id = icolz.object_id
inner join sys.columns pcolz
ON icolz.object_id = pcolz.object_id
AND icolz.column_id = pcolz.column_id
WHERE (idxz.is_primary_key = 1 OR idxz.is_unique = 1)
and object_schema_name(idxz.object_id) <> 'sys') CurrentReferenceCandidates
ON colz.NAME = CurrentReferenceCandidates.PKColumnName
----ON colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'
AND colz.object_id != CurrentReferenceCandidates.object_id
WHERE Object_Schema_name(colz.object_id) != 'sys'
AND colz.object_id != CurrentReferenceCandidates.object_id
AND Object_schema_name(colz.object_id) = pkschemaname
AND colz.name = CurrentReferenceCandidates.PKColumnName --toggle below for second version
--AND colz.name like '%' + CurrentReferenceCandidates.PKColumnName + '%'
) SELECT ' ALTER TABLE ' + FKSchemaName + '.'
+ FKTableName + ' ADD CONSTRAINT [PLACEHOLDER]'
-- + FKTableName + '_' + FKColumnName + ']'
+ ' FOREIGN KEY (' + FKColumnName
+ ') REFERENCES ' + pkschemaname + '.'
+ PKTableName + '(' + PKColumnName + ')' AS fksql
FROM MYCTE
WHERE FKTableName NOT IN(SELECT NAME
FROM sys.views)
EXCEPT
SELECT FKSQL
FROM CurrFKS;
then i have variations of it, which assume someone didn't create primary keys in the first place.
two versions, one which assumes any identity columns should be a PK but the PK is missing, and the last version where i assume the first column in a table is the potential PK, and compare other tables based on the same naming convention assumption.
Lowell
January 1, 2016 at 7:53 am
Nearly finished my solution to this. Turns out to be far more involved and have loads of gotchas than a casual approach would reveal.
January 4, 2016 at 1:50 pm
Thanks, looking forward to it.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply