There are many production databases that in practice follow a set of rules such as these:
- All tables must be created with an integer column named '%ID' to be used as the primary key for the table.
- Wherever the name of such an '%ID' column is used in another table, it must have a logical foreign key relation to the same-named primary key column, but do not create the physical foreign keys in the database.
- The tables used by the central applications must be in third normal form, but It's OK to create de-normalized tables if needed.
Recently, I was tasked with removing all data for a particular enterprise from all versions of a database. Some backup copies of the database could have tables that no longer exist in the current version. I discovered that the above rules were implicit in this database, so I wrote a simple recursion to find join paths to the "enterprise ID" in whatever tables are found in a version of the database.
This is a simplified model of tables in the database:
IF OBJECT_ID('ksl.enterprise') IS NOT NULL DROP TABLE ksl.enterprise; IF OBJECT_ID('ksl.company') IS NOT NULL DROP TABLE ksl.company; IF OBJECT_ID('ksl.division') IS NOT NULL DROP TABLE ksl.division; IF OBJECT_ID('ksl.department') IS NOT NULL DROP TABLE ksl.department; IF OBJECT_ID('ksl.oneOff') IS NOT NULL DROP TABLE ksl.oneOff; IF OBJECT_ID('ksl.oneOffChild') IS NOT NULL DROP TABLE ksl.oneOffChild; IF OBJECT_ID('ksl.denormalizedCollection') IS NOT NULL DROP TABLE ksl.denormalizedCollection; CREATE TABLE ksl.enterprise ( entID INT PRIMARY KEY , attributes VARCHAR(20) ); CREATE TABLE ksl.company ( companyID INT PRIMARY KEY , entID INT NOT NULL , attributes VARCHAR(20) ); CREATE TABLE ksl.division ( divID INT PRIMARY KEY , companyID INT NOT NULL , attributes VARCHAR(20) ); CREATE TABLE ksl.department ( deptID INT PRIMARY KEY , divID INT NOT NULL , attributes VARCHAR(20) ); CREATE TABLE ksl.oneOff ( oneOffID INT PRIMARY KEY , companyID INT NOT NULL , attributes VARCHAR(20) ); CREATE TABLE ksl.oneOffChild ( oneOffChildID INT PRIMARY KEY , oneOffID INT NOT NULL , attributes VARCHAR(20) ); CREATE TABLE ksl.denormalizedCollection ( denormCollId INT PRIMARY KEY , divID INT NOT NULL , entID INT NOT NULL , attributes VARCHAR(20) );
The "mother of all tables" is enterprise, which has entID as its primary key. The enterprise has many companies, represented in table company, and each company has many divisions, and each division has many departments. The task is to delete all data related to a particular entID value, the value that that identifies the "gone for good" enterprise. This is a database that has grown to hundreds of tables, their various structures represented by tables such as: oneOff, oneOffChild, and denormalizedCollection.
Before we proceed to find all join paths to enterprise.entID, we look at the SQL Server version and collation, so we won't be surprised by the SQL Server system catalog or case sensitivity. For my task, I used this code and received these results:
select @@version,serverproperty('collation')
Note that I created the model of the database in a non-dbo schema. The recursion to find joins requires a table that I create in the non-dbo schema, ksl, to keep it from being considered as another of the hundreds of tables that have to be examined for data related to the "gone for good" entID. Here is T-SQL used to manage schemas:
/* create schema ksl authorization dbo select schema_id from sys.schemas where name='ksl' schema_id --------- 6 */
The setup for the simple recursion is a "select into" from the system catalog to create a table in the ksl schema. The query selects the set of tablename.columnname candidates for finding joins between tables, that is, all columns named '%ID' in the relevant schema, with their table names. It is the naming convention that is followed for key columns that makes the join path to entID useful. The query creates an integer column named "level" and assigns the value 99 to all records selected for that column. This default value for the set of tablename.columnname is intended to mean "there is no join path from this column to an entID column."
The opposite of "there is no join path" is "this is a table with the entID column." The value for "top level of the hierarchy" is 0 (zero). The name of the ID column at the top of the enterprise hierarchy is storeded in the variable @topID. An update statement updates level=99 to level=0 for every table.column that is a column named entID or is a column in a table that has a column named entID. (The first condition is a subset of the second condition, but is stated explicitly to emphasize that identifying all join paths from all tables to entID is the goal.)
The recursion to find joins is simple, performed by a "while loop" that increments the value that is written to the column named level. At start, the value of the @level variable is 0 (zero), representing the subset of table.column values that are tables that include the entID column. From all the table.column values that are level=99, mark those that join to the ones that are level=0 as level=1. Increment the @level value by 1; from all the table.column values that are level=99, mark those that join to the ones that are level=1 as level=2. Keep looping, traversing the join path, until no joins are found. Here is the code:
IF OBJECT_ID('ksl.findjoins') IS NOT NULL DROP TABLE ksl.findjoins DECLARE @topID SYSNAME SET @topID = 'entId' SELECT o.name AS tblname , c.name AS colname , 99 AS level INTO ksl.findjoins FROM sys.objects o JOIN sys.columns c ON o.object_id = c.object_id AND c.name LIKE '%id' AND o.schema_id = 6 UPDATE ksl.findjoins SET level = 0 WHERE colname = @topID OR tblname IN ( SELECT tblname FROM ksl.findjoins WHERE colname = @topID ) DECLARE @level TINYINT SET @level = 0 DECLARE @rowcount INT WHILE 1 = 1 BEGIN UPDATE ksl.findjoins SET level = @level + 1 WHERE ( colname IN ( SELECT colname FROM ksl.findjoins WHERE level = @level ) OR tblname IN ( SELECT tblname FROM ksl.findjoins WHERE level = @level ) ) AND level = 99 SET @rowcount = @@rowcount IF @rowcount = 0 BREAK SET @level = @level + 1 END
This table, ksl.findjoins, holds the all data we need to write all the necessary delete statements -- "delete @tablename where @fkColumn in (select @pkColumn from @refTablename where entID = 'goneForGood')".
SELECT * FROM ksl.findjoins ORDER BY level , colname , tblname
This gives us:
...
Any columns named '%ID' that have no join path to entID will remain level = 99. In this simplified database there are only seven tables , all of which do have a join to entID, the join found by looping to level <= 4.
The join paths are made evident with a view on table, findjoins, augmented by the SQL Server catalog to identify columns that are the primary key columns. The minimum level value for each table in findjoins is the shortest join path to entID. There is one row in the view for each table, with the "min level" and the column to use to join and the indicator 'PK' if that column is a primary key column.
IF OBJECT_ID('ksl.vFindJoinsMinLevel', 'V') IS NOT NULL DROP VIEW ksl.vFindJoinsMinLevel go CREATE VIEW ksl.vFindJoinsMinLevel AS SELECT t.tblname , t.colname , x.minlevel , CASE WHEN sc.name IS NULL THEN '' ELSE 'PK' END AS isPK FROM ksl.findjoins t JOIN ( SELECT tblname , MIN(level) AS minlevel FROM ksl.findJoins GROUP BY tblname ) x ON t.tblname = x.tblname AND t.level = x.minlevel JOIN sys.objects so ON t.tblname = so.name AND so.schema_id = 6 AND so.type = 'U' LEFT JOIN sys.indexes si ON si.object_id = so.object_id LEFT JOIN sys.index_columns sic ON si.object_id = sic.object_id AND si.index_id = sic.index_id LEFT JOIN sys.columns sc ON sc.column_id = sic.column_id AND sc.object_id = so.object_id AND sc.name = t.colname AND si.is_primary_key = 1
SELECT * FROM ksl.vFindJoinsMinLevel ORDER BY 3 , 2 , 1
This query displays a join path to entID for a single table that is marked by the recursion as "level = 3":
DECLARE @level3Tablename SYSNAME
SET @level3Tablename = 'oneOffChild'
SELECT l0.tblname AS l0tblname
, l0.colname AS l0colname
, l1.tblname AS l1tblname
, l1.colname AS l1colname
, l2.tblname AS l2tblname
, l2.colname AS l2colname
, l3.tblname AS l3tblname
, l3.colname AS l3colname
FROM ksl.findjoins l0
JOIN ksl.findjoins l1
ON l0.colname = l1.colname
JOIN ksl.findjoins l2
ON l1.tblname = l2.tblname
JOIN ksl.findjoins l3
ON l2.colname = l3.colname
WHERE l3.tblname = @level3Tablename
AND l3.level = 3
AND l2.level = 2
AND l1.level = 1
AND l0.level = 0
ORDER BY 8
, 6
, 4
, 1
SELECT * FROM oneOffChild WHERE oneOffId IN ( SELECT oneOffId FROM oneOff WHERE companyID IN ( SELECT companyID FROM company WHERE entID = 12 --'goneForGood' ) )
or
DELETE ooc FROM oneOffChild ooc JOIN oneOff oo ON ooc.oneOffId = oo.oneOffId JOIN company co ON oo.companyID = co.companyID WHERE co.entID = 12 --'goneForGood'