Finding nested references

I originally planned this post just as an answer to DBA Stack Overflow question: How can I get the list of tables in all the stored procedure? After preparing a lab environment, I think it deserves its own blog post.

The problem

The original question lists these requirements:

Information about all the tables referenced by a stored procedure
Also, list tables from cross-DB query
Return DB, Schema and Table names

The currently highest-rated answer uses sys.dm_sql_referenced_entities which is a bit of an overkill for given requirements. This DMF returns column granularity so to just get the list of tables you have to aggregate.

I think a better answer would use sys.sql_expression_dependencies (which is a subset of sys.dm_sql_referenced_entities anyway) that would give just the tables.

But I thought - what about views or functions or nested procedures? I’d argue those are part of the main procedure as well and they can refer to tables that are otherwise hidden.

It could also be ~~turtles~~ procedures all the way down. That means I’d have to write a recursive query or a loop. It could even have *shudders* cyclic references.

We’ll need a lab environment to test this.

The lab

I’ve used AI Claude to help me generate several objects so I can test my reference script. I haven’t bothered much with formatting on this one.

Two databases (to test cross-db query)
Scalar function, view and inline table-valued function (for nested references)
The main procedure with several statements
Nested procedure with cyclic reference

Here it is:

CREATE DATABASE Sideline
GO
CREATE DATABASE Main
GO
USE Sideline
GO
CREATE TABLE dbo.Employee
(
    EmployeeID int PRIMARY KEY
  , FirstName  nvarchar(50)
  , LastName   nvarchar(50)
  , Department nvarchar(50)
)
GO
CREATE TABLE dbo.Salary
(
    SalaryID      int PRIMARY KEY
  , EmployeeID    int
  , Amount        decimal(10, 2)
  , EffectiveDate date
)
GO
CREATE VIEW dbo.EmployeeSalaryView
AS
SELECT
    e.EmployeeID, e.FirstName, e.LastName, s.Amount
FROM
    dbo.Employee AS e
    JOIN dbo.Salary AS s
        ON e.EmployeeID = s.EmployeeID
GO
USE Main
GO
CREATE TABLE dbo.Project
(
    ProjectID   int PRIMARY KEY
  , ProjectName nvarchar(100)
  , StartDate   date
  , EndDate     date
)
GO
CREATE TABLE dbo.Task
(
    TaskID     int PRIMARY KEY
  , ProjectID  int
  , TaskName   nvarchar(100)
  , AssignedTo int
  , Status     nvarchar(20)
)
GO
CREATE TABLE dbo.TimeEntry
(
    TimeEntryID int PRIMARY KEY
  , TaskID      int
  , EmployeeID  int
  , Hours       decimal(5, 2)
  , EntryDate   date
)
GO
CREATE TABLE dbo.ProjectMilestone
(
    MilestoneID    int          PRIMARY KEY
  , ProjectID      int
  , MilestoneName  nvarchar(100)
  , TargetDate     date
  , CompletionDate date         NULL
  , Status         nvarchar(20)
)
GO
CREATE VIEW dbo.ProjectTaskView
AS
SELECT
    p.ProjectID, p.ProjectName, t.TaskID, t.TaskName, t.Status
FROM
    dbo.Project AS p
    JOIN dbo.Task AS t
        ON p.ProjectID = t.ProjectID
GO
CREATE FUNCTION dbo.GetProjectTasks (@ProjectID int)
RETURNS table
AS
RETURN
     (
         SELECT
             Task.TaskID, Task.TaskName, Task.Status
         FROM dbo.Task
         WHERE Task.ProjectID = @ProjectID
     )
GO
CREATE FUNCTION dbo.GetProjectStatus (@ProjectID int)
RETURNS nvarchar(20)
AS
BEGIN
    DECLARE @Status nvarchar(20)
    SELECT
        @Status = CASE
                      WHEN Project.EndDate < GETDATE ()
                          THEN 'Completed'
                      WHEN Project.StartDate > GETDATE ()
                          THEN 'Not Started'
                      ELSE 'In Progress'
                  END
    FROM dbo.Project
    WHERE Project.ProjectID = @ProjectID
    RETURN @Status
END
GO
/* Cannot create circular reference in single step, I'll alter it after the main proc is created */CREATE PROCEDURE dbo.NestedProc
AS
BEGIN
    SELECT
        te.TimeEntryID, te.TaskID, te.EmployeeID, te.Hours, te.EntryDate
    FROM dbo.TimeEntry AS te
END
GO
CREATE PROCEDURE dbo.MainProc
AS
BEGIN
    CREATE TABLE #NewProjects (ProjectID int)
    INSERT INTO #NewProjects (ProjectID)
    VALUES
        (1)
      , (2)
    -- Statement 1: Join with temp table
    SELECT
        np.ProjectID, pm.MilestoneID, pm.ProjectID, pm.MilestoneName, pm.TargetDate, pm.CompletionDate, pm.Status
    FROM
        #NewProjects AS np
        JOIN dbo.ProjectMilestone AS pm
            ON np.ProjectID = pm.ProjectID
    -- Statement 2: Referencing a view, scalar and itv functions
    SELECT
        pv.ProjectID
        , pv.ProjectName
        , dbo.GetProjectStatus (pv.ProjectID) AS ProjectStatus
        , t.TaskName
        , t.Status
    FROM
        dbo.Project AS p
        JOIN dbo.ProjectTaskView AS pv
            ON pv.ProjectID = p.ProjectID
        CROSS APPLY dbo.GetProjectTasks (pv.ProjectID) AS t
    -- Statement 3: Cross-DB references (table and view)
    SELECT
        t.TaskID, t.TaskName, t.Status, esv.EmployeeID, esv.FirstName, esv.LastName, esv.Amount AS Salary
    FROM
        Main.dbo.Task AS t
        JOIN Sideline.dbo.EmployeeSalaryView AS esv
            ON t.AssignedTo = esv.EmployeeID
        JOIN Sideline.dbo.Salary AS s
            ON esv.EmployeeID = s.EmployeeID
    WHERE t.Status = 'In Progress';
    -- Statement 4: Call nested procedure with cyclic reference
    EXEC dbo.NestedProc
END
GO
/* Add the cyclic reference */CREATE OR ALTER PROCEDURE dbo.NestedProc
AS
BEGIN
    SELECT
        te.TimeEntryID, te.TaskID, te.EmployeeID, te.Hours, te.EntryDate
    FROM dbo.TimeEntry AS te
    -- Circular call to MainProc
    EXEC dbo.MainProc
END
GO

Tests

Stack Overflow answer

Let’s the the original SO answer first. I will be checking the dbo.MainProc’s dependencies for all the tests.

SELECT DISTINCT
    referenced_schema_name,
    referenced_entity_name
FROM sys.dm_sql_referenced_entities('dbo.MainProc', 'OBJECT')
ORDER BY referenced_entity_name /* added for comparison with other results */

Stack Overflow answer returning 9 tables

It doesn’t show that the Salary table and view are from a different database but nothing that can’t be fixed with an additional column.

I’ll compare the raw output with sys.sql_expression_dependencies

sql_expression_dependencies

SELECT
      dsre.referenced_database_name
    , dsre.referenced_schema_name
    , dsre.referenced_entity_name
    , dsre.referenced_minor_name
    , dsre.referenced_id
    , dsre.referenced_minor_id
FROM sys.dm_sql_referenced_entities('dbo.MainProc', 'OBJECT') AS dsre
ORDER BY
    referenced_entity_name
    , dsre.referenced_minor_id
SELECT 
    sed.referenced_database_name
    , sed.referenced_schema_name
    , sed.referenced_entity_name
    , sed.referenced_id
    , sed.referenced_minor_id
FROM sys.sql_expression_dependencies AS sed
WHERE sed.referencing_id = OBJECT_ID('dbo.MainProc')
ORDER BY
    sed.referenced_entity_name
    , sed.referenced_minor_id

dm_sql_referenced_entities returns more rows due to the column granularity but has objectid info for cross-db references

(1) the first result set has 23 rows due to column information and has to be aggregated to get distinct tables…
(2) or filter only entries where referenced_minor_id = 0 - those refer to the object level dependency
(3) the sys.sql_expression_dependencies is missing the object_id for any cross-db references

I think both of these DMOs are useful and it’s good to know their differences.

The recursive solution

This is my script. It uses a recursive CTE to get all the nested references.

I’m using a cycle detection logic that I saw Itzik Ben-Gan (b) use somewhere.

DROP TABLE IF EXISTS #References
CREATE TABLE #References
(
    ObjectId       int            NOT NULL
    , DbId         int            NOT NULL
    , DbName       nvarchar(128)  NOT NULL
    , SchemaName   nvarchar(128)  NOT NULL
    , ObjectName   nvarchar(128)  NOT NULL
    , FullName     AS CONCAT(SchemaName, '.', ObjectName) PERSISTED
    , objectType   nvarchar(60)   NOT NULL
    , NestingLevel int            NOT NULL
    , NestingPath  varchar(MAX)   NOT NULL
    , SortPath     varbinary(MAX) NOT NULL
/* add indexes as needed */)
; WITH allReferences
AS
(
    SELECT
        dsre.referenced_id AS Id
        , ca.DbId
        --, dsre.referenced_database_name
        --, dsre.referenced_schema_name
        --, dsre.referenced_entity_name
        , 0 AS NestingLevel
        , CAST
        (
            CONCAT
            (
                '.'
                , CAST(ca.DbId AS VARCHAR(MAX))
                , '_'
                ,  CAST(dsre.referenced_id AS VARCHAR(MAX))
                , '.'
            )
            AS VARCHAR(max)
         ) AS NestingPath /* format: .dbId_ObjectId. */         , CAST(CAST(1 AS BINARY(2)) AS VARBINARY(MAX)) AS SortPath
         , 0 AS CycleDetection
    FROM
        sys.dm_sql_referenced_entities('dbo.MainProc', 'object') AS dsre 
        CROSS APPLY (VALUES (ISNULL(DB_ID(dsre.referenced_database_name), DB_ID()))) AS ca(DbId)
    WHERE
        dsre.referenced_minor_id = 0 /* Removing column reference granularity */    UNION ALL
    SELECT
      dsre.referenced_id
        , ca.DbId
        --, dsre.referenced_database_name
        --, dsre.referenced_schema_name
        --, dsre.referenced_entity_name
        , ar.NestingLevel + 1
        , CAST
        (
            CONCAT
            (
                ar.NestingPath /* append object to existing path */                , CAST(ca.DbId AS VARCHAR(MAX))
                , '_'
                ,  CAST(dsre.referenced_id AS VARCHAR(MAX))
                , '.' 
            )
            AS VARCHAR(max)
         ) AS NestingPath /* format: .dbId_ObjectId. */         , ar.SortPath + CAST(ROW_NUMBER() OVER 
         (
            PARTITION BY ca.dbId, dsre.referenced_id
            ORDER BY (SELECT @@SPID)) AS BINARY(2)
         ) AS SortPath
         , IIF
         (
            ar.NestingPath LIKE CONCAT
            (
                '%.'
                , CAST(ca.DbId AS VARCHAR(MAX))
                , '_'
                ,  CAST(dsre.referenced_id AS VARCHAR(MAX))
                , '.%'
            )
            , 1 /* if nestingPath already contains the same subpath it's a cycle */            , 0
         ) AS CycleDetection
    FROM allReferences AS ar
    CROSS APPLY sys.dm_sql_referenced_entities(CONCAT(OBJECT_SCHEMA_NAME(ar.Id), '.', OBJECT_NAME(ar.Id)), 'Object') AS dsre
    CROSS APPLY (VALUES (ISNULL(DB_ID(dsre.referenced_database_name), DB_ID()))) AS ca(DbId)
    WHERE
        ar.CycleDetection = 0
        AND dsre.referenced_id IS NOT NULL
        AND dsre.referenced_minor_id = 0 /* Removing column reference granularity */)
INSERT INTO #References WITH (TABLOCKX)
(
    ObjectId
    , DbId
    , DbName
    , SchemaName
    , ObjectName
    , objectType
    , NestingLevel
    , NestingPath
    , SortPath
)
SELECT
    ar.Id AS ObjectId
    , ar.DbId
    , DB_NAME(ar.DbId) AS DbName
    , OBJECT_SCHEMA_NAME(ar.Id, ar.DbId) AS SchemaName
    , OBJECT_NAME(ar.Id, ar.DbId) AS ObjectName
    , IIF
    (
        o.object_id IS NOT NULL /* same DB reference */        , o.type_desc
        , 'N/A'
    ) AS objectType
    , ar.NestingLevel
    , ar.NestingPath
    , ar.SortPath
FROM allReferences AS ar
LEFT JOIN sys.objects AS o
    ON ar.Id = o.object_id
    AND ar.DbId = DB_ID()
WHERE
    ar.CycleDetection = 0
SELECT
    *
FROM #References AS r

Recursive solution showing many columns and reference details

This uses the sys.dm_sql_referenced_entities because I don’t want to lose the cross-db object_id.

The cross-DB view dbo.EmployeeSalaryView is not expanded though because the DMOs are database-scoped. I didn’t want to complicate this even further with a dynamic SQL (or linked servers for that matter).

I’m also getting duplicates (due to nested references) and that’s why I’m inserting it into temp table #References so you can slice and dice the data in any way you want it (That’s the way you need it).

For example:

SELECT
      r.DbName
    , r.FullName
    , COUNT(1) AS Cnt
FROM #References AS r
GROUP BY
    r.DbName
    , r.FullName
ORDER BY
    Cnt DESC

Thank you for reading!