Sorted UPDATE statement with hierarchical self-join

  • Hi,

    I have the following UPDATE statement of a table joined with itself:

    UPDATE childTbl

    SET childTbl.Date1 = parentTbl.Date1

    FROM dbo.tbl childTbl (NOLOCK)

    INNER JOIN dbo.tbl parentTbl (NOLOCK)

    ON childTbl.PrevID = parentTbl.ID

    AND childTbl.Col2 = parentTbl.Col2

    Columns ID and PrevID are of type INT.

    Col2 is of type VARCHAR.

    The table has a clustered PK on column ID; it has about 2 million records.

    The data contained in the table is structured in a hierarchical way, for example:

    ID PrevID Date1 Col2

    ------------------------------------

    5 4 2010-12-11 abc

    6 5 2010-12-12 def

    7 6 2010-12-12 ghi

    ...

    I need to be able to run the UPDATE above in a sorted way because it is important that for every record updated, all the "parent" records be updated first.

    However, ORDER BY is not allowed with the UPDATE.

    How can I get around this? Should I re-write the UPDATE as a CTE?

    Something like this?

    ;WITH CTE AS

    (SELECT TOP (5000000) CCTR.Date1 AS CHILD, PCTR.Date1 AS PARENT

    from dbo.tbl childTbl (NOLOCK)

    INNER JOIN dbo.tbl parentTbl (NOLOCK)

    ON childTbl.PrevID = parentTbl.ID

    AND childTbl.Col2 = parentTbl.Col2

    order by parentTbl.ID

    )

    update CTE

    SET CHILD = PARENT;

    Thank you for any responses!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • How do you identify top-level nodes, that is rows where the PrevID column does not reference any other parent row? Is the PrevID column NULL valued in this case, or some other fixed value such as zero, or may be it holds the same value as the ID column?

    What is the approximate number of levels in this hierarchy?

    Is there any possibility of circular references?

    Why are you attempting to use the NOLOCK hint in an UPDATE query? I think SQL Server will ignore the NOLOCK hint in this case, but it's a dangerous hint to use in any UPDATE statement.

    Can you post the DDL for the table please and some easily consumable test data (i.e. in the form of INSERT statements).

  • andrewd.smith (12/13/2010)


    How do you identify top-level nodes, that is rows where the PrevID column does not reference any other parent row? Is the PrevID column NULL valued in this case, or some other fixed value such as zero, or may be it holds the same value as the ID column?

    What is the approximate number of levels in this hierarchy?

    Is there any possibility of circular references?

    Why are you attempting to use the NOLOCK hint in an UPDATE query? I think SQL Server will ignore the NOLOCK hint in this case, but it's a dangerous hint to use in any UPDATE statement.

    Can you post the DDL for the table please and some easily consumable test data (i.e. in the form of INSERT statements).

    Hi,

    In answer to your questions:

    How do you identify top-level nodes, that is rows where the PrevID column does not reference any other parent row? Is the PrevID column NULL valued in this case, or some other fixed value such as zero, or may be it holds the same value as the ID column?

    For top-level nodes PrevID is NULL.

    What is the approximate number of levels in this hierarchy?

    I don't have an answer to this at the moment, will try to get it for you.

    Is there any possibility of circular references?

    No.

    Why are you attempting to use the NOLOCK hint in an UPDATE query? I think SQL Server will ignore the NOLOCK hint in this case, but it's a dangerous hint to use in any UPDATE statement.

    I agree, I will need to ask the developer that question.

    Can you post the DDL for the table please and some easily consumable test data (i.e. in the form of INSERT statements).

    Here is the DDL:

    CREATE TABLE [tbl](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [Col2] [VARCHAR(30)] NOT NULL,

    [Date1] [datetime] NULL,

    [PrevID] [int] NULL,

    CONSTRAINT [PK_tbl] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    ) ON [DATA]

    ) ON [DATA]

    Here is a script to populate the table:

    DECLARE @cnt INT = 1

    while (1=1)

    BEGIN

    insert into [tbl] values ('xyz', DATEADD(hh,@cnt,getdate()), @cnt -1)

    SET @cnt = @cnt + 1

    END

    Thanks for your time!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • It might be possible to perform the update using a single recursive CTE query, but I'm not sure that in this case it would be the most efficient method.

    Here is an iterative method that uses the OUTPUT clause of the UPDATE statement to populate a temporary table named #Visted that lists all the nodes in the hierarchy that have been processed up to that point. I haven't tested it on a table with a large number of rows, so I'm not sure how well it will perform.

    CREATE TABLE #Visited (

    ID int NOT NULL,

    Depth int NOT NULL,

    PRIMARY KEY CLUSTERED (Depth, ID)

    )

    DECLARE @N int

    DECLARE @Depth int

    /* Insert root nodes into #Visited table */

    INSERT INTO #Visited(ID, Depth)

    SELECT ID, 0 FROM #tbl WHERE (PrevID IS NULL)

    SELECT @N = @@ROWCOUNT, @Depth = 0

    WHILE (@N > 0) BEGIN

    /* Link to parent nodes from previous level of hierarchy */

    UPDATE tblChild SET Date1 = tblParent.Date1

    OUTPUT inserted.ID, @Depth + 1 INTO #Visited(ID, Depth)

    FROM #tbl tblChild

    INNER JOIN #tbl tblParent ON (tblParent.ID = tblChild.PrevID)

    INNER JOIN #Visited V ON (V.ID = tblParent.ID)

    WHERE (tblParent.Col2 = tblChild.Col2)

    AND (V.Depth = @Depth)

    SELECT @N = @@ROWCOUNT, @Depth = @Depth + 1

    END

    DROP TABLE #Visited

  • What's wrong with the DDL I posted?

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply