Recursive CTE: Is it possible to count records without an aggregate function?

  • sgmunson (2/3/2010)


    Why? That's not the path this is going to take... The level is only going to be relative to the root level, not to any other value. The hierarchy shown only has 3 levels - the root, the 1st level and the 2nd level. It traversed the hierarchy quite correctly, from my perspective. Also, the root level was designated as level 0, so you have a 0-based setup here. The 3rd level is thus level 2 instead of 3. If I'm somehow confused here, please point out where and why... I'm always eager to learn...

    Steve

    (aka sgmunson)

    :-):-):-)

    Chris Morris-439714 (2/3/2010)


    The lvl value should be 3 where the parentid value is 3. The windowing function worked fine, but the lvl increment in the recursive part of the CTE didn't.

    Heh you know what? You're exactly right! My apologies Steve. I was so bent on finding something wrong that I invented it.

    The query works. Rows with parentID 2 & 3 are hierarchy level 2, just as you say, and just as the query outputs.

    I was running this against a db set to compat level 100, SQL Server 2008. Will post up service pack info tomorrow.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Interesting... I tried the "simple repro" query that Steve Kass submitted and sure enough, row_number() does not get properly evaluated - it's as if it gets a fresh evaluation from scratch for each row in the iteration. I also agree with Vladimir that "no allowing ROW_NUMBER or other ranking functions in CTE is like going back in time ..."

    Here's my most recent attempt at solving the balls/boxes problem recursively, but I know it shouldn't work correctly because the query following EXCEPT should ideally be one that takes every box id from BOX_MATCHES and pairs it with all possible ball ids, and then every ball id from BOX_MATCHES, pairing with all possible box ids. If I could create such a query without more than one reference to BOX_MATCHES, the problem would be solved.

    ;WITH BOX_MATCHES AS (

    SELECT BoxId, MIN(BallId) AS BallId --, MIN(PreferenceId) AS PreferenceId

    FROM TC22_Preferences

    GROUP BY BoxId

    HAVING COUNT(DISTINCT BallId) = 1

    UNION ALL

    SELECT BoxId, BallId --, PreferenceId

    FROM (

    SELECT BoxId, BallId, /*PreferenceId,*/ COUNT(TESTER) OVER (PARTITION BY BallId) AS COUNTER

    FROM (

    SELECT BoxId, BallId, 1 AS TESTER

    FROM TC22_Preferences

    EXCEPT (

    SELECT BoxId, BallId, 1 AS TESTER

    FROM BOX_MATCHES

    )

    ) X

    ) BM

    WHERE BM.COUNTER = 1

    )

    SELECT BOX.BoxName AS Box, BALL.BallName As Ball

    FROM BOX_MATCHES BM INNER JOIN TC22_Boxes BOX ON BM.BoxId = BOX.BoxId

    INNER JOIN TC22_Balls BALL ON BM.BallId = BALL.BallId

    --ORDER BY BM.BoxId

    OPTION (MAXRECURSION 0)

    See the attachment for the xml execution plan.

    Steve

    (aka sgmunson)

    :-):-):-)

    Paul White (2/3/2010)


    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Paul White (2/3/2010)


    Thanks for posting this, Paul, it's made for a very interesting lunchtime.

    If you add a third anchor row, like this...

    with T(i, rn, lvl) as (

    select 0, cast(0 as bigint), 0 -- anchor 1

    union all

    select 1, 0, 0 -- anchor 2

    union all

    select 2, 0, 0 -- anchor 3

    union all

    select i+2, row_number() over (order by NEWID()), lvl+1

    from T

    where i<10

    )

    select i, rn, lvl from T

    order by i;

    ... then it makes sense.

    Here's the return:

    Rowirnlvl

    1000

    2100

    3200

    4211

    5311

    6411

    7412

    8512

    9612

    10613

    11713

    12813

    13814

    14914

    151014

    161015

    171115

    The rows are explained as follows:

    Row 1 Anchor row 1

    Row 2 Anchor row 2

    Row 3 Anchor row 3

    Row 4 Anchor row 1, first recursion: single-row source (Row 1) and product

    Row 5 Anchor row 2, first recursion: single-row source (Row 2) and product

    Row 6 Anchor row 3, first recursion: single-row source (Row 3) and product

    Row 7 Anchor row 1 second recursion: single-row source (first recursion, Row 4) and product

    Row 8 Anchor row 2 second recursion: single-row source (first recursion, Row 5) and product

    Row 9 Anchor row 3 second recursion: single-row source (first recursion, Row 6) and product

    Row 10 Anchor row 1 third recursion: single-row source (second recursion, Row 7)

    Row 11 Anchor row 2 third recursion: single-row source (second recursion, Row 8)

    Row 12 Anchor row 3 third recursion: single-row source (second recursion, Row 9)

    and so on.

    T as feed to each recursion (I think of it as the "recursion buffer", someone must know what Microsoft call it) only ever contains one row in this query, so ROW_NUMBER() will always return 1. The recursion buffer only contains the results of the last recursion for that anchor, not an accumulation of all recursions.

    Cheers

    ChrisM

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • sgmunson (2/4/2010)


    row_number() does not get properly evaluated - it's as if it gets a fresh evaluation from scratch for each row in the iteration

    But what if it's correctly "evaluating" only one row?

    Steve, I've PM'd you my solution to the puzzle rather than posting it here, don't want to get into trouble again :blush:

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Perhaps I misunderstand how this is supposed to work... I would have thought that the objective of allowing recursion is precisely to gather a series of records, and that each iteration through the recursion, by virtue of it's recursive reference, sees the entire set of data so far. If that's not the case, then I can see why there's a problem. However, what if a given iteration adds two rows to the set? Does that mean the next iteration only sees one of those rows, or is it just that each iteration only sees all the rows from previous one?

    Steve

    (aka sgmunson)

    :-):-):-)

    Chris Morris-439714 (2/4/2010)


    sgmunson (2/4/2010)


    row_number() does not get properly evaluated - it's as if it gets a fresh evaluation from scratch for each row in the iteration

    But what if it's correctly "evaluating" only one row?

    Steve, I've PM'd you my solution to the puzzle rather than posting it here, don't want to get into trouble again :blush:

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Just to let everyone know, Chris' solution works quite nicely. The key is in realizing that recursive CTE's recursive references only retrieve the data from the last iteration, and NOT the data from any previous iteration, so the net result of most of my previous attempts was inherently doomed. Let's hope Microsoft sees fit to provide a means to get past that limitation. I also think they need to figure out how to allow the aggregate functions as well as GROUP BY, and stop putting so many limitations on recursive CTEs.

    Steve

    (aka sgmunson)

    :-):-):-)

    Chris Morris-439714 (2/4/2010)


    sgmunson (2/4/2010)


    row_number() does not get properly evaluated - it's as if it gets a fresh evaluation from scratch for each row in the iteration

    But what if it's correctly "evaluating" only one row?

    Steve, I've PM'd you my solution to the puzzle rather than posting it here, don't want to get into trouble again :blush:

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Paul White (2/3/2010)


    Hi Paul

    This has turned out to be a real can of worms. I couldn't see anything wrong using the example code and data (earlier post in same thread) but figured it was worth a closer look. The attached code shows a simple non-hierarchical recursive CTE and the functional equivalent using a series of CTE's. The outputs of each are entirely consistent across all columns except where windowing functions have been used to attempt to measure the intermediate results. The recursive CTE correctly measures the first result set as having a rowcount of 2, but subsequent result sets, which should all have a rowcount of 2, appear to be incorrectly measured, returning a rowcount of 1.

    Here's the code, results are posted underneath.

    --------------------------------------------

    -- Make some data to play with

    --------------------------------------------

    DROP TABLE #Sample

    CREATE TABLE #Sample (RowID CHAR(1), Number INT)

    INSERT INTO #Sample (RowID, Number)

    VALUES ('A',1), ('B',2), ('C',3), ('D',4), ('E',5), ('F',5), ('G',6), ('H',7), ('I',8), ('J',9)

    --------------------------------------------

    -- A simple CTE recursion, no hierarchy

    -- Similar to "Join to the next row"

    --------------------------------------------

    ;WITH MyCTE AS (

    SELECT

    RowID AS sRowID, CAST('' AS CHAR(1)) AS CTERowID,

    Number AS sNumber, 0 AS CTEnumber,

    0 AS Recurse,

    1 AS BufferRowCount,

    CAST(1 AS INT) AS BufferRowID

    FROM #Sample WHERE Number = 1

    UNION ALL

    SELECT s.RowID, cte.sRowID,

    s.Number, cte.sNumber,

    cte.Recurse+1,

    COUNT(cte.sNumber) OVER(PARTITION BY 'X'),

    CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, cte.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN MyCTE cte ON cte.sNumber + 1 = s.Number

    ) SELECT * FROM MyCTE

    ORDER BY sRowID, CTERowID

    --------------------------------------------

    -- Now model this process using regular CTEs

    --------------------------------------------

    ;WITH

    CTE1 as (

    SELECT RowID AS sRowID, CAST('' AS CHAR(1)) AS CTERowID,

    Number AS sNumber, 0 AS CTEnumber,

    0 AS Recurse,

    1 AS BufferRowCount,

    CAST(1 AS INT) AS BufferRowID

    FROM #Sample WHERE Number = 1),

    CTE2 as (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE1 c ON c.sNumber + 1 = s.Number),

    CTE3 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE2 c ON c.sNumber + 1 = s.Number),

    CTE4 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE3 c ON c.sNumber + 1 = s.Number),

    CTE5 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE4 c ON c.sNumber + 1 = s.Number),

    CTE6 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE5 c ON c.sNumber + 1 = s.Number),

    CTE7 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE6 c ON c.sNumber + 1 = s.Number),

    CTE8 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE7 c ON c.sNumber + 1 = s.Number),

    CTE9 AS (

    SELECT s.RowID AS sRowID, c.sRowID AS CTERowID,

    s.Number AS sNumber, c.sNumber AS CTEnumber,

    Recurse = c.Recurse+1,

    BufferRowCount = COUNT(c.sNumber) OVER(PARTITION BY 'X'),

    BufferRowID = CAST(ROW_NUMBER() OVER (ORDER BY s.RowID, c.sRowID) AS INT)

    FROM #Sample s

    INNER JOIN CTE8 c ON c.sNumber + 1 = s.Number)

    SELECT * FROM CTE1

    UNION ALL

    SELECT * FROM CTE2

    UNION ALL

    SELECT * FROM CTE3

    UNION ALL

    SELECT * FROM CTE4

    UNION ALL

    SELECT * FROM CTE5

    UNION ALL

    SELECT * FROM CTE6

    UNION ALL

    SELECT * FROM CTE7

    UNION ALL

    SELECT * FROM CTE8

    UNION ALL

    SELECT * FROM CTE9

    Results for recursive CTE:

    s_RowIDCTE_RowIDs_NumberCTE_numberRecurseBuffer_RowCountBuffer_RowID

    A 10011

    BA21111

    CB32211

    DC43311

    ED54421

    FD54422

    GE65511

    GF65511

    HG76611

    HG76611

    IH87711

    IH87711

    JI98811

    JI98811

    Results from a series of CTE's:

    sRowIDCTERowIDsNumberCTEnumberRecurseBufferRowCountBufferRowID

    A 10011

    BA21111

    CB32211

    DC43311

    ED54421

    FD54422

    GE65521

    GF65522

    HG76621

    HG76622

    IH87721

    IH87722

    JI98821

    JI98822

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 8 posts - 31 through 37 (of 37 total)

You must be logged in to reply to this topic. Login to reply