Recursion with a Twist

  • Phil Parkin (11/29/2012)


    I wrote the above response before trying Craig's solution, which certainly does the business - thanks! Maybe this is one instance where a loop is the best solution. I'll leave it a while longer (no pun intended) before I implement it, to see whether anyone comes up with a set-based solution.

    I poked at it a bit more yesterday and while I can make that process look cleaner (it's a bit disorganized) it's the best solution I have available without digging into indexes and possible shortcuts for some of the sub-data, like the BookingID list. The killer is in the self-referencing bookingIDs and the bi-directional hierarchy, where you're going both 'up and down' the chain simultaneously.

    Glad to help, though. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hi

    The following recursive option might work. It walks up and down from the queried invoice and joins the results.

    DECLARE @invoiceID int = 3

    ;with rListUp as (

    select a.invoiceid invoiceid, a.invoiceid currentid, a.bookingid, b.invoiceid nextinvoiceid

    from #IDs a

    CROSS APPLY (SELECT invoiceid, bookingid FROM #IDs m WHERE m.bookingid = a.bookingid ) b

    where b.invoiceid > a.invoiceid

    union all

    select a.invoiceid, a.nextinvoiceid, a.bookingid, c.invoiceid

    from rListUp a

    CROSS APPLY (SELECT l.invoiceid, l.bookingid FROM #IDs l WHERE l.invoiceID = a.nextinvoiceid) b

    CROSS APPLY (SELECT m.invoiceid, m.bookingid FROM #IDs m WHERE m.bookingid = b.bookingid and m.invoiceid > a.nextinvoiceid ) c

    ),

    rListDown as (

    select a.invoiceid invoiceid, a.invoiceid currentid, a.bookingid, b.invoiceid nextinvoiceid

    from #IDs a

    CROSS APPLY (SELECT invoiceid, bookingid FROM #IDs m WHERE m.bookingid = a.bookingid ) b

    where b.invoiceid < a.invoiceid

    union all

    select a.invoiceid, a.nextinvoiceid, a.bookingid, c.invoiceid

    from rListDown a

    CROSS APPLY (SELECT l.invoiceid, l.bookingid FROM #IDs l WHERE l.invoiceID = a.nextinvoiceid) b

    CROSS APPLY (SELECT m.invoiceid, m.bookingid FROM #IDs m WHERE m.bookingid = b.bookingid and m.invoiceid < a.nextinvoiceid ) c

    ),

    rList AS (

    SELECT invoiceid, nextinvoiceid FROM rListUp

    UNION

    SELECT invoiceid, nextinvoiceid FROM rListDown

    )

    select invoiceid, bookingid

    from #IDs

    where invoiceid in (

    select nextinvoiceid from rlist where invoiceID = @InvoiceID

    union

    select @InvoiceID

    )

    It could probably be prettied up a bit more and there is still a chance that recursion limits will get hit if there is a long change of invoice/bookings

  • mickyT (11/29/2012)


    dwain.c (11/29/2012)


    To all who participated in this thread:

    You may wish to review this new article:

    http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    It provides a good utility function for solving this case and many other similar ones. Thanks to the OP for being the inspiration for the article!

    Thanks Dwain

    I've already had a look at it and I am going to have a really good read through it soon. Even though I haven't had a chance to dig into the functionality and digest it properly I was impressed with the methods you came up with. Great article:-)

    Thanks for noticing and reading it, and the praise!

    By all means leave some comments in the discussion thread and don't forget to rate the article!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi Phil - allow me to propose something for you.

    DECLARE @RowCount INT

    DECLARE @IDs TABLE

    (InvoiceId int not null

    ,BookingId int not null)

    INSERT INTO @IDs

    SELECT InvoiceID, BookingID

    FROM #IDs

    WHERE InvoiceID = 1

    SET @RowCount = @@ROWCOUNT

    WHILE @RowCount <> 0

    BEGIN

    INSERT INTO @IDs

    SELECT a.InvoiceID, a.BookingID

    FROM #IDs a

    INNER JOIN @IDs b ON a.BookingID = b.BookingID

    UNION ALL

    SELECT a.InvoiceID, a.BookingID

    FROM #IDs a

    INNER JOIN @IDs b ON a.InvoiceID = b.InvoiceID

    EXCEPT

    SELECT a.InvoiceID, a.BookingID

    FROM @IDs a

    SET @RowCount = @@ROWCOUNT

    END

    SELECT * FROM @IDs

    I think this resolves the original and the alternate data setup (from Kraig?) properly. As it turns out, I recently solved an problem quite similar to this and this solution has its basis in that.

    Not to disappoint you Alan (as this uses a loop), it is at least a set-based loop and in the similar problem I solved any attempt at a recursive solution did not perform nearly as well. I have not done that testing here so I can't say for sure whether it will be faster or not.

    The other advantage to this approach is that it eliminates feedback loops which will completely disintegrate any effort to use a rCTE.

    Let me know of any cases it doesn't resolve. It is possible that another UNION ALL amidst the loop will take care of it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Post deleted - SSC hung and posted this one to the wrong thread.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • No loops, no cursors, no cte

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable

    (

    InvoiceId INT NOT NULL

    ,BookingId INT NOT NULL

    ,PRIMARY KEY (InvoiceId,BookingId)

    )

    INSERT #TempTable

    (

    InvoiceId

    ,BookingId

    )

    SELECT

    *

    FROM

    ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId)

    SELECT DISTINCT

    t2.InvoiceId

    ,t2.BookingId

    FROM

    (

    SELECT

    MAX(BookingId) AS BookingId

    FROM

    #TempTable

    GROUP BY

    BookingId

    ) AS Sub1

    INNER JOIN

    #TempTable AS t1

    ON Sub1.BookingId = t1.BookingId

    INNER JOIN

    #TempTable AS t2

    ON t1.BookingId = t2.BookingId + 1

  • Steven Willis (11/30/2012)


    No loops, no cursors, no cte

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable

    (

    InvoiceId INT NOT NULL

    ,BookingId INT NOT NULL

    ,PRIMARY KEY (InvoiceId,BookingId)

    )

    INSERT #TempTable

    (

    InvoiceId

    ,BookingId

    )

    SELECT

    *

    FROM

    ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId)

    SELECT DISTINCT

    t2.InvoiceId

    ,t2.BookingId

    FROM

    (

    SELECT

    MAX(BookingId) AS BookingId

    FROM

    #TempTable

    GROUP BY

    BookingId

    ) AS Sub1

    INNER JOIN

    #TempTable AS t1

    ON Sub1.BookingId = t1.BookingId

    INNER JOIN

    #TempTable AS t2

    ON t1.BookingId = t2.BookingId + 1

    The problem with your code is that if the value (3,14) is present it does not get picked up and neither does (4,14) or (5,14)

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable

    (

    InvoiceId INT NOT NULL

    ,BookingId INT NOT NULL

    ,PRIMARY KEY (InvoiceId,BookingId)

    )

    INSERT #TempTable

    (

    InvoiceId

    ,BookingId

    )

    SELECT

    *

    FROM

    ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), (3,14), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId)

    so far an extension of my adjacency list by searching up and down (not the fastest thanks Mickey T. by the way) and dwains loop solution are the only ones that seem to work fully on the set of data we have.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • capnhector (11/30/2012)

    so far an extension of my adjacency list by searching up and down (not the fastest thanks Mickey T. by the way) and dwains loop solution are the only ones that seem to work fully on the set of data we have.

    Erm, that was mine, as well. Dwain's is basically a variation. I still need a chance to check performance between the two.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Steven Willis (11/30/2012)


    No loops, no cursors, no cte

    You've answered the data, not the problem, with this. You cannot recurse in both directions, you cannot deal with more than 2 layers of heirarchy. You need to be able to traverse bi-directionally across bookingIDs and InvoiceIDs as well as deal with multiple invoice-booking-invoice-booking-invoice chains.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • OK, sorry. I only read the first few posts and made some faulty assumptions. After considering the last few comments I can see that it really is more complicated than it first appeared. Mea culpa.

     

  • Alan.B (11/28/2012)


    Evil Kraig F (11/28/2012)


    This is NOT pretty, but it IS functional.

    I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

    If the problem requires that, no matter which InvoiceID you pick, all associated rows for all associated InvoiceIDs will be returned, then it actually can't be done in what most folks consider to be a "set based" manner. There needs to be some form of recursion either as a WHILE loop or as an rCTE (Recursive CTE).

    I will say that the WHILE loop and the rCTE method can more closely come to "set based" in that sets of rows can be returned for each iteration but it can't be done in a single itermation (what most folks think of as "set based") without giving up on the trait of "unknown depth" as some of the set based solutions on this thread have.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Post withdrawn. I made a mistake in the code somewhere. I'll post it as a new post if I can figure out what the heck I did wrong and fix it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • *wipes away a tear from his eye* Damn, Jeff... that's... that's just beautiful. *sniff*

    I couldn't for the life of me figure out how to get out of the IN (SELECT InvoiceID From rCTE) situation.

    Only one problem.

    It doesn't recurse past the first series of booking IDs. Check this out:

    if object_id('tempdb..#IDs') is not null

    drop table #IDs

    create table #IDs (

    InvoiceId int not null

    ,BookingId int not null

    ) on [PRIMARY]

    go

    alter table #IDs add constraint PK_IDs primary key clustered (

    InvoiceId

    ,BookingId

    )

    with (

    STATISTICS_NORECOMPUTE = off

    ,IGNORE_DUP_KEY = off

    ,ALLOW_ROW_LOCKS = on

    ,ALLOW_PAGE_LOCKS = on

    ) on [PRIMARY]

    go

    INSERT #IDs VALUES (1,9)

    INSERT #IDs VALUES (1,10)

    INSERT #IDs VALUES (1,11)

    INSERT #IDs VALUES (2,11)

    INSERT #IDs VALUES (2,12)

    INSERT #IDs VALUES (3,12)

    INSERT #IDs VALUES (3,13)

    INSERT #IDs VALUES (4,14)

    INSERT #IDs VALUES (5,14)

    GO

    --insert #IDs ( InvoiceId, BookingId)

    --select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

    select * from #IDs

    --=================================================================================

    -- Solve the problem using some "standard" "Traveling Salesman" or "Airline

    -- Flight Schedule" code. I believe you'll be absolutely amazed as the

    -- performance of this code.

    -- This code could easily be turned into an iTVF to be used in CROSS APPLY as

    -- if it were a parameterized view.

    --=================================================================================

    --===== Start a timer because I don't trust SET STATISTICS any more (1).

    -- Ref (1): http://www.sqlservercentral.com/articles/T-SQL/91724/

    DECLARE @StartTime DATETIME;

    SELECT @StartTime = GETDATE();

    --===== Declare a variable for the original InvoiceID that we want to find and

    -- stack the others against. This is like identifying the "Start City"

    -- in a "Traveling Salesman" problem.

    DECLARE @StartInvoiceID INT;

    SELECT @StartInvoiceID = 1; --<<< Change this to do other tests.

    --<<< Any number from 1 to 300,000 will do.

    --===== Solve the problem as if it were a "Traveling Salesman" problem. The key

    -- to this is to leave "breadcrumbs" as to where we've been by using a

    -- "Materialized Hierarchical Path" (the "BeenThere" column).

    WITH

    cteBuildPath AS

    ( --=== Find the "top" ID as we would in any hierarchy or net.

    SELECT a.InvoiceId, a.BookingID,

    BeenThere = CAST(QUOTENAME(a.InvoiceID,'"') AS VARCHAR(8000))

    FROM #IDs AS a --as in "anchor"

    WHERE InvoiceID = @StartInvoiceID

    UNION ALL

    --==== "Loop" through all of the possibilities with some "stop code" to prevent

    -- the inevitable cycles that will appear because it's not a "DAG".

    SELECT b.InvoiceID, b.BookingID,

    BeenThere = r.BeenThere + CAST(QUOTENAME(b.InvoiceID,'"') AS VARCHAR(8000))

    FROM #IDs b --as in "base table"

    JOIN cteBuildPath r --as in "recursive part"

    ON r.BookingID = b.BookingId

    WHERE r.BeenThere NOT LIKE '%'+QUOTENAME(b.InvoiceId,'"')+'%'

    )

    SELECT * FROM cteBuildPath

    --

    ----=== Find the full data for the InvoiceID's that we've touched at least once.

    -- -- Note that the WHERE IN acts as if it were a DISTINCT so no need to explicity

    -- -- use the word DISTINCT.

    -- SELECT o.*

    -- FROM #IDs o --as in "original"

    -- WHERE o.InvoiceID IN (SELECT InvoiceId FROM cteBuildPath)

    -- ORDER BY o.InvoiceID, o.BookingID

    --;

    --===== Let's see how long that took.

    PRINT 'Total Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114)

    GO

    The result set stops at InvoiceID 2, and doesn't chain to 3 after picking up the BookingIDs for Invoice 2. Sorry for the horrible INSERT statement but I've only got 2k5 at the house.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Heh... dammit. I had it working and made a final tweek to the code. Now I have to go back and figure out what the heck I changed. Sorry for the mistake. I'll fix it (I think) and will repost it. I took down my original post because it was wrong.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Looking at the data that you've included, Craig, I'm thinking that maybe I didn't have it working at all when it comes to "disconnected" data like you did with the 2,12 entry. Thanks for the test. Back to the drawing board. :blush:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply