Recursion with a Twist

  • Hi - can't seem to get my head around this one, so any assistance is welcome.

    I am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Here's some sample data:

    --Create temp table to hold the dummy data

    if object_id('tempdb..#IDs') is not null

    drop table #IDs

    create table #IDs (

    InvoiceId int not null

    ,BookingId int not null

    ) on [PRIMARY]

    go

    alter table #IDs add constraint PK_IDs primary key clustered (

    InvoiceId

    ,BookingId

    )

    with (

    STATISTICS_NORECOMPUTE = off

    ,IGNORE_DUP_KEY = off

    ,ALLOW_ROW_LOCKS = on

    ,ALLOW_PAGE_LOCKS = on

    ) on [PRIMARY]

    go

    insert #IDs ( InvoiceId, BookingId)

    select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

    select * from #IDs

    Now, imagine that we're interested in returning data for InvoiceId 1.

    What I would like to return is this:

    SELECT *

    FROM (

    VALUES

    (1,9),

    (1,10),

    (1,11),

    (2,11),

    (3,11),

    (3,12),

    (3,13)) x

    ([InvoiceId],[BookingId])

    The 'twist' should now be obvious: InvoiceId 1 has BookingIds which relate to other invoices, which in turn may have BookingIds which relate to other invoices. I want to return all related rows.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • If there is a row (3,14) would all your sample data then be returned because of (4,14) and (5,14)? from your description i think that is correct. just trying to wrap my head around it.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Correct.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • This is a bit tricky and I am still working on this. Here's what I have thusfar...

    --Create temp table to hold the dummy data

    if object_id('tempdb..#IDs') is not null

    drop table #IDs

    CREATE TABLE #IDs

    (

    InvoiceId int not null,

    BookingId int not null,

    primary key clustered (InvoiceId, BookingId)

    );

    INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)

    GO

    DECLARE @val int = 1;

    WITH x AS

    (

    SELECT i.InvoiceId, i.BookingId

    from

    (

    SELECT InvoiceId, BookingId

    from #IDs

    WHERE InvoiceId = @val

    ) a

    JOIN #IDs i ON a.BookingId = i.BookingId

    )

    SELECT DISTINCT i.*

    FROM x

    RIGHT JOIN #IDs i ON x.InvoiceId = i.InvoiceId

    WHERE x.InvoiceId IS NOT NULL

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • What i came up with was to turn it into an adjacency list hierarchy and then we can recourse through that. of course i could be completly off base here but here is the code to turn it into the adjacency list

    CREATE TABLE #IDsParents (

    InvoiceId int not null

    ,BookingId int not null

    ,ParentInvoiceID INT

    )

    insert #IDsParents ( InvoiceId, BookingId)

    select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

    ;WITH Parent AS(

    SELECT a.InvoiceId, b.InvoiceId AS Parent

    FROM #IDsParents a

    LEFT JOIN #IDsParents b

    ON a.BookingId = b.BookingId

    AND a.InvoiceId > b.InvoiceId

    WHERE b.InvoiceId IS NOT NULL

    )

    UPDATE b SET ParentInvoiceID = a.Parent

    FROM Parent a

    RIGHT JOIN #IDsParents b

    ON a.InvoiceId = b.InvoiceId

    SELECT * FROM #IDsParents


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • capnhector (11/28/2012)


    What i came up with was to turn it into an adjacency list hierarchy and then we can recourse through that. of course i could be completly off base here but here is the code to turn it into the adjacency list

    [/code]

    The only problem I have with that solution method is that you can't traverse in both directions. IE: Pulling Invoice 3 won't get you back to 1 (or vice versa) depending on the parenting.

    The biggest problem with this recursion is that it's self-joining. For those curious:

    --am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Heres some sample data:

    --Create temp table to hold the dummy data

    if object_id('tempdb..#IDs') is not null

    drop table #IDs

    create table #IDs (

    InvoiceId int not null

    ,BookingId int not null

    ) on [PRIMARY]

    go

    alter table #IDs add constraint PK_IDs primary key clustered (

    InvoiceId

    ,BookingId

    )

    with (

    STATISTICS_NORECOMPUTE = off

    ,IGNORE_DUP_KEY = off

    ,ALLOW_ROW_LOCKS = on

    ,ALLOW_PAGE_LOCKS = on

    ) on [PRIMARY]

    go

    insert #IDs ( InvoiceId, BookingId)

    select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

    select * from #IDs

    DECLARE @InvoiceID INT

    SET @InvoiceID = 1

    ;WITH rCTE AS

    (SELECT

    InvoiceID,

    BookingID,

    1 AS HierarchyLevel

    FROM

    #IDs

    WHERE

    InvoiceID = @InvoiceID

    UNION ALL

    SELECT

    #IDs.InvoiceID,

    ids2.BookingID,

    rCTE.HierarchyLevel + 1 AS HierarchyLevel

    FROM

    rCTE

    JOIN

    #IDs

    ON#IDs.BookingID = rCTE.BookingID

    JOIN

    #IDs AS ids2

    ON#IDs.InvoiceID = ids2.InvoiceID

    WHERE

    rCTE.HierarchyLevel + 1

    )

    SELECT * FROM rCTE

    I'm trying to work out a way to exclude previously included invoices from the detection list but you can't double-reference the CTE in the recursion, so NOT IN (SELECT) clauses and the like are out. I'm actually thinking this may need to be looped as a baseline and then worked on from there.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This is NOT pretty, but it IS functional.

    --am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Heres some sample data:

    --Create temp table to hold the dummy data

    if object_id('tempdb..#IDs') is not null

    drop table #IDs

    if object_id('tempdb..#TempStore') is not null

    drop table #TempStore

    create table #IDs (

    InvoiceId int not null

    ,BookingId int not null

    ) on [PRIMARY]

    go

    alter table #IDs add constraint PK_IDs primary key clustered (

    InvoiceId

    ,BookingId

    )

    with (

    STATISTICS_NORECOMPUTE = off

    ,IGNORE_DUP_KEY = off

    ,ALLOW_ROW_LOCKS = on

    ,ALLOW_PAGE_LOCKS = on

    ) on [PRIMARY]

    go

    insert #IDs ( InvoiceId, BookingId)

    select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

    select * from #IDs

    DECLARE @InvoiceID INT,

    @Rowcount INT

    SELECT @InvoiceID = 1,

    @Rowcount = 1

    CREATE TABLE #TempStore

    (InvoiceID INT, BookingID INT)

    INSERT INTO #TempStore

    SELECT

    InvoiceID, BookingID

    FROM

    #IDs

    WHERE

    InvoiceID = @InvoiceID

    -- Set this here, might as well not hit the loop if no records to work from.

    SELECT @Rowcount = @@ROWCOUNT

    WHILE @Rowcount <> 0

    BEGIN

    INSERT INTO #TempStore

    SELECT

    ids2.InvoiceID, ids2.BookingID

    FROM

    #IDs

    JOIN

    (SELECT DISTINCT BookingID FROM #TempStore) AS drv

    ON#IDs.BookingID = drv.BookingID

    JOIN

    #IDs AS ids2

    ON#IDs.InvoiceID = ids2.InvoiceID

    WHERE

    #IDs.InvoiceID NOT IN(SELECT DISTINCT InvoiceID FROM #TempStore)

    SET @Rowcount = @@ROWCOUNT

    END

    SELECT * FROM #TempStore


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (11/28/2012)


    This is NOT pretty, but it IS functional.

    I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (11/28/2012)


    Evil Kraig F (11/28/2012)


    This is NOT pretty, but it IS functional.

    I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

    Whoops, my apologies Alan. I didn't realize you were code complete. This comment:

    Alan.B (11/28/2012)


    This is a bit tricky and I am still working on this. Here's what I have thusfar...

    Misled me to believe you were still working on your solution and I breezed over it before going back to trying to force the rCTE to work.

    However, you will miss recursive chains. For example, I've adjusted the inclusion set here:

    INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (2,12), (3,12), (3,13), (4,14), (5,14)

    GO

    You'll notice that your code will no longer pick up Invoice 3 because You need to go from 1->2 via Booking 11, then 2->3 via Booking 12.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (11/28/2012)


    Alan.B (11/28/2012)


    Evil Kraig F (11/28/2012)


    This is NOT pretty, but it IS functional.

    I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

    Whoops, my apologies Alan. I didn't realize you were code complete. This comment:

    Alan.B (11/28/2012)


    This is a bit tricky and I am still working on this. Here's what I have thusfar...

    Misled me to believe you were still working on your solution and I breezed over it before going back to trying to force the rCTE to work.

    However, you will miss recursive chains. For example, I've adjusted the inclusion set here:

    INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (2,12), (3,12), (3,13), (4,14), (5,14)

    GO

    You'll notice that your code will no longer pick up Invoice 3 because You need to go from 1->2 via Booking 11, then 2->3 via Booking 12.

    Ahhhh.... Now I see what I was doing wrong; I knew I was missing something.

    Thanks & nice work.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (11/28/2012)


    Evil Kraig F (11/28/2012)


    This is NOT pretty, but it IS functional.

    I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

    @Alan.b If i look at your method correctly it misses InvoiceID 3.

    @Kraig your right about the issues with my method that you cant go both directions up the tree which i had forgotten about. (dont work with hierarchies much)

    EDIT: i sat to long to hit the post button so struck what was said before the post.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • how about this one

    SET @InvoiceID = 1;WITH rCTE AS

    (

    SELECT InvoiceID, BookingID, 1 AS HierarchyLevel

    FROM #IDs

    WHERE InvoiceID = @InvoiceID

    UNION ALL

    SELECT IDs.InvoiceID, ids2.BookingID, rCTE.HierarchyLevel + 1 AS HierarchyLevel

    FROM rCTE

    JOIN #IDs ids

    ON IDs.BookingID = rCTE.BookingID AND

    ids.InvoiceId <> rcte.InvoiceID

    INNER JOIN #IDs ids2

    ON ids2.InvoiceId = ids.InvoiceID

    WHERE rCTE.InvoiceID<ids2.InvoiceId

    )

    select * from Rcte

    Every rule in a world of bits and bytes, can be bend or eventually be broken
    MyBlog About Common dialog control
    A Visualizer for viewing SqlCommand object script [/url]

  • thava (11/29/2012)


    how about this one

    It can't go backwards from 3 to get back to 1 because of the limitor to reduce duplication and endless recursion. It's similar to the issue above in the hierarchy model. You need to be able to go after either InvoiceID 3 or InvoiceID 1 and get the same result list.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • First of all, thanks to everyone who has taken the time to think about this problem and contribute. I spent over an hour blowing recursion limits left, right and centre yesterday without getting anywhere close, so I appreciate the input.

    I was excited to see Alan's solution working perfectly with my data, only to test it out using Craig's alternative (and valid) data and see it not work quite so well ...

    It remains a problem for us here, which for the moment we are dealing with by hard-coding multiple recursion levels. But we have examples where the actual number of recursions needed to scrape all the details together is greater than what we have coded. So any further input is most welcome.

    Thanks

    Phil

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I wrote the above response before trying Craig's solution, which certainly does the business - thanks! Maybe this is one instance where a loop is the best solution. I'll leave it a while longer (no pun intended) before I implement it, to see whether anyone comes up with a set-based solution.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply