Consolidating records - TSQL problem

  • Lynn Pettis (9/25/2012)


    Abu Dina (9/25/2012)


    Lynne's solution works if we assume that the retained ID is always the maximum.

    But tthe solution doesn't work with the following record set:

    drop table dbo.testing

    create table dbo.testing (retained int, dropped int)

    insert into dbo.testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 6699, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 6699, 73635 union all

    select 1046261, 767884 union all

    select 6699, 767884 union all

    select 6699, 1046261

    I will keep trying to see if I can come up with a solution but if anyone else can think of something then that'd be great!

    Thanks in advance.

    I can only write code based on what you provided. Based on the sample data and expected results, what I saw was the max id being retained. With the new data, what are the rules for determining what ID is used? Also, you posted additional data but not additional expected results.

    Okay, let me try this sgain from my phone while eating lunch at McDonalds.

    Try this, I wrote it on a napkin while eating and looking at the data you posted.

    with rCTE as (

    select t1.retained, t1.dropped, t1.retained as TopLevel

    from dbo.testing t1 left outer join dbo.testing t2 on t1.retained = t2.dropped

    where t2.dropped is null

    union all

    select t1.retained, t1.dropped, r.TopLevel

    from dbo.testing t1inner join rCTE r on t1.retained = r.dropped)

    update tu set

    retained = r.TopLevel

    from dbo.testing tu inner join rCTE r on tu.retained = r.retained and tu.dropped = r.dropped;

    Let me know if it works. I can't test it on my phone.

  • Luis Cazares (9/25/2012)


    I know it could be a bad solution, and maybe you had tought about it.

    But since the rCTE is not scaling well (and it sure won't) why don't you use your original solution in a while loop for @@rowcount > 0?

    I'm not sure which will scale better, but it's still an option.

    Thanks Luis, your suggestion is valid. I see that Lynne has come back with a potential solution. I will test tomorrow and report back.

    Regards.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Here is my code, reformatted the way I like to it.

    with rCTE as (

    select

    t1.retained,

    t1.dropped,

    t1.retained as TopLevel

    from

    dbo.testing t1

    left outer join dbo.testing t2

    on t1.retained = t2.dropped

    where

    t2.dropped is null

    union all

    select

    t1.retained,

    t1.dropped,

    r.TopLevel

    from

    dbo.testing t1

    inner join rCTE r

    on t1.retained = r.dropped)

    update tu set

    retained = r.TopLevel

    from

    dbo.testing tu

    inner join rCTE r

    on tu.retained = r.retained and

    tu.dropped = r.dropped;

  • Abu Dina (9/25/2012)


    Luis Cazares (9/25/2012)


    I know it could be a bad solution, and maybe you had tought about it.

    But since the rCTE is not scaling well (and it sure won't) why don't you use your original solution in a while loop for @@rowcount > 0?

    I'm not sure which will scale better, but it's still an option.

    Thanks Luis, your suggestion is valid. I see that Lynne has come back with a potential solution. I will test tomorrow and report back.

    Regards.

    By the way, no 'e' at the end of Lynn.

  • Lynn Pettis (9/25/2012)


    Here is my code, reformatted the way I like to it.

    with rCTE as (

    select

    t1.retained,

    t1.dropped,

    t1.retained as TopLevel

    from

    dbo.testing t1

    left outer join dbo.testing t2

    on t1.retained = t2.dropped

    where

    t2.dropped is null

    union all

    select

    t1.retained,

    t1.dropped,

    r.TopLevel

    from

    dbo.testing t1

    inner join rCTE r

    on t1.retained = r.dropped)

    update tu set

    retained = r.TopLevel

    from

    dbo.testing tu

    inner join rCTE r

    on tu.retained = r.retained and

    tu.dropped = r.dropped;

    Nice one, Lynn. I was working on something very similar which includes "evidence" in the data that it works;

    ;WITH rCTE AS (

    SELECT [level] = 1, t1.retained, t1.dropped,

    --fullchain = CAST(CAST(t1.retained AS VARCHAR(10)) + ',' + CAST(t1.dropped AS VARCHAR(10)) AS VARCHAR(8000)),

    Toplevel = t1.retained

    FROM dbo.testing t1

    LEFT OUTER JOIN dbo.testing t2 ON t1.retained = t2.dropped

    WHERE t2.dropped IS NULL

    UNION ALL

    SELECT [level] = [level] + 1, t1.retained, t1.dropped,

    --fullchain = r.fullchain + ',' + CAST(t1.dropped AS VARCHAR(10)),

    Toplevel = r.Toplevel

    FROM dbo.testing t1

    INNER JOIN rCTE r ON t1.retained = r.dropped

    )

    --SELECT *

    --FROM rCTE

    --ORDER BY level, dropped

    UPDATE tu SET

    retained = r.TopLevel

    FROM dbo.testing tu

    INNER JOIN rCTE r ON tu.dropped = r.dropped;

    SELECT * FROM dbo.testing

    Where evidence is the column [fullchain].

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Your solution works great but unfortunately the rCTE isn't scaling well enough. With 150000 rows it takes 7 seconds to process. With 160000 records it takes 17 seconds. But when I try 180000+ the query just keeps running and running. Very strange.

    It might be easier just to stick a loop as Luis suggested as it takes less than 10 seconds to update 350000 records.

    It's been a good learning exercise and now I have a better understanding of recursive CTEs so not all is lost.

    Thanks for your efforts Lynn(e) ๐Ÿ˜‰

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (9/26/2012)


    Your solution works great but unfortunately the rCTE isn't scaling well enough. With 150000 rows it takes 7 seconds to process. With 160000 records it takes 17 seconds. But when I try 180000+ the query just keeps running and running. Very strange.

    It might be easier just to stick a loop as Luis suggested as it takes less than 10 seconds to update 350000 records.

    It's been a good learning exercise and now I have a better understanding of recursive CTEs so not all is lost.

    Thanks for your efforts Lynn(e) ๐Ÿ˜‰

    It will scale a great deal better with indexes on the appropriate columns - often I've found clustering on one column and an ordinary index on the other does the trick.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I've messed around with this for a bit now and it's been a good learning exercise but I'm going to stick with the following (takes 3 seconds to work through 350K records)

    declare @UpdatedRecords int

    set @UpdatedRecords = 1

    WHILE @UpdatedRecords > 0

    BEGIN

    update b

    set b.retained=a.retained

    from testing1 as a

    inner join testing1 as b

    on b.retained = a.dropped

    set @UpdatedRecords = @@ROWCOUNT

    END

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I like the possibilities that rCTEs give us, but I also believe that sometimes the simplest way is the best way to go.

    I'm glad I could help.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply