Consolidating records - TSQL problem

  • Good afternoon,

    I've been working on a project to eliminate duplicates from a record set containing contact and address details. (ChrisM@Work/Home - thank you so much for all your help!!!!!).

    I have two columns in my final result set, one containing the ID of the record that I'm keeping and another containing the ID of thr row I am dropping.

    To concolidate my final result set I've come up with the following solution but I know it's not great because I'm having to run it several times to complete the concolidation:

    Let me explain with some sample code:

    create table testing (retained int, dropped int)

    insert into testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 1055257, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 1055257, 73635 union all

    select 1046261, 767884 union all

    select 1055257, 767884 union all

    select 1055257, 1046261

    select * from testing

    -- consoidate records:

    -- updates 6 records:

    update b

    set b.retained=a.retained

    from testing as a

    inner join testing as b

    on b.retained = a.dropped

    -- updates remaining 3

    update b

    set b.retained=a.retained

    from testing as a

    inner join testing as b

    on b.retained = a.dropped

    select * from testing

    drop table testing

    See below image, left table is what I started with, right table is the result I want:

    My solution works but I'm sure there is a better way to do this. Any suggestions to do this in one pass?!

    Thanks in advance.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I came up this:

    create table testing (retained int, dropped int)

    insert into testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 1055257, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 1055257, 73635 union all

    select 1046261, 767884 union all

    select 1055257, 767884 union all

    select 1055257, 1046261

    select * from testing

    ORDER BY dropped,retained;

    WITH UpdateValues AS (

    SELECT

    MAX(retained) AS NewValue,

    dropped

    FROM

    testing

    GROUP BY

    dropped

    )

    UPDATE t SET

    retained = uv.NewValue

    FROM

    testing t

    INNER JOIN UpdateValues uv

    ON (t.dropped = uv.dropped);

    select * from testing

    ORDER BY dropped,retained;

    DROP TABLE testing;

  • Lynn Pettis (9/24/2012)


    I came up this:

    --snip

    Interesting image that conjures up. I didn't take in the rest of your post. :w00t:

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (9/24/2012)


    Lynn Pettis (9/24/2012)


    I came up this:

    --snip

    Interesting image that conjures up. I didn't take in the rest of your post. :w00t:

    Not feeling well today? ๐Ÿ˜‰

  • Nicely done Lynn,

    I started doing the CTE but you beat me to it.....

    Much appreciated!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Lynne's solution works if we assume that the retained ID is always the maximum.

    But tthe solution doesn't work with the following record set:

    drop table dbo.testing

    create table dbo.testing (retained int, dropped int)

    insert into dbo.testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 6699, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 6699, 73635 union all

    select 1046261, 767884 union all

    select 6699, 767884 union all

    select 6699, 1046261

    I will keep trying to see if I can come up with a solution but if anyone else can think of something then that'd be great!

    Thanks in advance.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Hi Abu, thank you for the kind mention - I take it that phase of the project is now concluded.

    Have a try with this. It works by first selecting retained rows which don't get a mention in discarded rows, then left joining to self. Hope that makes sense ๐Ÿ™‚

    SELECT a.*

    FROM testing a

    LEFT JOIN testing b ON b.retained = a.dropped

    WHERE NOT EXISTS (SELECT 1 FROM testing i WHERE a.retained = i.dropped)

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/25/2012)


    Hi Abu, thank you for the kind mention - I take it that phase of the project is now concluded.

    Hi Chris, not quite! It's a looong story lol .... will explain another time ๐Ÿ˜›

    Have a try with this. It works by first selecting retained rows which don't get a mention in discarded rows, then left joining to self. Hope that makes sense ๐Ÿ™‚

    SELECT a.*

    FROM testing a

    LEFT JOIN testing b ON b.retained = a.dropped

    WHERE NOT EXISTS (SELECT 1 FROM testing i WHERE a.retained = i.dropped)

    Not sure I get you.

    Here is another sample record set:

    drop table dbo.testing

    create table dbo.testing (retained int, dropped int)

    insert into dbo.testing (retained, dropped)

    select 972580, 697688 union all

    select 1354938, 697688 union all

    select 1354938, 972580 union all

    select 1555243, 1354938

    The result should be:

    Any ideas?

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (9/25/2012)


    Lynne's solution works if we assume that the retained ID is always the maximum.

    But tthe solution doesn't work with the following record set:

    drop table dbo.testing

    create table dbo.testing (retained int, dropped int)

    insert into dbo.testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 6699, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 6699, 73635 union all

    select 1046261, 767884 union all

    select 6699, 767884 union all

    select 6699, 1046261

    I will keep trying to see if I can come up with a solution but if anyone else can think of something then that'd be great!

    Thanks in advance.

    I can only write code based on what you provided. Based on the sample data and expected results, what I saw was the max id being retained. With the new data, what are the rules for determining what ID is used? Also, you posted additional data but not additional expected results.

  • True... Based on the original sample data it does look like it would work based on maximum id.

    See my previous reply to ChrisM.

    As I said, I have a working solution but I'm having to run my update several times until it works. Just wondering if there is an alternative solution which works with one pass.

    Thanks for your efforts.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (9/25/2012)


    Lynne's solution works if we assume that the retained ID is always the maximum.

    But tthe solution doesn't work with the following record set:

    drop table dbo.testing

    create table dbo.testing (retained int, dropped int)

    insert into dbo.testing (retained, dropped)

    select 767884, 157441 union all

    select 1046261, 157441 union all

    select 6699, 157441 union all

    select 157441, 73635 union all

    select 767884, 73635 union all

    select 1046261, 73635 union all

    select 6699, 73635 union all

    select 1046261, 767884 union all

    select 6699, 767884 union all

    select 6699, 1046261

    I will keep trying to see if I can come up with a solution but if anyone else can think of something then that'd be great!

    Thanks in advance.

    Yep...

    ;WITH retained AS (

    SELECT t.*

    FROM testing t

    WHERE NOT EXISTS (SELECT 1 FROM testing i WHERE i.dropped = t.retained)

    )

    SELECT [Level] = 1, l1.retained, l1.dropped

    FROM retained l1

    UNION ALL

    SELECT [Level] = 2, l1.retained, l2.retained

    FROM retained l1

    inner JOIN testing l2 ON l2.retained = l1.dropped

    UNION ALL

    SELECT [Level] = 3, l1.retained, l3.retained

    FROM retained l1

    inner JOIN testing l2 ON l2.retained = l1.dropped

    inner JOIN testing l3 ON l3.retained = l2.dropped

    UNION ALL

    SELECT [Level] = 4, l1.retained, l3.dropped

    FROM retained l1

    inner JOIN testing l2 ON l2.retained = l1.dropped

    inner JOIN testing l3 ON l3.retained = l2.dropped

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris,

    I'm looking at recursive CTEs but I don't think it will work. Well I've been trying for the last couple of hours and it's still not giving the same result as my original update (which I run multiple times until 0 rows are updated!).

    Thanks for your efforts.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu Dina (9/25/2012)


    Thanks Chris,

    I'm looking at recursive CTEs but I don't think it will work. Well I've been trying for the last couple of hours and it's still not giving the same result as my original update (which I run multiple times until 0 rows are updated!).

    Thanks for your efforts.

    You're welcome.

    I tried a rCTE too, and although it does resolve the hierarchy, I couldn't get it to come out in the required format - it would take quite a bit of post-processing which would have made it unfeasibly slow.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris,

    Here is my attempt doing this with a rCTE

    ;WITH retained AS (

    SELECT t.retained, dropped

    FROM testing t

    WHERE NOT EXISTS (SELECT 1 FROM testing i WHERE i.dropped = t.retained)

    UNION ALL

    SELECT b.retained, t1.dropped

    FROM testing AS t1

    INNER JOIN retained as b

    ON t1.retained = b.dropped)

    UPDATE t

    SET t.retained = uv.retained

    FROM testing t

    INNER JOIN retained uv

    ON t.dropped = uv.dropped

    This works with the sample data but something isn't quite right with this. Tried it with a sample of 160000 records, it took 17 seconds.

    When I increased the sample record set to 200000, the query just keeps going. It's scaling up badly or there's some dodgy data that the above logic just can't handle efficiently.

    This is what the estimated execution plan is telling me:

    Tried adding a couple of indexes on retained and dropped columns but that made no difference to the estimated execution plan.

    Arrrggghhh! :w00t:

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I know it could be a bad solution, and maybe you had tought about it.

    But since the rCTE is not scaling well (and it sure won't) why don't you use your original solution in a while loop for @@rowcount > 0?

    I'm not sure which will scale better, but it's still an option.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply