remove specific row only if there is a duplicate row

  • Hi,

    I have the following dataset:

    SELECT 'W1' AS A, 'WWW' AS B, 'W2' AS C

    UNION ALL

    SELECT 'X1', 'XXX', 'X1'

    UNION ALL

    SELECT 'X1', 'XXX', 'X2'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y1'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y3'

    UNION ALL

    SELECT 'Z1', 'ZZZ', 'Z1'

    I need to remove the rows where A = C But only when there are two or more rows where A and B are the same

    What I need to see after it is filtered is the result of the following code:

    SELECT 'W1' AS A, 'WWW' AS B, 'W2' AS C

    UNION ALL

    SELECT 'X1', 'XXX', 'X2'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y3'

    UNION ALL

    SELECT 'Z1', 'ZZZ', 'Z1'

    Any Ideas?

    Regards.

  • Here is one way. There are probably others that maybe faster and/or more effecient.

    with TestData as (

    SELECT 'W1' AS A, 'WWW' AS B, 'W2' AS C

    UNION ALL

    SELECT 'X1', 'XXX', 'X1'

    UNION ALL

    SELECT 'X1', 'XXX', 'X2'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y1'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y3'

    UNION ALL

    SELECT 'Z1', 'ZZZ', 'Z1'

    )

    , BaseData as (

    select

    A,

    B,

    C,

    COUNT(*) over (partition by A, B) as GrpCnt

    from

    TestData

    )

    select

    A,

    B,

    C

    from

    BaseData

    where

    NOT (A = C

    and GrpCnt > 1)

    ;

  • Thanks,

    I'll report back if it worked.

    Appreciate the help.

    Phil

  • Worked Perfectly.

    Thanks for the help, much appreciated.

    Phil

  • This might work too:

    ;With CTE

    As

    (Select *, Row_Number() Over (Partition by a,b Order By a Desc) As rownum From

    (SELECT 'W1' AS A, 'WWW' AS B, 'W2' AS C

    UNION ALL

    SELECT 'X1', 'XXX', 'X1'

    UNION ALL

    SELECT 'X1', 'XXX', 'X2'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y1'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y3'

    UNION ALL

    SELECT 'Z1', 'ZZZ', 'Z1') As a)

    Select A, B, C From CTE Where rownum = 1

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • I have a rather strange alternative for you here. I say strange because the query plan looks ridiculously more complicated than either the one produced by Lynn's or by Vinus's suggestions.

    ;with TestData as (

    SELECT 'W1' AS A, 'WWW' AS B, 'W2' AS C

    UNION ALL

    SELECT 'X1', 'XXX', 'X1'

    UNION ALL

    SELECT 'X1', 'XXX', 'X2'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y1'

    UNION ALL

    SELECT 'Y1', 'YYY', 'Y3'

    UNION ALL

    SELECT 'Z1', 'ZZZ', 'Z1'

    )

    SELECT *

    FROM TestData

    WHERE A <> C

    UNION ALL

    SELECT *

    FROM TestData a

    WHERE A = C AND (SELECT COUNT(A) FROM TestData b WHERE a.A = b.A) = 1

    And yet, despite the immense complexity of the plan, the cost is much lower.

    Give it a try but be suspicious (run a timing test to be sure).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I'll give them both a try and see which one works the best for my situation.

    Thanks again for all the replies.

    Phil

  • Please do and tell us what you find.

    I must confess though I ran a timing test on the 3 and found that mine was quite slow so take care with it.

    Note that in the test I ran, I constructed a bunch of made up data (sorry, I had to ditch before I had a chance to post it) and I got different row counts coming out of the 3 queries and I'm not sure why.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply