Help on Dups

  • Hi,

    I'm new to MS SQL 2005, can anyone help me in getting the second occurrence of duplicates onwards for every batch of B's?

    Here are two examples with their desired output (@Output).

    First:

    DECLARE @Table TABLE (

    id bigint IDENTITY PRIMARY KEY CLUSTERED,

    lang varchar(100),

    code char(1)

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A'

    SELECT * FROM @Table

    INSERT @Output

    SELECT 10 UNION ALL

    SELECT 11

    SELECT * FROM @Output

    And second:

    DECLARE @Table TABLE (

    id bigint IDENTITY PRIMARY KEY CLUSTERED,

    lang varchar(100),

    code char(1)

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B'

    SELECT * FROM @Table

    INSERT @Output

    SELECT 6 UNION ALL

    SELECT 8 UNION ALL

    SELECT 12 UNION ALL

    SELECT 13

    SELECT * FROM @Output

    Any help and sample code would be greatly appreciated.

    Thank you.

  • HI there

    Are you sure your outputs are correct?

    I'm a bit confused.

    thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Hello

    Try this out, have a look at the result set (it will contain all the iterations of the dupes) and use it to help explain what you're trying to do.

    [font="Courier New"]DECLARE @Table TABLE (

            id bigint IDENTITY PRIMARY KEY CLUSTERED,

            lang VARCHAR(100),

            code CHAR(1) )

    INSERT @Table

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A'

    SELECT a.*

    FROM @Table a

    INNER JOIN (

       SELECT lang, code

       FROM @Table

       WHERE code = 'B'

       GROUP BY lang, code

       HAVING COUNT(*) > 1) d

    ON d.lang = a.lang AND d.code = a.code

    ORDER BY a.lang, a.code

    [/font]

    Cheers

    ChrisM

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Christopher Stobbs (9/22/2008)


    HI there

    Are you sure your outputs are correct?

    I'm a bit confused.

    thanks

    Chris

    Yes I am Sure with my output. If we're going to look at the first example:

    DECLARE @Table TABLE (

    id bigint IDENTITY PRIMARY KEY CLUSTERED,

    lang varchar(100),

    code char(1)

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A'

    SELECT * FROM @Table

    INSERT @Output

    SELECT 10 UNION ALL

    SELECT 11

    SELECT * FROM @Output

    Why is the output 10 and 11? because on the first batch of B's (id's 3-5), it doesn't have a duplicate. if you go to second batch of B's which are id's 8-11, you can see duplicates on id's 8,10, and 11. With this, I just need ids 10 and 11 since they are the second occurrence of duplicate onwards (meaning not the first one)

    I created a simple code but it won't give me the exact output since it will group all the B's in the entire data, not by batch of B's. Anyway, here's my code:

    ;WITH test AS (

    SELECT *,

    rank() OVER (PARTITION BY lang ORDER BY id) dr

    FROM @Table

    WHERE code = 'B'

    )

    SELECT * FROM Test

    WHERE dr > 1

    ORDER BY id

    Thanks.

  • Now I gotcha, thanks for the extra explanation.

    [font="Courier New"]DECLARE @Table TABLE (

            [id] bigint IDENTITY PRIMARY KEY CLUSTERED,

            lang VARCHAR(100),

            code CHAR(1) )

    INSERT @Table

    SELECT 'en','A' UNION ALL -- 1

    SELECT 'en','A' UNION ALL -- 2

    SELECT 'en','B' UNION ALL -- 3

    SELECT 'de','B' UNION ALL -- 4

    SELECT 'zh','B' UNION ALL -- 5

    SELECT 'en','A' UNION ALL -- 6

    SELECT 'en','A' UNION ALL -- 7

    SELECT 'en','B' UNION ALL -- 8 dupe of 3

    SELECT 'de','B' UNION ALL -- 9 dupe of 4

    SELECT 'en','B' UNION ALL -- 10 dupe of 3

    SELECT 'en','B' UNION ALL -- 11 dupe of 3

    SELECT 'en','A' UNION ALL -- 12

    SELECT 'en','A'       -- 13

    SELECT a.*

    FROM @Table a

    INNER JOIN (

       SELECT MIN([id]) AS FirstID, lang, code

       FROM @Table

       WHERE code = 'B'

       GROUP BY lang, code

       HAVING COUNT(*) > 1) d

    ON d.lang = a.lang AND d.code = a.code AND a.[id] <> d.FirstID

    ORDER BY a.[id]

    [/font]

    Results:

    id lang code

    ---- ------ ----

    8 en B

    9 de B

    10 en B

    11 en B

    Cheers

    ChrisM

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • HI ,

    Why then is 6 & 8 a result in the second example.

    Also Chris your code doesn't return 10 & 11 as the OP expected.

    thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Ok at last I see what is need, now just to work it out ๐Ÿ™‚

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Hello Chris,

    Thanks for the response and help, I really appreciate it.

    Actually the code you supplied doesn't give me the correct output. The first example should output 10,11 while the second example should output 6,8,12,13. The code should take a look at the second occurrence of duplicates onwards per batch of B's and not as the whole B's.

    So in example one(1), the first batch of B's is 3-5 and the second is 8-11 and so on if I have more data of B's in it. In example one, the output should be 10,11

    DECLARE @Table TABLE (

    id bigint IDENTITY PRIMARY KEY CLUSTERED,

    lang varchar(100),

    code char(1)

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table

    SELECT 'en','A' UNION ALL -- 1

    SELECT 'en','A' UNION ALL -- 2

    SELECT 'en','B' UNION ALL -- 3 [batch 1 of B's]

    SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]

    SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]

    SELECT 'en','A' UNION ALL -- 6

    SELECT 'en','A' UNION ALL -- 7

    SELECT 'en','B' UNION ALL -- 8 [batch 2 of B's]

    SELECT 'de','B' UNION ALL -- 9 [batch 2 of B's]

    SELECT 'en','B' UNION ALL -- 10 dupe of 8 [batch 2 of B's] - second occurrence of dupe

    SELECT 'en','B' UNION ALL -- 11 dupe of 8 [batch 2 of B's] - third occurrence of dupe

    SELECT 'en','A' UNION ALL -- 12

    SELECT 'en','A' -- 13

    SELECT * FROM @Table

    INSERT @Output

    SELECT 10 UNION ALL

    SELECT 11

    SELECT * FROM @Output

    In the second(2) example, the first batch of B's is from 4-8 and the second batch is from 11-13. In this example, the output should be 6,8,12,13.

    DECLARE @Table TABLE (

    id bigint IDENTITY PRIMARY KEY CLUSTERED,

    lang varchar(100),

    code char(1)

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table

    SELECT 'en','A' UNION ALL -- 1

    SELECT 'en','A' UNION ALL -- 2

    SELECT 'en','A' UNION ALL -- 3

    SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]

    SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]

    SELECT 'zh','B' UNION ALL -- 6 dupe of 5 [batch 1 of B's] - second occurence of dupe

    SELECT 'en','B' UNION ALL -- 7 [batch 1 of B's]

    SELECT 'en','B' UNION ALL -- 8 dupe of 7 [batch 1 of B's] - second occurence of dupe

    SELECT 'de','A' UNION ALL -- 9

    SELECT 'en','A' UNION ALL -- 10

    SELECT 'en','B' UNION ALL -- 11 [batch 2 of B's]

    SELECT 'en','B' UNION ALL -- 12 dupe of 11 [batch 2 of B's] - second occurence of dupe

    SELECT 'en','B' -- 13 dupe of 11 [batch 2 of B's] - third occurence of dupe

    SELECT * FROM @Table

    INSERT @Output

    SELECT 6 UNION ALL

    SELECT 8 UNION ALL

    SELECT 12 UNION ALL

    SELECT 13

    SELECT * FROM @Output

    I hope this would help clear things.

    Thank you so much.

  • Christopher Stobbs (9/23/2008)


    Ok at last I see what is need, now just to work it out ๐Ÿ™‚

    Thank you Chris for the help.

  • Are there any other cols in the table that might help with identifying the groupings?

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • mmm ok here is what I have.

    It involves a new column and an update statement.

    [font="Courier New"]

    DECLARE @Table TABLE (

            id bigint IDENTITY PRIMARY KEY CLUSTERED,

            lang VARCHAR(100),

            code CHAR(1),

           GRP INT

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table(lang,code)

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A'

    SELECT * FROM @Table

    INSERT @Output

    SELECT 10 UNION ALL

    SELECT 11

    SELECT * FROM @Output

    --SOLUTION

    --********************************

    DECLARE @grp INT

    SET @grp = 0

    UPDATE @Table  SET GRP = 0 WHERE id = 1

    -- CREATE THE GROUPING

    UPDATE t

    SET @grp = GRP = CASE WHEN  a.code != t.Code

                         THEN @grp + 1

                         ELSE @grp

                    END

    FROM @Table t

       INNER JOIN @Table a ON a.Id +1 = t.Id

    --REMOVE THE FIRST ROW FROM EACH SUB-GROUP

    SELECT id,Lang,Code

    FROM

       (SELECT

           ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,

           *

       FROM @Table

       WHERE Code = 'b'

       ) nest

    WHERE  Num ! =1

    ORDER BY id

    -[/font]

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Christopher Stobbs (9/23/2008)


    Are there any other cols in the table that might help with identifying the groupings?

    There are no other columns to determine groupings, another information though is that the table here are dynamic and also columns are dynamic, so there is a possibility that there would be another column that will be added aside from "lang" in which to check for dupes. Anyway, I have an idea on how to make it dynamic and i think the most important thing is the logic and the performance of the code because eventually it is anticipated to have 1 million or more data with the said patterns.

    Christopher Stobbs (9/23/2008)


    mmm ok here is what I have.

    It involves a new column and an update statement.

    [font="Courier New"]

    DECLARE @Table TABLE (

            id bigint IDENTITY PRIMARY KEY CLUSTERED,

            lang VARCHAR(100),

            code CHAR(1),

           GRP INT

    )

    DECLARE @Output TABLE(id bigint)

    INSERT @Table(lang,code)

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'zh','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'de','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','B' UNION ALL

    SELECT 'en','A' UNION ALL

    SELECT 'en','A'

    SELECT * FROM @Table

    INSERT @Output

    SELECT 10 UNION ALL

    SELECT 11

    SELECT * FROM @Output

    --SOLUTION

    --********************************

    DECLARE @grp INT

    SET @grp = 0

    UPDATE @Table  SET GRP = 0 WHERE id = 1

    -- CREATE THE GROUPING

    UPDATE t

    SET @grp = GRP = CASE WHEN  a.code != t.Code

                         THEN @grp + 1

                         ELSE @grp

                    END

    FROM @Table t

       INNER JOIN @Table a ON a.Id +1 = t.Id

    --REMOVE THE FIRST ROW FROM EACH SUB-GROUP

    SELECT id,Lang,Code

    FROM

       (SELECT

           ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,

           *

       FROM @Table

       WHERE Code = 'b'

       ) nest

    WHERE  Num ! =1

    ORDER BY id

    -[/font]

    I can't thank you enough for the help Chris. Though there is a GRP column added on the sample table, since I have dynamic table, is it possible to add a column 'GRP' dynamically?

    Thanks.

  • Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.

    I would suggest creating a sample to test with a million rows.

    I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.

    Thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Christopher Stobbs (9/24/2008)


    Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.

    I would suggest creating a sample to test with a million rows.

    I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.

    Thanks

    Chris

    Thank you for the response and suggestion Chris, if you can show me how to dynamically add column GRP on every dynamic table then that would great. I'll try to create test data with 1 million rows but basically its pretty much the same pattern with my 2 examples and repeated until 1 million data.

    Thanks.

  • salvysmith08 (9/24/2008)


    ...if you can show me how to dynamically add column GRP on every dynamic table then that would great.

    Old gaming trick... when you can't move, change color... Temp Table to the rescue... ๐Ÿ˜›

    --===== Add the Grp column to all the other columns in a temp table

    SELECT *,0 AS Grp

    INTO #MyHead

    FROM @Table

    --===== Add the same primary key as the original table

    ALTER TABLE #MyHead

    ADD PRIMARY KEY CLUSTERED (ID)

    --===== Populate the Grp column using a special update

    DECLARE @PrevGrp INT; SET @PrevGrp = 0

    DECLARE @PrevCode CHAR(1); SET @PrevCode = ''

    UPDATE #MyHead

    SET @PrevGrp = Grp = CASE WHEN Code = @PrevCode THEN @PrevGrp ELSE @PrevGrp+1 END,

    @PrevCode = Code

    --===== Return the ID's of only Code "B" rows where dupes exist within a group.

    ;WITH

    cteCounts AS

    (SELECT ROW_NUMBER() OVER (PARTITION BY Code,Lang,Grp ORDER BY ID) AS MyCount,*

    FROM #MyHead)

    SELECT ID FROM cteCounts WHERE Code = 'B' AND MyCount > 1 ORDER BY ID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply