Help on Dups

Question

Post reply

Help on Dups

salvysmith08-887568

SSC Veteran

Points: 215
More actions
September 22, 2008 at 9:38 am

#198632

Hi,
I'm new to MS SQL 2005, can anyone help me in getting the second occurrence of duplicates onwards for every batch of B's?
Here are two examples with their desired output (@Output).
First:
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
And second:
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B'
SELECT * FROM @Table
INSERT @Output
SELECT 6 UNION ALL
SELECT 8 UNION ALL
SELECT 12 UNION ALL
SELECT 13
SELECT * FROM @Output
Any help and sample code would be greatly appreciated.
Thank you.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 1

HI there

Are you sure your outputs are correct?

I'm a bit confused.

thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Hello

Try this out, have a look at the result set (it will contain all the iterations of the dupes) and use it to help explain what you're trying to do.

[font="Courier New"]DECLARE @Table TABLE (

id bigint IDENTITY PRIMARY KEY CLUSTERED,

lang VARCHAR(100),

code CHAR(1) )

INSERT @Table

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'zh','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','A'

SELECT a.*

FROM @Table a

INNER JOIN (

SELECT lang, code

FROM @Table

WHERE code = 'B'

GROUP BY lang, code

HAVING COUNT(*) > 1) d

ON d.lang = a.lang AND d.code = a.code

ORDER BY a.lang, a.code

[/font]

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

salvysmith08-887568 SSC Veteran Points: 215 More actions · Answer 3

Christopher Stobbs (9/22/2008)
HI there
Are you sure your outputs are correct?
I'm a bit confused.
thanks
Chris

Yes I am Sure with my output. If we're going to look at the first example:

DECLARE @Table TABLE (

id bigint IDENTITY PRIMARY KEY CLUSTERED,

lang varchar(100),

code char(1)

)

DECLARE @Output TABLE(id bigint)

INSERT @Table

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'zh','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','A'

SELECT * FROM @Table

INSERT @Output

SELECT 10 UNION ALL

SELECT 11

SELECT * FROM @Output

Why is the output 10 and 11? because on the first batch of B's (id's 3-5), it doesn't have a duplicate. if you go to second batch of B's which are id's 8-11, you can see duplicates on id's 8,10, and 11. With this, I just need ids 10 and 11 since they are the second occurrence of duplicate onwards (meaning not the first one)

I created a simple code but it won't give me the exact output since it will group all the B's in the entire data, not by batch of B's. Anyway, here's my code:

;WITH test AS (

SELECT *,

rank() OVER (PARTITION BY lang ORDER BY id) dr

FROM @Table

WHERE code = 'B'

)

SELECT * FROM Test

WHERE dr > 1

ORDER BY id

Thanks.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Now I gotcha, thanks for the extra explanation.

[font="Courier New"]DECLARE @Table TABLE (

[id] bigint IDENTITY PRIMARY KEY CLUSTERED,

lang VARCHAR(100),

code CHAR(1) )

INSERT @Table

SELECT 'en','A' UNION ALL -- 1

SELECT 'en','A' UNION ALL -- 2

SELECT 'en','B' UNION ALL -- 3

SELECT 'de','B' UNION ALL -- 4

SELECT 'zh','B' UNION ALL -- 5

SELECT 'en','A' UNION ALL -- 6

SELECT 'en','A' UNION ALL -- 7

SELECT 'en','B' UNION ALL -- 8 dupe of 3

SELECT 'de','B' UNION ALL -- 9 dupe of 4

SELECT 'en','B' UNION ALL -- 10 dupe of 3

SELECT 'en','B' UNION ALL -- 11 dupe of 3

SELECT 'en','A' UNION ALL -- 12

SELECT 'en','A' -- 13

SELECT a.*

FROM @Table a

INNER JOIN (

SELECT MIN([id]) AS FirstID, lang, code

FROM @Table

WHERE code = 'B'

GROUP BY lang, code

HAVING COUNT(*) > 1) d

ON d.lang = a.lang AND d.code = a.code AND a.[id] <> d.FirstID

ORDER BY a.[id]

[/font]

Results:

id lang code

---- ------ ----

8 en B

9 de B

10 en B

11 en B

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 5

HI ,

Why then is 6 & 8 a result in the second example.

Also Chris your code doesn't return 10 & 11 as the OP expected.

thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 6

Ok at last I see what is need, now just to work it out 🙂

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

salvysmith08-887568 SSC Veteran Points: 215 More actions · Answer 7

Hello Chris,

Thanks for the response and help, I really appreciate it.

Actually the code you supplied doesn't give me the correct output. The first example should output 10,11 while the second example should output 6,8,12,13. The code should take a look at the second occurrence of duplicates onwards per batch of B's and not as the whole B's.

So in example one(1), the first batch of B's is 3-5 and the second is 8-11 and so on if I have more data of B's in it. In example one, the output should be 10,11

DECLARE @Table TABLE (

id bigint IDENTITY PRIMARY KEY CLUSTERED,

lang varchar(100),

code char(1)

)

DECLARE @Output TABLE(id bigint)

INSERT @Table

SELECT 'en','A' UNION ALL -- 1

SELECT 'en','A' UNION ALL -- 2

SELECT 'en','B' UNION ALL -- 3 [batch 1 of B's]

SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]

SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]

SELECT 'en','A' UNION ALL -- 6

SELECT 'en','A' UNION ALL -- 7

SELECT 'en','B' UNION ALL -- 8 [batch 2 of B's]

SELECT 'de','B' UNION ALL -- 9 [batch 2 of B's]

SELECT 'en','B' UNION ALL -- 10 dupe of 8 [batch 2 of B's] - second occurrence of dupe

SELECT 'en','B' UNION ALL -- 11 dupe of 8 [batch 2 of B's] - third occurrence of dupe

SELECT 'en','A' UNION ALL -- 12

SELECT 'en','A' -- 13

SELECT * FROM @Table

INSERT @Output

SELECT 10 UNION ALL

SELECT 11

SELECT * FROM @Output

In the second(2) example, the first batch of B's is from 4-8 and the second batch is from 11-13. In this example, the output should be 6,8,12,13.

DECLARE @Table TABLE (

id bigint IDENTITY PRIMARY KEY CLUSTERED,

lang varchar(100),

code char(1)

)

DECLARE @Output TABLE(id bigint)

INSERT @Table

SELECT 'en','A' UNION ALL -- 1

SELECT 'en','A' UNION ALL -- 2

SELECT 'en','A' UNION ALL -- 3

SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]

SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]

SELECT 'zh','B' UNION ALL -- 6 dupe of 5 [batch 1 of B's] - second occurence of dupe

SELECT 'en','B' UNION ALL -- 7 [batch 1 of B's]

SELECT 'en','B' UNION ALL -- 8 dupe of 7 [batch 1 of B's] - second occurence of dupe

SELECT 'de','A' UNION ALL -- 9

SELECT 'en','A' UNION ALL -- 10

SELECT 'en','B' UNION ALL -- 11 [batch 2 of B's]

SELECT 'en','B' UNION ALL -- 12 dupe of 11 [batch 2 of B's] - second occurence of dupe

SELECT 'en','B' -- 13 dupe of 11 [batch 2 of B's] - third occurence of dupe

SELECT * FROM @Table

INSERT @Output

SELECT 6 UNION ALL

SELECT 8 UNION ALL

SELECT 12 UNION ALL

SELECT 13

SELECT * FROM @Output

I hope this would help clear things.

Thank you so much.

salvysmith08-887568 SSC Veteran Points: 215 More actions · Answer 8

Christopher Stobbs (9/23/2008)
Ok at last I see what is need, now just to work it out 🙂

Thank you Chris for the help.

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 9

Are there any other cols in the table that might help with identifying the groupings?

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 10

mmm ok here is what I have.

It involves a new column and an update statement.

[font="Courier New"]

DECLARE @Table TABLE (

id bigint IDENTITY PRIMARY KEY CLUSTERED,

lang VARCHAR(100),

code CHAR(1),

GRP INT

)

DECLARE @Output TABLE(id bigint)

INSERT @Table(lang,code)

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'zh','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'de','B' UNION ALL

SELECT 'en','B' UNION ALL

SELECT 'en','A' UNION ALL

SELECT 'en','A'

SELECT * FROM @Table

INSERT @Output

SELECT 10 UNION ALL

SELECT 11

SELECT * FROM @Output

--SOLUTION

--********************************

DECLARE @grp INT

SET @grp = 0

UPDATE @Table SET GRP = 0 WHERE id = 1

-- CREATE THE GROUPING

UPDATE t

SET @grp = GRP = CASE WHEN a.code != t.Code

THEN @grp + 1

ELSE @grp

END

FROM @Table t

INNER JOIN @Table a ON a.Id +1 = t.Id

--REMOVE THE FIRST ROW FROM EACH SUB-GROUP

SELECT id,Lang,Code

FROM

(SELECT

ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,

*

FROM @Table

WHERE Code = 'b'

) nest

WHERE Num ! =1

ORDER BY id

-[/font]

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

salvysmith08-887568 SSC Veteran Points: 215 More actions · Answer 11

Christopher Stobbs (9/23/2008)
Are there any other cols in the table that might help with identifying the groupings?

There are no other columns to determine groupings, another information though is that the table here are dynamic and also columns are dynamic, so there is a possibility that there would be another column that will be added aside from "lang" in which to check for dupes. Anyway, I have an idea on how to make it dynamic and i think the most important thing is the logic and the performance of the code because eventually it is anticipated to have 1 million or more data with the said patterns.

Christopher Stobbs (9/23/2008)
mmm ok here is what I have.
It involves a new column and an update statement.
[font="Courier New"]
DECLARE @Table TABLE (
        id bigint IDENTITY PRIMARY KEY CLUSTERED,
        lang VARCHAR(100),
        code CHAR(1),
       GRP INT
)
DECLARE @Output TABLE(id bigint)
INSERT @Table(lang,code)
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
--SOLUTION
--********************************
DECLARE @grp INT
SET @grp = 0
UPDATE @Table  SET GRP = 0 WHERE id = 1
-- CREATE THE GROUPING
UPDATE t
SET @grp = GRP = CASE WHEN  a.code != t.Code
                     THEN @grp + 1
                     ELSE @grp
                END
FROM @Table t
   INNER JOIN @Table a ON a.Id +1 = t.Id
--REMOVE THE FIRST ROW FROM EACH SUB-GROUP
SELECT id,Lang,Code
FROM
   (SELECT
       ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,
       *
   FROM @Table
   WHERE Code = 'b'
   ) nest
WHERE  Num ! =1
ORDER BY id
-[/font]

I can't thank you enough for the help Chris. Though there is a GRP column added on the sample table, since I have dynamic table, is it possible to add a column 'GRP' dynamically?

Thanks.

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 12

Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.

I would suggest creating a sample to test with a million rows.

I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.

Thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

salvysmith08-887568 SSC Veteran Points: 215 More actions · Answer 13

Christopher Stobbs (9/24/2008)
Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.
I would suggest creating a sample to test with a million rows.
I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.
Thanks
Chris

Thank you for the response and suggestion Chris, if you can show me how to dynamically add column GRP on every dynamic table then that would great. I'll try to create test data with 1 million rows but basically its pretty much the same pattern with my 2 examples and repeated until 1 million data.

Thanks.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 14

salvysmith08 (9/24/2008)
...if you can show me how to dynamically add column GRP on every dynamic table then that would great.

Old gaming trick... when you can't move, change color... Temp Table to the rescue... 😛

--===== Add the Grp column to all the other columns in a temp table

SELECT *,0 AS Grp

INTO #MyHead

FROM @Table

--===== Add the same primary key as the original table

ALTER TABLE #MyHead

ADD PRIMARY KEY CLUSTERED (ID)

--===== Populate the Grp column using a special update

DECLARE @PrevGrp INT; SET @PrevGrp = 0

DECLARE @PrevCode CHAR(1); SET @PrevCode = ''

UPDATE #MyHead

SET @PrevGrp = Grp = CASE WHEN Code = @PrevCode THEN @PrevGrp ELSE @PrevGrp+1 END,

@PrevCode = Code

--===== Return the ID's of only Code "B" rows where dupes exist within a group.

;WITH

cteCounts AS

(SELECT ROW_NUMBER() OVER (PARTITION BY Code,Lang,Grp ORDER BY ID) AS MyCount,*

FROM #MyHead)

SELECT ID FROM cteCounts WHERE Code = 'B' AND MyCount > 1 ORDER BY ID

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)