September 22, 2008 at 9:38 am
Hi,
I'm new to MS SQL 2005, can anyone help me in getting the second occurrence of duplicates onwards for every batch of B's?
Here are two examples with their desired output (@Output).
First:
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
And second:
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B'
SELECT * FROM @Table
INSERT @Output
SELECT 6 UNION ALL
SELECT 8 UNION ALL
SELECT 12 UNION ALL
SELECT 13
SELECT * FROM @Output
Any help and sample code would be greatly appreciated.
Thank you.
September 22, 2008 at 9:52 am
HI there
Are you sure your outputs are correct?
I'm a bit confused.
thanks
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 22, 2008 at 10:19 am
Hello
Try this out, have a look at the result set (it will contain all the iterations of the dupes) and use it to help explain what you're trying to do.
[font="Courier New"]DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang VARCHAR(100),
code CHAR(1) )
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT a.*
FROM @Table a
INNER JOIN (
SELECT lang, code
FROM @Table
WHERE code = 'B'
GROUP BY lang, code
HAVING COUNT(*) > 1) d
ON d.lang = a.lang AND d.code = a.code
ORDER BY a.lang, a.code
[/font]
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 22, 2008 at 8:28 pm
Christopher Stobbs (9/22/2008)
HI thereAre you sure your outputs are correct?
I'm a bit confused.
thanks
Chris
Yes I am Sure with my output. If we're going to look at the first example:
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
Why is the output 10 and 11? because on the first batch of B's (id's 3-5), it doesn't have a duplicate. if you go to second batch of B's which are id's 8-11, you can see duplicates on id's 8,10, and 11. With this, I just need ids 10 and 11 since they are the second occurrence of duplicate onwards (meaning not the first one)
I created a simple code but it won't give me the exact output since it will group all the B's in the entire data, not by batch of B's. Anyway, here's my code:
;WITH test AS (
SELECT *,
rank() OVER (PARTITION BY lang ORDER BY id) dr
FROM @Table
WHERE code = 'B'
)
SELECT * FROM Test
WHERE dr > 1
ORDER BY id
Thanks.
September 22, 2008 at 11:58 pm
Now I gotcha, thanks for the extra explanation.
[font="Courier New"]DECLARE @Table TABLE (
[id] bigint IDENTITY PRIMARY KEY CLUSTERED,
lang VARCHAR(100),
code CHAR(1) )
INSERT @Table
SELECT 'en','A' UNION ALL -- 1
SELECT 'en','A' UNION ALL -- 2
SELECT 'en','B' UNION ALL -- 3
SELECT 'de','B' UNION ALL -- 4
SELECT 'zh','B' UNION ALL -- 5
SELECT 'en','A' UNION ALL -- 6
SELECT 'en','A' UNION ALL -- 7
SELECT 'en','B' UNION ALL -- 8 dupe of 3
SELECT 'de','B' UNION ALL -- 9 dupe of 4
SELECT 'en','B' UNION ALL -- 10 dupe of 3
SELECT 'en','B' UNION ALL -- 11 dupe of 3
SELECT 'en','A' UNION ALL -- 12
SELECT 'en','A' -- 13
SELECT a.*
FROM @Table a
INNER JOIN (
SELECT MIN([id]) AS FirstID, lang, code
FROM @Table
WHERE code = 'B'
GROUP BY lang, code
HAVING COUNT(*) > 1) d
ON d.lang = a.lang AND d.code = a.code AND a.[id] <> d.FirstID
ORDER BY a.[id]
[/font]
Results:
id lang code
---- ------ ----
8 en B
9 de B
10 en B
11 en B
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 23, 2008 at 2:16 am
HI ,
Why then is 6 & 8 a result in the second example.
Also Chris your code doesn't return 10 & 11 as the OP expected.
thanks
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 23, 2008 at 2:22 am
Ok at last I see what is need, now just to work it out ๐
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 23, 2008 at 2:41 am
Hello Chris,
Thanks for the response and help, I really appreciate it.
Actually the code you supplied doesn't give me the correct output. The first example should output 10,11 while the second example should output 6,8,12,13. The code should take a look at the second occurrence of duplicates onwards per batch of B's and not as the whole B's.
So in example one(1), the first batch of B's is 3-5 and the second is 8-11 and so on if I have more data of B's in it. In example one, the output should be 10,11
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL -- 1
SELECT 'en','A' UNION ALL -- 2
SELECT 'en','B' UNION ALL -- 3 [batch 1 of B's]
SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]
SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]
SELECT 'en','A' UNION ALL -- 6
SELECT 'en','A' UNION ALL -- 7
SELECT 'en','B' UNION ALL -- 8 [batch 2 of B's]
SELECT 'de','B' UNION ALL -- 9 [batch 2 of B's]
SELECT 'en','B' UNION ALL -- 10 dupe of 8 [batch 2 of B's] - second occurrence of dupe
SELECT 'en','B' UNION ALL -- 11 dupe of 8 [batch 2 of B's] - third occurrence of dupe
SELECT 'en','A' UNION ALL -- 12
SELECT 'en','A' -- 13
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
In the second(2) example, the first batch of B's is from 4-8 and the second batch is from 11-13. In this example, the output should be 6,8,12,13.
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang varchar(100),
code char(1)
)
DECLARE @Output TABLE(id bigint)
INSERT @Table
SELECT 'en','A' UNION ALL -- 1
SELECT 'en','A' UNION ALL -- 2
SELECT 'en','A' UNION ALL -- 3
SELECT 'de','B' UNION ALL -- 4 [batch 1 of B's]
SELECT 'zh','B' UNION ALL -- 5 [batch 1 of B's]
SELECT 'zh','B' UNION ALL -- 6 dupe of 5 [batch 1 of B's] - second occurence of dupe
SELECT 'en','B' UNION ALL -- 7 [batch 1 of B's]
SELECT 'en','B' UNION ALL -- 8 dupe of 7 [batch 1 of B's] - second occurence of dupe
SELECT 'de','A' UNION ALL -- 9
SELECT 'en','A' UNION ALL -- 10
SELECT 'en','B' UNION ALL -- 11 [batch 2 of B's]
SELECT 'en','B' UNION ALL -- 12 dupe of 11 [batch 2 of B's] - second occurence of dupe
SELECT 'en','B' -- 13 dupe of 11 [batch 2 of B's] - third occurence of dupe
SELECT * FROM @Table
INSERT @Output
SELECT 6 UNION ALL
SELECT 8 UNION ALL
SELECT 12 UNION ALL
SELECT 13
SELECT * FROM @Output
I hope this would help clear things.
Thank you so much.
September 23, 2008 at 3:08 am
Christopher Stobbs (9/23/2008)
Ok at last I see what is need, now just to work it out ๐
Thank you Chris for the help.
September 23, 2008 at 4:33 am
Are there any other cols in the table that might help with identifying the groupings?
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 23, 2008 at 5:18 am
mmm ok here is what I have.
It involves a new column and an update statement.
[font="Courier New"]
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang VARCHAR(100),
code CHAR(1),
GRP INT
)
DECLARE @Output TABLE(id bigint)
INSERT @Table(lang,code)
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
--SOLUTION
--********************************
DECLARE @grp INT
SET @grp = 0
UPDATE @Table SET GRP = 0 WHERE id = 1
-- CREATE THE GROUPING
UPDATE t
SET @grp = GRP = CASE WHEN a.code != t.Code
THEN @grp + 1
ELSE @grp
END
FROM @Table t
INNER JOIN @Table a ON a.Id +1 = t.Id
--REMOVE THE FIRST ROW FROM EACH SUB-GROUP
SELECT id,Lang,Code
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,
*
FROM @Table
WHERE Code = 'b'
) nest
WHERE Num ! =1
ORDER BY id
-[/font]
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 23, 2008 at 7:12 pm
Christopher Stobbs (9/23/2008)
Are there any other cols in the table that might help with identifying the groupings?
There are no other columns to determine groupings, another information though is that the table here are dynamic and also columns are dynamic, so there is a possibility that there would be another column that will be added aside from "lang" in which to check for dupes. Anyway, I have an idea on how to make it dynamic and i think the most important thing is the logic and the performance of the code because eventually it is anticipated to have 1 million or more data with the said patterns.
Christopher Stobbs (9/23/2008)
mmm ok here is what I have.It involves a new column and an update statement.
[font="Courier New"]
DECLARE @Table TABLE (
id bigint IDENTITY PRIMARY KEY CLUSTERED,
lang VARCHAR(100),
code CHAR(1),
GRP INT
)
DECLARE @Output TABLE(id bigint)
INSERT @Table(lang,code)
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'zh','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'de','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','B' UNION ALL
SELECT 'en','A' UNION ALL
SELECT 'en','A'
SELECT * FROM @Table
INSERT @Output
SELECT 10 UNION ALL
SELECT 11
SELECT * FROM @Output
--SOLUTION
--********************************
DECLARE @grp INT
SET @grp = 0
UPDATE @Table SET GRP = 0 WHERE id = 1
-- CREATE THE GROUPING
UPDATE t
SET @grp = GRP = CASE WHEN a.code != t.Code
THEN @grp + 1
ELSE @grp
END
FROM @Table t
INNER JOIN @Table a ON a.Id +1 = t.Id
--REMOVE THE FIRST ROW FROM EACH SUB-GROUP
SELECT id,Lang,Code
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Lang,Code,grp ORDER BY id) num,
*
FROM @Table
WHERE Code = 'b'
) nest
WHERE Num ! =1
ORDER BY id
-[/font]
I can't thank you enough for the help Chris. Though there is a GRP column added on the sample table, since I have dynamic table, is it possible to add a column 'GRP' dynamically?
Thanks.
September 24, 2008 at 2:42 am
Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.
I would suggest creating a sample to test with a million rows.
I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.
Thanks
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
September 24, 2008 at 4:17 am
Christopher Stobbs (9/24/2008)
Yeah it shouldn't be a problem to hard code that Column, cause looks like that Column and the Code Column will be the only two that you kow for certian.I would suggest creating a sample to test with a million rows.
I would also suggest testing it with temp tables and real tables, as I would think that 1mil rows in a table variable might cause performance issues.
Thanks
Chris
Thank you for the response and suggestion Chris, if you can show me how to dynamically add column GRP on every dynamic table then that would great. I'll try to create test data with 1 million rows but basically its pretty much the same pattern with my 2 examples and repeated until 1 million data.
Thanks.
September 27, 2008 at 9:04 pm
salvysmith08 (9/24/2008)
...if you can show me how to dynamically add column GRP on every dynamic table then that would great.
Old gaming trick... when you can't move, change color... Temp Table to the rescue... ๐
--===== Add the Grp column to all the other columns in a temp table
SELECT *,0 AS Grp
INTO #MyHead
FROM @Table
--===== Add the same primary key as the original table
ALTER TABLE #MyHead
ADD PRIMARY KEY CLUSTERED (ID)
--===== Populate the Grp column using a special update
DECLARE @PrevGrp INT; SET @PrevGrp = 0
DECLARE @PrevCode CHAR(1); SET @PrevCode = ''
UPDATE #MyHead
SET @PrevGrp = Grp = CASE WHEN Code = @PrevCode THEN @PrevGrp ELSE @PrevGrp+1 END,
@PrevCode = Code
--===== Return the ID's of only Code "B" rows where dupes exist within a group.
;WITH
cteCounts AS
(SELECT ROW_NUMBER() OVER (PARTITION BY Code,Lang,Grp ORDER BY ID) AS MyCount,*
FROM #MyHead)
SELECT ID FROM cteCounts WHERE Code = 'B' AND MyCount > 1 ORDER BY ID
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply