October 22, 2008 at 6:24 am
AnzioBake (10/22/2008)
I had some trouble logging on so read the other posts after my replyI had to make a change but now I am not sure if the result set is correct (see below)
4574 appears as a duplicate of 4572 which is a duplicate of 4571 -4570
RowIDRowKeyLocaleCode
45691deA
45711enA
45721enA
45741enA
45751enA
45761enA
45771enA
45791enA
45821enA
45831enA
45841enA
45851enA
45871enA
45891enA
45901enA
45921enA
45931enA
45951deA
45981deA
46001zhA
46011deC
Hi AnzioBake, yes you've got the CORRECT output on this second post of yours.
Thank you for your help.
October 22, 2008 at 7:09 am
Sorry for not getting back with you Dan. Lost the primary hard drive in my dev machine yesterday, so it was an all day affair getting the thing back up and running. Think I'm just about there, although the resolution / color balance on the monitors now seems all screwed up. In any case, looks like AnzioBake has discovered a solution.
October 22, 2008 at 7:12 am
I tried to post this late in the day, yesterday, but was having trouble with the website:
Dan - I believe AnzioBake has now likely found the solution, but even he still had a question about exactly what constituted a duplicate: something for which you've never provided a complete and exact statement or definition that describes it in sufficiently excruciating detail AND handles all possible input.
I'm pretty sure that's why it's taking so long for someone to provide a solution. Also, you've never stated one word about WHY you need to do this. If we knew the ultimate objective, we'd have a far better chance at helping you get there.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 22, 2008 at 7:36 am
Garadin (10/22/2008)
Sorry for not getting back with you Dan. Lost the primary hard drive in my dev machine yesterday, so it was an all day affair getting the thing back up and running. Think I'm just about there, although the resolution / color balance on the monitors now seems all screwed up. In any case, looks like AnzioBake has discovered a solution.
No problem Seth, just waiting on what the solution AnzioBake does though but I think he got it already. But if you think you have time to fix the solution you have then that would be great. I really appreciate your help guys. Thank you for that.
October 22, 2008 at 8:40 am
AnzioBake (10/22/2008)
My solution is a bit long but executes in under three (3) minutes.Here is s sample of the longer series and the results set. If Correct I will post my Solution...
Anzio, it is generally good form to always post your code here, as this gives others the chance to review it, check it and even test it. If you think that it might be too long for a post then just include it as an attachment. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 22, 2008 at 9:44 am
I seem to have had some internet issues at work (where I was posting from earlier ) I will post the code as an attachment in the morning. Sorry for the delay, but I ended up in a meeting that took whole afternoon and left immediately.
October 22, 2008 at 11:49 am
Dan Segalles (10/22/2008)
But if you think you have time to fix the solution you have then that would be great.
Unfortunately, I've hit a dead wall with the method I was trying. I've tried it every which way I can think of (without completely ditching that method and using something different), and I get outdone by the query optimizer every time. It seems that it simply will not recursively run that subquery for each row. While that's really cool for the fact that it won't let me create RBAR(and has solved several other issues)... it's not very cool in the sense that it makes this one not work at all. Grabbing duplicates per group is easy. Grabbing them whilst ignoring other rows that happened before the last one, as well as including 'patches' of duplicates until the next change seems to add too much logic for this solution to handle.
October 23, 2008 at 12:56 am
This is my solution. Having a problem at work connecting to website.
--Select * From #TempTable
IF OBJECT_ID ('TempDb..#Seq','U') IS NOT NULL
BEGIN
DROP TABLE #Seq
END
CREATE TABLE #Seq
(
RowID int PRIMARY KEY CLUSTERED,
Code char(1)
,NextSeqStartRowID int
)
IF OBJECT_ID ('TempDb..#Dupe','U') IS NOT NULL
BEGIN
DROP TABLE #Dupe
END
CREATE TABLE #Dupe (
SeqStartRowID int
, Code Char(1)
, NextSeqStartRowID int
, FirstRowID int
, DuplicateRowID int
, Primary Key Clustered( SeqStartRowID , Code , NextSeqStartRowID , FirstRowID )
)
IF OBJECT_ID ('TempDb..#Result','U') IS NOT NULL
BEGIN
DROP TABLE #Result
END
CREATE TABLE #Result (
RowID int PRIMARY KEY CLUSTERED,
RowKey varchar(1000),
Locale varchar(10),
Code char(1)
)
DECLARE @OptRows int; -- variable which records the optimum number of rows to scan
DECLARE@RowCnt int
SET @OptRows = 100;
----Find All the induvidual sequences -------------------
Insert into #Seq
Select
RowID = Min( RowiD)
, Code
, NextSeqStartRowID = isNUll( NextSeqStartRowID, Max(RowID)+1 )
From(
Select
a.RowID
, a.Code
, NextSeqStartRowID = Min( b.RowID )
From #TempTable as a
Inner Join #TempTable as b
on b.rowID > a.RowID
And b.RowID <= a.RowID +@OptRows
And b.Code <> a.Code
Group by
a.RowID
, a.Code
)as c
Group by
Code
, NextSeqStartRowID
-----------------------------------------------------------
----Identify all Duplicates in sequences--------------------
Insert into #Dupe
Select
SeqStartRowID = s.rowID
, Code = s.Code
, s.NextSeqStartRowID
, FirstRowID = t.RowID
, DuplicateRowID = Min( x.RowId )
From #Seq as s
Left Join #TempTable as t
on t.rowID >= s.RowID
And t.RowID < s.NextSeqStartRowID
Inner JOin #TempTable as x
on x.RowId >= s.RowID
And x.RowID < s.NextSeqStartRowID
And x.RowID > t.rowID
And x.Locale = t.locale
And x.rowKey = t.RowKey
Group by
s.rowID
, s.Code
, s.NextSeqStartRowID
, t.RowID
Order by
s.RowID
, t.rowID
---------------------------------------------------------
----Constuct the result set-------------------------------
Delete from #Result
Insert into #Result
Select d.*
From #Seq as s
Inner join #TempTable as d
on d.rowid = s.rowid
Set @RowCnt = 1
While @Rowcnt > 0 Begin
Insert into #Result
Select d.*
From
(
Select
S.LstRes
, RowId = Min( d.DuplicateRowId )
From(
----Find Last RowID in Result Set
Select
s.RowID
, s.Code
, s.NextSeqStartRowID
, LstRes = Max( r.RowID )
From #Seq as s
Inner join #Result as r
on r.RowId >= s.rowid
And r.RowID < s.NextSeqStartRowId
Group by
s.RowID
, s.Code
, s.NextSeqStartRowID
--------------------------------
) as s
Inner join #Dupe as d
on d.SeqStartRowID = s.rowid
And d.FirstRowID >= s.LstRes ----Can be cahnged to > if you do not want to copare to the last duplicate row
Group by
LstRes
) as x
Inner join #TempTable as d
on d.RowID = x.RowID
Select @RowCnt = @@RowCount
End
----------------------------------------------------
Select * From #Result
October 23, 2008 at 11:55 pm
It has been 24 hours and no response. I just wanna confirm that the solution works and performs adequately.
October 25, 2008 at 8:22 am
AnzioBake (10/23/2008)
It has been 24 hours and no response. I just wanna confirm that the solution works and performs adequately.
Hello Anzio, I apologize for not getting back to you right away since I have to take care of some emergency matters. Anyway, I tested your code during the weekend for integrity but it wasn't giving me the correct output for some scenarios.
I simply test it with this kind of scenario:
INSERT #TempTable
SELECT 1,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
GO
Which should output:
RowIDRowKeyLocaleCode
21enA
51enC
71enA
101enA
201enD
Your script outputted this:
RowIDRowKeyLocaleCode
11enA
21enA
51enC
71enA
101enA
Thank for your help Anzio.
October 27, 2008 at 12:18 am
No problem with the delay. I just wanted to know if it has solved your problem...I will check the scenario where it gives the incorrect answer
October 27, 2008 at 12:58 am
Sorry for posting the whole script again. some minor changes required but the code below gives the required result.
I changed the script to
1. Take All codes into Account
2. Only look at duplicates for code = A and
3. Remove the first occurance of code = A from the result set.
Question
Do you want to remove the first occurance of Code ='A' or the first record? You can adjust the delete record script based on your answer.
IF OBJECT_ID ('TempDB..#TempTable','U') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END
CREATE TABLE #TempTable
(
RowID int PRIMARY KEY CLUSTERED,
RowKey varchar(1000),
Locale varchar(10),
Code char(1)
)
GO
INSERT #TempTable
SELECT 1,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
GO
/*
Which should output:
RowID RowKey Locale Code
2 1 en A
5 1 en C
7 1 en A
10 1 en A
20 1 en D
Your script outputted this:
RowID RowKey Locale Code
1 1 en A
2 1 en A
5 1 en C
7 1 en A
10 1 en A
*/
--Select * From #TempTable
IF OBJECT_ID ('TempDb..#Seq','U') IS NOT NULL
BEGIN
DROP TABLE #Seq
END
CREATE TABLE #Seq
(
RowID int PRIMARY KEY CLUSTERED,
Code char(1)
,NextSeqStartRowID int
)
IF OBJECT_ID ('TempDb..#Dupe','U') IS NOT NULL
BEGIN
DROP TABLE #Dupe
END
CREATE TABLE #Dupe (
SeqStartRowID int
, Code Char(1)
, NextSeqStartRowID int
, FirstRowID int
, DuplicateRowID int
, Primary Key Clustered( SeqStartRowID , Code , NextSeqStartRowID , FirstRowID )
)
IF OBJECT_ID ('TempDb..#Result','U') IS NOT NULL
BEGIN
DROP TABLE #Result
END
CREATE TABLE #Result (
RowID int PRIMARY KEY CLUSTERED,
RowKey varchar(1000),
Locale varchar(10),
Code char(1)
)
DECLARE @OptRows int; -- variable which records the optimum number of rows to scan
DECLARE@RowCnt int
SET @OptRows = 100;
----Find All the induvidual sequences -------------------
Insert into #Seq
Select
RowID = Min( RowiD)
, Code
, NextSeqStartRowID = isNUll( NextSeqStartRowID, Max(RowID)+1 )
From(
Select
a.RowID
, a.Code
, NextSeqStartRowID = Min( b.RowID )
From #TempTable as a
Left Join #TempTable as b
on b.rowID > a.RowID
And b.RowID <= a.RowID +@OptRows
And b.Code <> a.Code
Group by
a.RowID
, a.Code
)as c
Group by
Code
, NextSeqStartRowID
-----------------------------------------------------------
----Identify all Duplicates in sequences--------------------
Insert into #Dupe
Select
SeqStartRowID = s.rowID
, Code = s.Code
, s.NextSeqStartRowID
, FirstRowID = t.RowID
, DuplicateRowID = Min( x.RowId )
From #Seq as s
Left Join #TempTable as t
on t.rowID >= s.RowID
And t.RowID < s.NextSeqStartRowID
Inner JOin #TempTable as x
on x.RowId >= s.RowID
And x.RowID < s.NextSeqStartRowID
And x.RowID > t.rowID
And x.Locale = t.locale
And x.rowKey = t.RowKey
Group by
s.rowID
, s.Code
, s.NextSeqStartRowID
, t.RowID
Order by
s.RowID
, t.rowID
---------------------------------------------------------
----Construct the result set-------------------------------
Delete from #Result
Insert into #Result
Select d.*
From #Seq as s
Inner join #TempTable as d
on d.rowid = s.rowid
Set @RowCnt = 1
While @Rowcnt > 0 Begin
Insert into #Result
Select d.*
From
(
Select
S.LstRes
, RowId = Min( d.DuplicateRowId )
From(
----Find Last RowID in Result Set
Select
s.RowID
, s.Code
, s.NextSeqStartRowID
, LstRes = Max( r.RowID )
From #Seq as s
Inner join #Result as r
on r.RowId >= s.rowid
And r.RowID < s.NextSeqStartRowId
Group by
s.RowID
, s.Code
, s.NextSeqStartRowID
--------------------------------
) as s
Inner join #Dupe as d
on d.SeqStartRowID = s.rowid
And d.FirstRowID >= s.LstRes ----Can be changed to > if you do not want to compare to the last duplicate row
And d.Code = 'A'--Only interested in Duplicates of A
Group by
LstRes
) as x
Inner join #TempTable as d
on d.RowID = x.RowID
Select @RowCnt = @@RowCount
End
----------------------------------------------------
--Remove First Record of the sequence for Code = 'A'---------
Delete from tgt
From #Result as tgt
Inner Join(
Select Min(RowID) as RowID
From #Seq
Where Code = 'A'
)as seq
on seq.RowID = Tgt.RowID
------------------------------
Select * From #Result
October 27, 2008 at 1:11 am
October 27, 2008 at 5:48 am
Peso (10/27/2008)
Also seehttp://www.sqlteam.com/forums/topic.asp?TOPIC_ID=112721
http://www.sqlservercentral.com/Forums/Topic562564-338-1.aspx
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3829716&SiteID=1
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3928920&SiteID=17
Heh... but the question remains... has anyone done it correctly, yet?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 27, 2008 at 6:22 am
For records {1, 2} OP wants the second record with ID 2 because there is a duplicate record.
For records {7, 8, 10, 11} OP wants the first record with ID 7 because there are duplicate records.
Why first record for first batch and second record for second batch?
What are the business rules for this?
I know the business rule is to remove the first duplicate of every batch, but applying that rule on second batch would lead to return record with ID 8, not ID 7 as OP writes.
The reason I posted the many links is that in every link there are different descriptions of the business rules. I don't know which is correct.
N 56°04'39.16"
E 12°55'05.25"
Viewing 15 posts - 31 through 45 (of 115 total)
You must be logged in to reply to this topic. Login to reply