October 30, 2008 at 10:57 am
Compromise Solution:
The following code uses set-based logic to get the set of all valid AND invalid duplicates within each 'A' range. It populates the @worktable table variable with all of the dups and the rowid of the previous row with the same key/locale/code (the [preMax] column.)
The compromise is to then update this table using a CASE statement that tests the value of the [premax] colunmn against the value of the @lastDup variable and resets the variable. This update may run faster than the While loop. Dan, please test it out for performance. I have no objections to the table variable being converted to a temp table if you take this to production.
(Yes, Anzio, under the covers it's all loops. We're talking EXPLICIT loops like Steve/smunson said. The difference is that in while loops, the inserts/select queries are running multiple times, which is usually less efficient than running one query.)
------------------------------------------------------------------------------------------------------------------
declare @worktable table (ARange int, rowID int primary key, rowKey int, locale char(2), code char(1), preMin int, preMax int,lastDup int)
declare @example int
declare @lastDup int
set @example = 3
drop table #temptable
CREATE TABLE #TempTable
(RowID int primary key
,RowKey varchar(1000)
,Locale varchar(10)
,Code char(1)
)
IF @example = 1
begin
INSERT INTO #TempTable
SELECT 1 as rowID,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 9,'1','en','A' UNION ALL -- test
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
order by rowID
end
IF @example = 2-- A's only
begin
INSERT INTO #TempTable
SELECT 1 as RowID,'1' as RowKey,'en' as Locale,'A' as Code UNION ALL
SELECT 2,'1','zh','A' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A'
order by rowID
end
IF @example = 3
begin
insert into #tempTable
SELECT 1 as rowID,'1','en','A' UNION ALL
SELECT 2,'2','en','A' UNION ALL
SELECT 3,'3','en','A' UNION ALL
SELECT 4,'1','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 6,'2','en','A' UNION ALL
SELECT 13,'3','en','A' UNION ALL
SELECT 14,'1','en','A' UNION ALL
SELECT 15,'2','en','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 24,'1','en','A' UNION ALL
--SELECT 30,'2','en','A' UNION ALL
SELECT 31,'2','en','A'
order by rowID
end
select * from #temptable
;with cte1 as -- identify breaks
(select t1.RowID,t1.code
from #temptable t1
where t1.code <> (select top 1 code from #temptable where RowID < t1.rowID order by rowID desc)
)
,cte1a as -- include first and last row for ranging purposes
(select t1.RowID, t1.code
from #temptable t1
where RowID = (select min(RowID) from #temptable t2)
union all
select * from cte1
union all
select max(RowID)+1, '~'
from #temptable t1
)
,cte2 as -- identify ranges
(select c1.code,c1.RowID as ARange,isnull(min(c1a.RowID),max(c1.RowID)) as maxRow
from cte1a c1
left join cte1a c1a on c1a.code <> c1.code and c1a.RowID > c1.RowID
where c1.code = 'A'
group by c1.code,c1.RowID
)
,cte3 as -- associate individual rows with their ranges
(select ARange,t.*
from #tempTable t
join cte2 c on t.rowID between ARange and MaxRow
where t.code = 'A'
)
,cte4 as -- identify "raw" duplicates within ranges
(select c1.*,min(c2.rowID) as preMin,max(c2.rowID) as preMax
from cte3 c1
join cte3 c2 on c2.ARange = c1.ARange
and c2.rowKey = c1.rowKey
and c2.locale = c1.locale
and c2.code = c1.code
and c2.rowID < c1.rowID
group by c1.ARange, c1.rowID,c1.rowKey,c1.locale,c1.code
)
-- populate worktable with all dups
insert into @workTable
select Arange, RowId,RowKey,Locale,Code,preMin,PreMax, null as validDup
from cte4 c
-- flag valid dups
set @lastDup = 0
update @worktable
set @lastDup = lastDup = case when(preMax) >= @lastdup then rowID
else @lastDup
end
select rowID,rowKey,locale,code
from @worktable
where rowID = lastDup
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 30, 2008 at 12:40 pm
Example 1 in your "compromise solution" provides a result set consisting of RowID's 2, 9, & 10. Isn't RowID 5 supposed to be included?
Also, we STILL don't have an answer from Dan Segalles about exactly what real-world problem this query is supposed to solve.... Dan ???
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 30, 2008 at 1:28 pm
No, if you look at the example 1 data in the compromise solution, 5 is code "C". Change it to "A" and it will show up.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 30, 2008 at 2:04 pm
Row 5 should be included, as it's a code change. He doesn't want *dupes* of C, but if it's a code change, it is still included.
October 30, 2008 at 2:16 pm
GIMME A BREAK HERE !! :w00t:
I was focused on showing the compromise solution to the duplicates only, since no one had any problems identifying the breaks. How about this?
Also, you guys missed that rows 7 and 20 were also breaks, so nyah.....:cool:
------------------------------------------------------------------------------------------------------------------
declare @worktable table (ARange int, rowID int primary key, rowKey int, locale char(2), code char(1), preMin int, preMax int,lastDup int)
declare @breaks table (RowID int primary key,RowKey int, Locale char(2), code char(1))
declare @example int
declare @lastDup int
set @example = 1
drop table #temptable
CREATE TABLE #TempTable
(RowID int primary key
,RowKey varchar(1000)
,Locale varchar(10)
,Code char(1)
)
IF @example = 1
begin
INSERT INTO #TempTable
SELECT 1 as rowID,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 9,'1','en','A' UNION ALL -- test
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
order by rowID
end
select * from #temptable
insert into @breaks
select t1.RowID,t1.rowKey,t1.locale,t1.code
from #temptable t1
where t1.code <> (select top 1 code from #temptable where RowID < t1.rowID order by rowID desc)
;with cte1a as -- include first and last row for ranging purposes
(select t1.RowID, t1.code
from #temptable t1
where RowID = (select min(RowID) from #temptable t2)
union all
select rowId,code from @breaks
union all
select max(RowID)+1, '~'
from #temptable t1
)
,cte2 as -- identify ranges
(select c1.code,c1.RowID as ARange,isnull(min(c1a.RowID),max(c1.RowID)) as maxRow
from cte1a c1
left join cte1a c1a on c1a.code <> c1.code and c1a.RowID > c1.RowID
where c1.code = 'A'
group by c1.code,c1.RowID
)
,cte3 as -- associate individual rows with their ranges
(select ARange,t.*
from #tempTable t
join cte2 c on t.rowID between ARange and MaxRow
where t.code = 'A'
)
,cte4 as -- identify "raw" duplicates within ranges
(select c1.*,min(c2.rowID) as preMin,max(c2.rowID) as preMax
from cte3 c1
join cte3 c2 on c2.ARange = c1.ARange
and c2.rowKey = c1.rowKey
and c2.locale = c1.locale
and c2.code = c1.code
and c2.rowID < c1.rowID
group by c1.ARange, c1.rowID,c1.rowKey,c1.locale,c1.code
)
-- populate worktable with all dups
insert into @workTable
select Arange, RowId,RowKey,Locale,Code,preMin,PreMax, null as validDup
from cte4 c
-- flag valid dups
set @lastDup = 0
update @worktable
set @lastDup = lastDup = case when(preMax) >= @lastdup then rowID
else @lastDup
end
select rowID,rowKey,locale,code
from @worktable
where rowID = lastDup
union all
select * from @breaks
order by rowID
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 30, 2008 at 2:39 pm
GIMME A BREAK HERE !!
I was focused on showing the compromise solution to the duplicates only, since no one had any problems identifying the breaks. How about this?
Also, you guys missed that rows 7 and 20 were also breaks, so nyah.....
NO BREAKS FOR YOU!
Heh, I didn't really look too much at it, I've been slammed today, this is one of the only threads I've even semi-checked, I just glanced at 5 when Steve pointed it out.
October 30, 2008 at 2:50 pm
Garadin (10/30/2008)
NO BREAKS FOR YOU!
Tough crowd today....
Hey! I got the quote to work!
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
November 3, 2008 at 10:51 am
Dan, any updates for us on the performance and integrity of any of the proposed solutions? I believe there are at least 3 right now (Mine, anzio's and bhovious'... I can't remember if Steve withdrew his or not) waiting on testing.
November 3, 2008 at 11:30 am
I've been lurking on this topic, and I'd be interested in what the final outcome is also.
November 3, 2008 at 11:45 am
I'm still waiting for a complete and unambiguous description of the problem, as well as a good description of exactly what real-world problem the solution to the query problem is supposed to solve. I have yet to see either one. I've seen some things get close on the description, but in every case, details are either missing or are glossed over sufficiently to leave enough doubt and gaps to drive an 18-wheeler through. At this point, I suspect the OP has the solution only because one responder just happened to get it right, and is just no longer interested in this any more. Apparently, that's nothing new around here...
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 3, 2008 at 11:52 am
Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).
November 3, 2008 at 12:07 pm
How true....
November 3, 2008 at 12:33 pm
Unfortunately, your example from several pages back conflicts with a statement by the OP that no range contains a duplicate, and your example starts out with one, unless he meant that a range has to start at row 2, or that such is an exception. Also, you did a great job describing the reason why any given row failed to qualify to be marked, but your description does NOT specify exactly how to determine that a given row is a duplicate - it says only go back to the last duplicate. Unfortunately, that leaves a host of options not covered, such as what if there is no previous duplicate in that set of A's? I might well be making this more complex than it really is, and now having reviewed your description, I think I actually know what the desired result is, but there are all kinds of problems one can run into that aren't explicitly covered. One needs to know how to handle each and every possibility.
Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.
Steve
(aka smunson)
:):):)
Garadin (11/3/2008)
Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 3, 2008 at 12:42 pm
smunson (11/3/2008)
Unfortunately, your example from several pages back conflicts with a statement by the OP that no range contains a duplicate, and your example starts out with one, unless he meant that a range has to start at row 2, or that such is an exception. Also, you did a great job describing the reason why any given row failed to qualify to be marked, but your description does NOT specify exactly how to determine that a given row is a duplicate - it says only go back to the last duplicate. Unfortunately, that leaves a host of options not covered, such as what if there is no previous duplicate in that set of A's? I might well be making this more complex than it really is, and now having reviewed your description, I think I actually know what the desired result is, but there are all kinds of problems one can run into that aren't explicitly covered. One needs to know how to handle each and every possibility.Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.
Steve
(aka smunson)
:):):)
Garadin (11/3/2008)
Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).
While I think I could answer all this and continue to explain it, it's all somewhat of an exercise in futility as I'm not even sure the OP is still with us and I think the bigger issue you want to comprehend is how this could be useful in any practical application... which all my explanations of logic would nothing to clarify. :hehe:
November 3, 2008 at 1:03 pm
Seth,
I'm glad you came to that conclusion. I gave up my attempt long ago because it was becoming abundantly clear that the OP was just not going to "give it up" as to what this was all about, and he also just kept trying to explain without ever really being clear about what the final objective was (pardon my politics, but that reminds me of a certain pair of prominent liberal democrats). Given he needed to change his desired result several times, it just seemed that we were doing all the work and he was doing the absolute minimum necessary to get help. I try not to get too involved in that kind of effort.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 15 posts - 91 through 105 (of 115 total)
You must be logged in to reply to this topic. Login to reply