September 4, 2008 at 9:40 am
How many different locales and how many different RowKeys are there?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 4, 2008 at 10:03 am
jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to run the query at all, since you already have the final answer that's being asked for.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 4, 2008 at 10:33 am
GSquared (9/4/2008)
jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to run the query at all, since you already have the final answer that's being asked for.
I guess the counter seems easy, but it's hard to code...
But do you think this is achievable set-based, maybe with a pseudocursor or something?
September 4, 2008 at 11:11 am
jj011185 (9/4/2008)
Hmm, sounds like it's going to be messy.
Let's try not to go into cursors/loops/recursion... this is indeed challenging, if not impossible.
Messy, yes. Impossible, no. With a temp table, you could use something like a "running total" process to track that down. I have the beginnings of a thought on how to do it, but don't have time to code it right now.
Basis for this would be something like this:
;with MattCTE as (
select ROW_NUMBER() over (partition by Rowkey, locale order by rowID) RN, *
from @mytable)
Select m1.*, isnull(m2.rowID,0) PrevRowID
INTO #MyTempTable
from MattCTE m1 left outer join
MattCTE m2 on m1.rowkey=m2.rowkey and m1.locale=m2.locale and m1.rn=m2.rn+1
Now - slap a clustered index on #MyTempTable, and go to town. Using the "running variables" - increment the sequence if the "previousRowID" is greater than or equal to the row you started the last sequence on.
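As a rough sketch of that "running variables" pass (assuming a seqnum column and a clustered index named uci_MTT have been added to the temp table; variable names here are illustrative, not a finished implementation):

```sql
-- Sketch only: bump the sequence whenever PrevRowID reaches back to (or past)
-- the row the last sequence started on; otherwise carry the sequence forward.
declare @prevSeqID bigint
set @prevSeqID = -1
declare @currentSEQnum int
set @currentSEQnum = 0

update #MyTempTable
set @currentSEQnum = seqnum = case when PrevRowID > @prevSeqID then 1 else 0 end + @currentSEQnum,
    @prevSeqID = case when PrevRowID > @prevSeqID then RowID else 0 end + @prevSeqID
from #MyTempTable with (index(uci_MTT), tablockx)
```

The index hint and table lock are there to force the update to walk the rows in clustered-index order, which is what makes the variable-carrying trick behave like a running total.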
If that doesn't help - then I will try to code it later tonight to give you a running sample.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 4, 2008 at 12:03 pm
I am pretty sure that I know how to do it too. I just need the OP to answer my question first:
rbarryyoung (9/4/2008)
How many different locales and how many different RowKeys are there?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 5, 2008 at 11:47 am
Bump. Anyone have more ideas?
September 5, 2008 at 11:55 am
Waiting for the Original Poster to answer our questions...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 5, 2008 at 2:34 pm
This is what I mean by a recursive solution:
create table #T (
ID int identity,
Col1 int,
Col2 int)
insert into #T (Col1, Col2)
select 1, 1 union all
select 2, 2 union all
select 3, 3 union all
select 4, 4 union all
select 1, 1 union all
select 2, 2 union all
select 5, 5 union all
select 6, 6 union all
select 5, 5 union all
select 7, 7 union all
select 8, 8 union all
select 4, 4 union all
select 7, 7
;with CTE (DupeID) as
(select min(t1.id)
from #t t1
inner join #t t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.id > t2.id
union all
select t3.id
from #t t3
inner join cte
on t3.id > cte.dupeid
inner join #t t4
on t3.col1 = t4.col1
and t3.col2 = t4.col2
and t3.id > t4.id
and t4.id > cte.dupeid)
select distinct #t.*,
case
when dupeid is not null then 1
else 0
end as LastInSet
from #t
left outer join cte
on #t.id = cte.dupeid
This does identify which rows are the Last In Set rows. Of course, it will need to be modified to use the tables and columns of the OP, but it should get the answer.
I tried to set up a "running totals" type solution, but couldn't get it to do what I wanted. Someone else might have better luck with it. In that case, we'd have to test performance of both solutions.
This one does work.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 5, 2008 at 3:59 pm
Hi All,
Sorry for the late reply.
Christian Buettner:
Hi dans,
You might try something like in the attached example.
I have not spent too much time verifying it, so it might contain a few errors.
But the concept might be helpful.
I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1 minute 30 seconds for 75000 RowIDs and 15000 Ranges.
Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.
You might also try to play around with the amount by which the OptRows is increased and decreased on each test.
Best Regards,
Chris Büttner
Your script works great, and it gave me the expected result in all scenarios. One possible issue, though: as the data gets larger, the processing time increases dramatically. So if I have 1 million rows or more, it would probably take hours to finish.
Jeff Moden
I gotta ask... why do you need to do this? What is the purpose of this?
This is used for batch updates, as per a requirement of our system. We need to process unique keys separately in batches to make sure every row in the table gets updated.
rbarryyoung
rbarryyoung (9/4/2008)
How many different locales and how many different RowKeys are there?
For a given set of data, there can be at most 5 duplicates per RowKey & Locale combination. But as to how many combinations there are in total, we'll never know; it could be a million or more.
Thank you for the prompt response and help. I really appreciate it.
-dans
September 5, 2008 at 4:14 pm
All right... Sorry it took a little while - but here goes...
On 2 million rows, the running totals version returns the sequences as I understand them... in 35.3 seconds. Gus - I tried to get your recursive CTE solution to work, but I must have missed something, because it doesn't look right to me. It also suffers badly on any "regular" sized data set (I couldn't even get it to finish unless I cut the rowcount down to 200). Let me know if I didn't transcribe it correctly.
Anyway - here are the 2 solutions, and some "bigger-sized" test data to work with.
--Test Data
drop table MyTable
select top 1000000
identity(bigint,1,1) RowID,
cast(rand(checksum(newID()))*20 as int) Rowkey,
cast(rand(checksum(newID()))*10 as int) Locale
into MyTable
from sys.all_columns sc1,
sys.all_columns sc2
create unique clustered index uciMyTable on MyTable(rowid)
DROP TABLE #MyTempTable
--for the test
declare @g datetime
--Running Total Solution
set @g=getdate();
create index helper on MyTable(RowKey,locale)
;with MattCTE as (
select ROW_NUMBER() over (partition by Rowkey, locale order by rowID) RN, *
from mytable)
Select m1.*, isnull(m2.rowID,0) PrevRowID, 0 as seqnum
INTO #MyTempTable
from MattCTE m1 left outer join
MattCTE m2 on m1.rowkey=m2.rowkey and m1.locale=m2.locale and m1.rn=m2.rn+1
--order by rowkey, locale,rowid
create clustered index uci_MTT on #MyTempTable (rowID)
--now for the running total part
declare @dummy int
declare @prevSeqID bigint
set @prevSeqID=-1
declare @currentSEQnum int
set @currentSEQnum=0;
update #MyTempTable
set @currentSEQnum=seqnum=case when PrevRowID>@prevSeqID then 1 else 0 end +@currentSEQnum,
@prevSeqID=case when PrevRowID>@prevSeqID then RowID else 0 end + @prevSeqID,
@dummy=@prevSeqID
from #MyTempTable WITH (Index(uci_MTT), TABLockX)
select 'Running total',datediff(ms, @g,getdate())
--select top 1000 * from #MyTempTable order by rowID
select seqnum,min(rowID) start,max(rowID) done
from #MyTempTable group by seqnum
--Gus' recursive CTE
set @g=getdate();
;with CTE (DupeID) as
(select min(t1.rowid)
from MyTable t1
inner join MyTable t2
on t1.rowkey = t2.rowKey
and t1.locale = t2.locale
and t1.rowid > t2.rowid
union all
select t3.rowid
from MyTable t3
inner join cte
on t3.rowid > cte.dupeid
inner join MyTable t4
on t3.rowkey = t4.rowkey
and t3.locale = t4.locale
and t3.rowid > t4.rowid
and t4.rowid > cte.dupeid)
select distinct MyTable.*,
case
when dupeid is not null then 1
else 0
end as LastInSet
from MyTable
left outer join cte
on MyTable.rowid = cte.dupeid
select 'Recursive CTE',datediff(ms, @g,getdate())
dsegalles80 - hopefully this works for you.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 5, 2008 at 6:25 pm
dsegalles80 (9/5/2008)
Jeff Moden
I gotta ask... why do you need to do this? What is the purpose of this?
This is used for batch updates, as per a requirement of our system. We need to process unique keys separately in batches to make sure every row in the table gets updated.
Gosh... I can't help but think you're doing that the really, really hard way. If I knew more about the process...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 8, 2008 at 2:38 am
dsegalles80 (9/5/2008)
Hi All, sorry for the late reply.
Christian Buettner:
Hi dans,
You might try something like in the attached example.
I have not spent too much time verifying it, so it might contain a few errors.
But the concept might be helpful.
I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1 minute 30 seconds for 75000 RowIDs and 15000 Ranges.
Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.
You might also try to play around with the amount by which the OptRows is increased and decreased on each test.
Best Regards,
Chris Büttner
Your script works great, and it gave me the expected result in all scenarios. One possible issue, though: as the data gets larger, the processing time increases dramatically. So if I have 1 million rows or more, it would probably take hours to finish.
Hi dsegalles80,
thanks for your feedback.
I reviewed my code and indeed found some areas for performance improvement.
1. The deletion of rows within the loop has been removed and replaced with a new variable that records the highest deleted rowid (@DeletedRowID).
2. If no duplicate was found within the range of @OptRows rows, @OptRows is now doubled instead of incremented by one. (This reduces the number of loops significantly for cases where no duplicate was found.)
3. The exit criterion is now that @OptRows must be less than the original number of rows in the table. (This replaces the check of the number of remaining rows in the table against @OptRows.)
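(The script itself is attached to the original post and not reproduced in this thread; purely as an illustration of point 2, the probe-window adjustment might look something like the following. The variable names @DuplicateFound and @StartOptRows are assumed for the sketch, not taken from the actual script.)

```sql
-- Illustration of point 2 only: grow the probe window geometrically when no
-- duplicate is found, instead of growing it one row at a time.
if @DuplicateFound = 0
    set @OptRows = @OptRows * 2       -- double the window on a miss
else
    set @OptRows = @StartOptRows      -- assumed: shrink back after a hit
```

Doubling on a miss turns a linear number of probing loops into a logarithmic one for long duplicate-free stretches, which is consistent with the reported drop in runtime.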
I have tested it again with 2,250,000 rows and it finished after 1 minute and 52 seconds on my dev machine.
Best Regards,
Chris Büttner
September 8, 2008 at 5:26 am
Matt Miller
Thank you very much for the script and the idea. This works for some of the scenarios, though some are giving me incorrect output.
Christian Buettner
Your script is terrific! This really works for me. I tested it, and it's much faster than the other one, and the output is correct in all scenarios. Thank you very much for your help, Christian.
To everyone who spent their time on this thread: I really appreciate all your effort, help, and magnificent ideas, so thank you all very much for your time.
best regards,
-- dans
September 8, 2008 at 7:52 am
It sounds like you have a working solution, but if you want to pursue tracking down the scenarios it doesn't work in, let me know what an example might be. I already have one idea that might be causing an issue.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 8, 2008 at 4:28 pm
Thanks for the feedback, dans!
Best Regards,
Chris Büttner