Process Range IDs

  • How many different locales and how many different RowKeys are there?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to do the query at all, since you already have the final answer that's being asked for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (9/4/2008)


jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to do the query at all, since you already have the final answer that's being asked for.

I guess the counter seems easy but is hard to code...

But do you think this is achievable set-based, maybe with a pseudocursor or something?

  • jj011185 (9/4/2008)


Hmm, sounds like it's going to be messy.

Let's try not to go into cursors/loops/recursion... this is indeed challenging, if not impossible.

Messy, yes. Impossible, no. With a temp table, you could use something like a "running total" process to track that down. I have the beginnings of an idea of how to do it, but don't have time to code it right now.

    Basis for this would be something like this:

;with MattCTE as (
    select ROW_NUMBER() over (partition by RowKey, Locale order by RowID) RN, *
    from @mytable)
select m1.*, isnull(m2.RowID, 0) PrevRowID
into #MyTempTable
from MattCTE m1
left outer join MattCTE m2
    on m1.RowKey = m2.RowKey
    and m1.Locale = m2.Locale
    and m1.RN = m2.RN + 1

Now slap a clustered index on #MyTempTable, and go to town. Using the "running variables," increment the sequence if the PrevRowID is greater than or equal to the row you started the last sequence on.

If that doesn't help, then I will try to code it later tonight to give you a running sample.
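For reference, a minimal sketch of the "running variables" (quirky update) pattern being described here, on a throwaway table whose names are illustrative only, not the OP's schema:

-- Minimal sketch of the "quirky update" running-variables pattern.
-- #Demo, Val, and RunningTotal are illustrative names only.
create table #Demo (ID int primary key clustered, Val int, RunningTotal int)

insert into #Demo (ID, Val, RunningTotal)
select 1, 10, 0 union all
select 2, 20, 0 union all
select 3, 30, 0

declare @rt int
set @rt = 0

-- the clustered index plus the index/TABLOCKX hints keep the update
-- walking the rows in ID order, so the variable carries forward
update #Demo
set @rt = RunningTotal = @rt + Val
from #Demo with (index(1), tablockx)

select * from #Demo order by ID  -- RunningTotal comes back 10, 30, 60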

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I am pretty sure that I know how to do it too. I just need the OP to answer my question first:

    rbarryyoung (9/4/2008)


    How many different locales and how many different RowKeys are there?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Bump. Anyone have more ideas?

  • Waiting for the Original Poster to answer our questions...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • This is what I mean by a recursive solution:

create table #T (
    ID int identity,
    Col1 int,
    Col2 int)

insert into #T (Col1, Col2)
select 1, 1 union all
select 2, 2 union all
select 3, 3 union all
select 4, 4 union all
select 1, 1 union all
select 2, 2 union all
select 5, 5 union all
select 6, 6 union all
select 5, 5 union all
select 7, 7 union all
select 8, 8 union all
select 4, 4 union all
select 7, 7

;with CTE (DupeID) as
    (select min(t1.id)
    from #t t1
    inner join #t t2
        on t1.col1 = t2.col1
        and t1.col2 = t2.col2
        and t1.id > t2.id
    union all
    select t3.id
    from #t t3
    inner join cte
        on t3.id > cte.dupeid
    inner join #t t4
        on t3.col1 = t4.col1
        and t3.col2 = t4.col2
        and t3.id > t4.id
        and t4.id > cte.dupeid)
select distinct #t.*,
    case
        when dupeid is not null then 1
        else 0
    end as LastInSet
from #t
left outer join cte
    on #t.id = cte.dupeid

This does identify which rows are the LastInSet rows. Of course, it will need to be modified to use the OP's tables and columns, but it should get the answer.

I tried to set up a "running totals" type solution, but couldn't get it to do what I wanted. Someone else might have better luck with it. In that case, we'd have to test the performance of both solutions.

    This one does work.
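One practical caveat worth noting: recursive CTEs are capped at 100 recursion levels by default, so on data with long duplicate chains the final SELECT of the script above would presumably need the cap lifted, roughly like so:

-- Assumed tweak: append OPTION (MAXRECURSION 0) to the statement that
-- consumes the recursive CTE, lifting the default 100-level limit.
select distinct #t.*,
    case when dupeid is not null then 1 else 0 end as LastInSet
from #t
left outer join cte
    on #t.id = cte.dupeid
option (maxrecursion 0)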

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi All,

    Sorry for the late reply.

    Christian Buettner:

    Hi dans,

    You might try something like in the attached example.

    I have not spent too much time verifying it, so it might contain a few errors.

    But the concept might be helpful.

    I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.

    You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

    Best Regards,

    Chris Büttner

Your script works great and it gave me the expected result in all scenarios. One possible issue, though, is that as the data gets larger, the processing time increases by a huge amount. So if I have 1 million rows or more, it would probably take hours to finish.

    Jeff Moden

    I gotta ask... why do you need to do this? What is the purpose of this?

This is used for updating by batch, as per a requirement on our system. We need to process unique keys in separate batches to make sure every row in the table gets updated.

    rbarryyoung

rbarryyoung (9/4/2008): How many different locales and how many different RowKeys are there?

In a given data set, it is possible to have at most 5 duplicates per RowKey & Locale combination. So to answer your question on how many: we'll never know; it could be a million or more.

    Thank you for the prompt response and help. I really appreciate it.

    -dans

  • All right... Sorry it took a little while - but here goes...

On 2 million rows, the running totals version returns the sequences as I understand them... in 35.3 seconds. Gus - I tried to get your recursive CTE solution to work, but I must have missed something, because it doesn't look right to me. It also suffers badly on any "regular" sized data sets - I couldn't even get it to finish unless I cut the rowcount to 200. (Let me know if I didn't transcribe it correctly.)

    Anyway - here are the 2 solutions, and some "bigger-sized" test data to work with.

--Test Data
drop table MyTable

select top 1000000
    identity(bigint,1,1) RowID,
    cast(rand(checksum(newID()))*20 as int) RowKey,
    cast(rand(checksum(newID()))*10 as int) Locale
into MyTable
from sys.all_columns sc1,
    sys.all_columns sc2

create unique clustered index uciMyTable on MyTable(RowID)

drop table #MyTempTable

--for the test
declare @g datetime

--Running Total Solution
set @g=getdate();

create index helper on MyTable(RowKey, Locale)

;with MattCTE as (
    select ROW_NUMBER() over (partition by RowKey, Locale order by RowID) RN, *
    from MyTable)
select m1.*, isnull(m2.RowID, 0) PrevRowID, 0 as seqnum
into #MyTempTable
from MattCTE m1
left outer join MattCTE m2
    on m1.RowKey = m2.RowKey
    and m1.Locale = m2.Locale
    and m1.RN = m2.RN + 1
--order by RowKey, Locale, RowID

create clustered index uci_MTT on #MyTempTable (RowID)

--now for the running total part
declare @dummy int
declare @prevSeqID bigint
set @prevSeqID = -1
declare @currentSEQnum int
set @currentSEQnum = 0;

update #MyTempTable
set @currentSEQnum = seqnum = case when PrevRowID > @prevSeqID then 1 else 0 end + @currentSEQnum,
    @prevSeqID = case when PrevRowID > @prevSeqID then RowID else 0 end + @prevSeqID,
    @dummy = @prevSeqID
from #MyTempTable with (index(uci_MTT), tablockx)

select 'Running total', datediff(ms, @g, getdate())

--select top 1000 * from #MyTempTable order by RowID

select seqnum, min(RowID) start, max(RowID) done
from #MyTempTable
group by seqnum

--Gus' recursive CTE
set @g=getdate();

;with CTE (DupeID) as
    (select min(t1.RowID)
    from MyTable t1
    inner join MyTable t2
        on t1.RowKey = t2.RowKey
        and t1.Locale = t2.Locale
        and t1.RowID > t2.RowID
    union all
    select t3.RowID
    from MyTable t3
    inner join cte
        on t3.RowID > cte.DupeID
    inner join MyTable t4
        on t3.RowKey = t4.RowKey
        and t3.Locale = t4.Locale
        and t3.RowID > t4.RowID
        and t4.RowID > cte.DupeID)
select distinct MyTable.*,
    case
        when DupeID is not null then 1
        else 0
    end as LastInSet
from MyTable
left outer join cte
    on MyTable.RowID = cte.DupeID

select 'Recursive CTE', datediff(ms, @g, getdate())

    dsegalles80 - hopefully this works for you.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
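Tying this back to the stated purpose (batched updates over unique keys): once #MyTempTable carries the seqnum values above, the batches could be driven off it along these lines. This is only a rough sketch; the UPDATE body is a placeholder, since the real per-batch work was never posted:

-- Rough sketch: consume the seqnum ranges for batch processing.
-- The SET clause is a placeholder for the OP's actual per-batch work.
declare @seq int
declare @maxSeq int
select @seq = min(seqnum), @maxSeq = max(seqnum) from #MyTempTable

while @seq <= @maxSeq
begin
    -- within one seqnum, each (RowKey, Locale) should appear at most once,
    -- which is what makes the batch safe to process as a unit
    update t
    set t.Locale = t.Locale   -- placeholder work
    from MyTable t
    inner join #MyTempTable s
        on s.RowID = t.RowID
    where s.seqnum = @seq

    set @seq = @seq + 1
end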

  • dsegalles80 (9/5/2008)


    Jeff Moden

    I gotta ask... why do you need to do this? What is the purpose of this?

This is used for updating by batch, as per a requirement on our system. We need to process unique keys in separate batches to make sure every row in the table gets updated.

    Gosh... I can't help but think you're doing that the really, really hard way. If I knew more about the process...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dsegalles80 (9/5/2008)


    Hi All,

    Sorry for the late reply.

    Christian Buettner:

    Hi dans,

    You might try something like in the attached example.

    I have not spent too much time verifying it, so it might contain a few errors.

    But the concept might be helpful.

    I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.

    You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

    Best Regards,

    Chris Büttner

Your script works great and it gave me the expected result in all scenarios. One possible issue, though, is that as the data gets larger, the processing time increases by a huge amount. So if I have 1 million rows or more, it would probably take hours to finish.

    Hi dsegalles80,

    thanks for your feedback.

    I reviewed my code and indeed found some areas for performance improvement.

1. The deletion of rows within the loop has been removed and replaced with a new variable that records the highest deleted RowID (@DeletedRowID).

2. If no duplicate was found within the range of @OptRows rows, @OptRows is now doubled instead of being incremented by one. (This reduces the number of loop iterations significantly for cases where no duplicate was found.)

3. The exit criterion is now that @OptRows must be less than the original number of rows in the table. (This replaces the check of the number of remaining rows in the table against @OptRows.)

    I have tested it again with 2,250,000 rows and it finished after 1 minute and 52 seconds on my dev machine.

    Best Regards,

    Chris Büttner
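Christian's actual script travels as a thread attachment and isn't reproduced here. Purely to illustrate the control-flow changes he lists, the loop skeleton presumably looks something like the following; @OptRows and @DeletedRowID are his names, while @DupFound, @TotalRows, and the step sizes are assumptions for this sketch:

-- Illustration only: the growth/exit logic Christian describes, not his
-- attached script. @OptRows and @DeletedRowID are his names; @DupFound,
-- @TotalRows, and the step sizes are assumptions for this sketch.
declare @OptRows int, @TotalRows int, @DeletedRowID bigint, @DupFound bit
set @OptRows = 1
set @DeletedRowID = 0
set @DupFound = 0
select @TotalRows = count(*) from MyTable

while @OptRows < @TotalRows              -- point 3: the new exit criterion
begin
    -- ... probe the next @OptRows rows past @DeletedRowID for a duplicate,
    --     setting @DupFound and advancing @DeletedRowID (point 1) ...

    if @DupFound = 1
        set @OptRows = @OptRows - 1      -- back off on a hit (step size assumed)
    else
        set @OptRows = @OptRows * 2      -- point 2: double instead of adding one
end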

  • Matt Miller

Thank you very much for the script and idea. It works for some of the scenarios, though others are giving me incorrect output.

    Christian Buettner

Your script is terrific! This really works for me. I tested it, and it's faster than the other one, and the output is correct in all scenarios. Thank you very much for your help, Christian.

To everyone who spent time on this thread: I really appreciate all your effort, help, and magnificent ideas, so thank you all very much for your time.

    best regards,

    -- dans

It sounds like you have a working solution, but if you want to pursue tracking down the scenarios mine doesn't work in, let me know what an example might be. I already have an idea about what might be causing the issue.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the feedback, dans!

    Best Regards,

    Chris Büttner
