Process Range IDs

  • How many different locales and how many different RowKeys are there?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to do the query at all, since you already have the final answer that's being asked for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (9/4/2008)


jj, that still doesn't get it. You have to know the range beforehand for that solution to work. If you know the range already, then you don't need to do the query at all, since you already have the final answer that's being asked for.

I guess the counter seems easy but is hard to code...

But do you think this is achievable set-based, maybe with a pseudocursor or something?

  • jj011185 (9/4/2008)


Hmm, sounds like it's going to be messy.

Let's try not to go into cursors/loops/recursion... this is indeed challenging, if not impossible.

Messy, yes. Impossible, no. With a temp table, you could use something like a "running total" process to track that down. I have the beginnings of an idea of how to do it, but don't have time to code it right now.

    Basis for this would be something like this:

;with MattCTE as (
    select ROW_NUMBER() over (partition by RowKey, Locale order by RowID) RN, *
    from @mytable)
select m1.*, isnull(m2.RowID, 0) PrevRowID
into #MyTempTable
from MattCTE m1
left outer join MattCTE m2
    on m1.RowKey = m2.RowKey
    and m1.Locale = m2.Locale
    and m1.RN = m2.RN + 1

Now slap a clustered index on #MyTempTable, and go to town. Using the "running variables," increment the sequence if the PrevRowID is greater than or equal to the row you started the last sequence on.

If that doesn't help, then I will try to code it later tonight to give you a running sample.
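For reference, a minimal sketch of the "running variables" (quirky update) pattern being described here, on a throwaway table whose names are illustrative only, not the OP's schema:

-- Minimal sketch of the "quirky update" running-variables pattern.
-- #Demo, Val, and RunningTotal are illustrative names only.
create table #Demo (ID int primary key clustered, Val int, RunningTotal int)

insert into #Demo (ID, Val, RunningTotal)
select 1, 10, 0 union all
select 2, 20, 0 union all
select 3, 30, 0

declare @rt int
set @rt = 0

-- the clustered index plus the index/TABLOCKX hints keep the update
-- walking the rows in ID order, so the variable carries forward
update #Demo
set @rt = RunningTotal = @rt + Val
from #Demo with (index(1), tablockx)

select * from #Demo order by ID  -- RunningTotal comes back 10, 30, 60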

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I am pretty sure that I know how to do it too. I just need the OP to answer my question first:

    rbarryyoung (9/4/2008)


    How many different locales and how many different RowKeys are there?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Bump. Anyone have more ideas?

  • Waiting for the Original Poster to answer our questions...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • This is what I mean by a recursive solution:

create table #T (
    ID int identity,
    Col1 int,
    Col2 int)

insert into #T (Col1, Col2)
select 1, 1 union all
select 2, 2 union all
select 3, 3 union all
select 4, 4 union all
select 1, 1 union all
select 2, 2 union all
select 5, 5 union all
select 6, 6 union all
select 5, 5 union all
select 7, 7 union all
select 8, 8 union all
select 4, 4 union all
select 7, 7

;with CTE (DupeID) as
    (select min(t1.id)
    from #t t1
    inner join #t t2
        on t1.col1 = t2.col1
        and t1.col2 = t2.col2
        and t1.id > t2.id
    union all
    select t3.id
    from #t t3
    inner join cte
        on t3.id > cte.dupeid
    inner join #t t4
        on t3.col1 = t4.col1
        and t3.col2 = t4.col2
        and t3.id > t4.id
        and t4.id > cte.dupeid)
select distinct #t.*,
    case
        when dupeid is not null then 1
        else 0
    end as LastInSet
from #t
left outer join cte
    on #t.id = cte.dupeid

This does identify which rows are the LastInSet rows. Of course, it will need to be modified to use the OP's tables and columns, but it should get the answer.

I tried to set up a "running totals" type solution, but couldn't get it to do what I wanted. Someone else might have better luck with it. In that case, we'd have to test the performance of both solutions.

    This one does work.
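One practical caveat worth noting: recursive CTEs are capped at 100 recursion levels by default, so on data with long duplicate chains the final SELECT of the script above would presumably need the cap lifted, roughly like so:

-- Assumed tweak: append OPTION (MAXRECURSION 0) to the statement that
-- consumes the recursive CTE, lifting the default 100-level limit.
select distinct #t.*,
    case when dupeid is not null then 1 else 0 end as LastInSet
from #t
left outer join cte
    on #t.id = cte.dupeid
option (maxrecursion 0)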

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi All,

    Sorry for the late reply.

    Christian Buettner:

    Hi dans,

    You might try something like in the attached example.

    I have not spent too much time verifying it, so it might contain a few errors.

    But the concept might be helpful.

    I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.

    You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

    Best Regards,

    Chris Büttner

Your script works great and it gave me the expected result in all scenarios. One possible issue, though, is that as the data gets larger, the processing time increases by a huge amount. So if I have 1 million rows or more, it would probably take hours to finish.

    Jeff Moden

    I gotta ask... why do you need to do this? What is the purpose of this?

This is used for updating by batch, as per a requirement on our system. We need to process unique keys in separate batches to make sure every row in the table gets updated.

    rbarryyoung

rbarryyoung (9/4/2008): How many different locales and how many different RowKeys are there?

In a given data set, it is possible to have at most 5 duplicates per RowKey & Locale combination. So to answer your question on how many: we'll never know; it could be a million or more.

    Thank you for the prompt response and help. I really appreciate it.

    -dans

  • All right... Sorry it took a little while - but here goes...

On 2 million rows, the running totals version returns the sequences as I understand them... in 35.3 seconds. Gus - I tried to get your recursive CTE solution to work, but I must have missed something, because it doesn't look right to me. It also suffers badly on any "regular" sized data sets - I couldn't even get it to finish unless I cut the rowcount to 200. (Let me know if I didn't transcribe it correctly.)

    Anyway - here are the 2 solutions, and some "bigger-sized" test data to work with.

--Test Data
drop table MyTable

select top 1000000
    identity(bigint,1,1) RowID,
    cast(rand(checksum(newID()))*20 as int) RowKey,
    cast(rand(checksum(newID()))*10 as int) Locale
into MyTable
from sys.all_columns sc1,
    sys.all_columns sc2

create unique clustered index uciMyTable on MyTable(RowID)

drop table #MyTempTable

--for the test
declare @g datetime

--Running Total Solution
set @g=getdate();

create index helper on MyTable(RowKey, Locale)

;with MattCTE as (
    select ROW_NUMBER() over (partition by RowKey, Locale order by RowID) RN, *
    from MyTable)
select m1.*, isnull(m2.RowID, 0) PrevRowID, 0 as seqnum
into #MyTempTable
from MattCTE m1
left outer join MattCTE m2
    on m1.RowKey = m2.RowKey
    and m1.Locale = m2.Locale
    and m1.RN = m2.RN + 1
--order by RowKey, Locale, RowID

create clustered index uci_MTT on #MyTempTable (RowID)

--now for the running total part
declare @dummy int
declare @prevSeqID bigint
set @prevSeqID = -1
declare @currentSEQnum int
set @currentSEQnum = 0;

update #MyTempTable
set @currentSEQnum = seqnum = case when PrevRowID > @prevSeqID then 1 else 0 end + @currentSEQnum,
    @prevSeqID = case when PrevRowID > @prevSeqID then RowID else 0 end + @prevSeqID,
    @dummy = @prevSeqID
from #MyTempTable with (index(uci_MTT), tablockx)

select 'Running total', datediff(ms, @g, getdate())

--select top 1000 * from #MyTempTable order by RowID

select seqnum, min(RowID) start, max(RowID) done
from #MyTempTable
group by seqnum

--Gus' recursive CTE
set @g=getdate();

;with CTE (DupeID) as
    (select min(t1.RowID)
    from MyTable t1
    inner join MyTable t2
        on t1.RowKey = t2.RowKey
        and t1.Locale = t2.Locale
        and t1.RowID > t2.RowID
    union all
    select t3.RowID
    from MyTable t3
    inner join cte
        on t3.RowID > cte.DupeID
    inner join MyTable t4
        on t3.RowKey = t4.RowKey
        and t3.Locale = t4.Locale
        and t3.RowID > t4.RowID
        and t4.RowID > cte.DupeID)
select distinct MyTable.*,
    case
        when DupeID is not null then 1
        else 0
    end as LastInSet
from MyTable
left outer join cte
    on MyTable.RowID = cte.DupeID

select 'Recursive CTE', datediff(ms, @g, getdate())

    dsegalles80 - hopefully this works for you.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
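Tying this back to the stated purpose (batched updates over unique keys): once #MyTempTable carries the seqnum values above, the batches could be driven off it along these lines. This is only a rough sketch; the UPDATE body is a placeholder, since the real per-batch work was never posted:

-- Rough sketch: consume the seqnum ranges for batch processing.
-- The SET clause is a placeholder for the OP's actual per-batch work.
declare @seq int
declare @maxSeq int
select @seq = min(seqnum), @maxSeq = max(seqnum) from #MyTempTable

while @seq <= @maxSeq
begin
    -- within one seqnum, each (RowKey, Locale) should appear at most once,
    -- which is what makes the batch safe to process as a unit
    update t
    set t.Locale = t.Locale   -- placeholder work
    from MyTable t
    inner join #MyTempTable s
        on s.RowID = t.RowID
    where s.seqnum = @seq

    set @seq = @seq + 1
end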

  • dsegalles80 (9/5/2008)


    Jeff Moden

    I gotta ask... why do you need to do this? What is the purpose of this?

This is used for updating by batch, as per a requirement on our system. We need to process unique keys in separate batches to make sure every row in the table gets updated.

    Gosh... I can't help but think you're doing that the really, really hard way. If I knew more about the process...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dsegalles80 (9/5/2008)


    Hi All,

    Sorry for the late reply.

    Christian Buettner:

    Hi dans,

    You might try something like in the attached example.

    I have not spent too much time verifying it, so it might contain a few errors.

    But the concept might be helpful.

    I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

Since the ranges were very small in the example, you might be able to achieve better results with your real table if there are also bigger ranges.

    You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

    Best Regards,

    Chris Büttner

Your script works great and it gave me the expected result in all scenarios. One possible issue, though, is that as the data gets larger, the processing time increases by a huge amount. So if I have 1 million rows or more, it would probably take hours to finish.

    Hi dsegalles80,

    thanks for your feedback.

    I reviewed my code and indeed found some areas for performance improvement.

1. The deletion of rows within the loop has been removed and replaced with a new variable that records the highest deleted RowID (@DeletedRowID).

2. If no duplicate was found within the range of @OptRows rows, @OptRows is now doubled instead of being incremented by one. (This reduces the number of loop iterations significantly for cases where no duplicate was found.)

3. The exit criterion is now that @OptRows must be less than the original number of rows in the table. (This replaces the check of the number of remaining rows in the table against @OptRows.)

    I have tested it again with 2,250,000 rows and it finished after 1 minute and 52 seconds on my dev machine.

    Best Regards,

    Chris Büttner
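Christian's actual script travels as a thread attachment and isn't reproduced here. Purely to illustrate the control-flow changes he lists, the loop skeleton presumably looks something like the following; @OptRows and @DeletedRowID are his names, while @DupFound, @TotalRows, and the step sizes are assumptions for this sketch:

-- Illustration only: the growth/exit logic Christian describes, not his
-- attached script. @OptRows and @DeletedRowID are his names; @DupFound,
-- @TotalRows, and the step sizes are assumptions for this sketch.
declare @OptRows int, @TotalRows int, @DeletedRowID bigint, @DupFound bit
set @OptRows = 1
set @DeletedRowID = 0
set @DupFound = 0
select @TotalRows = count(*) from MyTable

while @OptRows < @TotalRows              -- point 3: the new exit criterion
begin
    -- ... probe the next @OptRows rows past @DeletedRowID for a duplicate,
    --     setting @DupFound and advancing @DeletedRowID (point 1) ...

    if @DupFound = 1
        set @OptRows = @OptRows - 1      -- back off on a hit (step size assumed)
    else
        set @OptRows = @OptRows * 2      -- point 2: double instead of adding one
end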

  • Matt Miller

Thank you very much for the script and idea. It works for some of the scenarios, though others are giving me incorrect output.

    Christian Buettner

Your script is terrific! This really works for me. I tested it, and it's faster than the other one, and the output is correct in all scenarios. Thank you very much for your help, Christian.

To everyone who spent time on this thread: I really appreciate all your effort, help, and magnificent ideas, so thank you all very much for your time.

    best regards,

    -- dans

It sounds like you have a working solution, but if you want to pursue tracking down the scenarios mine doesn't work in, let me know what an example might be. I already have an idea about what might be causing the issue.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the feedback, dans!

    Best Regards,

    Chris Büttner
