count of consecutive value

  • I have a somewhat different business application summarized below.

    A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24 hour day.

    The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:

    Example:

    480 279

    481 279

    482 279

    483 279

    484 282

    485 282

    486 279

    487 279

    The output is

    Value From To

    279 480 483

    282 484 485

    279 486 487

    I used the Row_Number() approach to find all the contiguous data islands.

  • Edward Boyle (2/4/2009)


    I have a somewhat different business application summarized below.

    A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24 hour day.

    The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:

    Example:

    480 279

    481 279

    482 279

    483 279

    484 282

    485 282

    486 279

    487 279

    The output is

    Value From To

    279 480 483

    282 484 485

    279 486 487

    I used the Row_Number() approach to find all the contiguous data islands.

    I don't see the need for row_number(). In that case straight sql solves it

    select value_point , min(interval) as from_min ,max(interval) as to_min

    from table

    group by value_point

    order by from_min


    * Noel

  • Jeff Moden (2/3/2009)


    I agree with Barry... enjoy the cleverness of the code and the way the author of that code overcame the shortcomings of the language. 😉

    I agree. I appreciate clever and said so:

    microsoft.public.access.queries

    Sep 18 2003

    grouping records (See Groucho)

    http://tinyurl.com/5tt644

    Fast forward 5 years.

    microsoft.public.sqlserver.programming

    Oct 23 2008

    'stroed procedure to break numbers into ranges'

    http://tinyurl.com/d4fxbp

    And in 2013 I'm sure some others will discover the same clever trick 🙂

    BWAA-HAAA! So what? Heh... Consider this... If the people who wrote T-SQL and other languages got it all right the first time, people wouldn't have the need for what you're trying to sell. 😛

    I appreciate your insight...when are you going to dive into Dataphor? 🙂

    www.beyondsql.blogspot.com

  • steve dassin (2/4/2009)


    I appreciate your insight...when are you going to dive into Dataphor? 🙂

    In all likely hood, never. Not because I don't think it's worthwhile or anything like that. It's for one of the same reasons I don't use CLR's... it's not available on every instance of SQL Server I'll run into and telling people I can't do something without a chunk of 3rd party software just isn't what people want to hear when they hire a consultant. Maybe, if I were a permanent employee, things might be a bit different.

    You've got some good stuff in there... why don't you try to convice M$ to buy your good extensions?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/31/2009)


    Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.

    Before I studied the winners code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.

    Here's the results...

    7936 ms - Quirky update

    7390 ms - Winner's code

    I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.

    Lordy, I love it when I learn something new! 🙂

    Wow! That approach really is brilliant.

    I am in awe that someone thought of that!

  • Noel

    Perhaps I mis-stated the problem. I am looking for separate contiguous intervals with the same value, not the start and end for any value [as your code provides].

    In any case the sampel code below.

    Create Table #T (Interval Int,Val Int)

    Insert #T Values(480, 279)

    Insert #T Values(481, 279)

    Insert #T Values(482, 279)

    Insert #T Values(483, 279)

    Insert #T Values(484, 282)

    Insert #T Values(486, 279)

    Insert #T Values(487, 279)

    Select * from #t

    ------------- Your Code------------

    select Val , min(interval) as from_min ,max(interval) as to_min

    from #T

    group by val

    order by from_min

    /* Result

    279481487

    282484484

    The 279 Value has two separate intervals [which I need to identify]

    */

    ------------- My Code ---------------------

    With T as

    (Select Interval,Val,RN=Row_Number() over(partition by Val order by Interval) From #T

    ), Island as

    (Select Grp=Row_Number() Over(Order by Interval)-RN,Interval,Val From T)

    Select

    Start=Min(Interval),

    [End]=Max(Interval),

    Val

    From Island

    Group by Val,Grp

    order by Start

    /*Result

    481483279

    484484282

    486487279

    *&/

  • RBarryYoung (1/31/2009)


    Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛

    Actually, I just found a fly in the ointment... the method isn't perfect and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that does the imperfection soon... I wanna play with it some more.

    Looks like the article might change, alright... as a possible recommendation not to do this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/14/2009)


    RBarryYoung (1/31/2009)


    Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛

    Actually, I just found a fly in the ointment... the method isn't perfect and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that does the imperfection soon... I wanna play with it some more.

    Looks like the article might change, alright... as a possible recommendation not to do this.

    Ok... here's the code that exposes the "fly"... maybe I'm doing something wrong, but if I did, I sure would like to know what it is.

    --===== Create a test table

    CREATE TABLE #Sample

    (

    SampleDate DATETIME,

    Item CHAR(1)

    )

    --===== Populate the test table with sample data

    INSERT INTO #Sample

    (SampleDate,Item)

    SELECT '2009-02-14 21:27:33.433','A' UNION ALL

    SELECT '2009-02-14 21:27:35.970','A' UNION ALL

    SELECT '2009-02-14 21:27:45.247','B' UNION ALL

    SELECT '2009-02-14 21:27:50.077','A' UNION ALL

    SELECT '2009-02-14 21:28:12.177','A' UNION ALL

    SELECT '2009-02-14 21:28:13.937','B' UNION ALL

    SELECT '2009-02-14 21:28:26.360','B' UNION ALL

    SELECT '2009-02-14 21:28:26.753','B' UNION ALL

    SELECT '2009-02-14 21:28:27.673','B' UNION ALL

    SELECT '2009-02-14 21:28:39.193','B' UNION ALL

    SELECT '2009-02-14 21:28:42.510','A' UNION ALL

    SELECT '2009-02-14 21:28:51.493','B' UNION ALL

    SELECT '2009-02-14 21:28:55.037','B' UNION ALL

    SELECT '2009-02-14 21:29:04.493','A' UNION ALL

    SELECT '2009-02-14 21:29:16.020','B' UNION ALL

    SELECT '2009-02-14 21:29:20.267','B' UNION ALL

    SELECT '2009-02-14 21:29:36.410','B' UNION ALL

    SELECT '2009-02-14 21:29:53.947','B' UNION ALL

    SELECT '2009-02-14 21:30:00.377','A' UNION ALL

    SELECT '2009-02-14 21:30:04.150','A' UNION ALL

    SELECT '2009-02-14 21:30:05.013','A' UNION ALL

    SELECT '2009-02-14 21:30:10.857','B' UNION ALL

    SELECT '2009-02-14 21:30:16.770','B' UNION ALL

    SELECT '2009-02-14 21:30:17.687','A' UNION ALL

    SELECT '2009-02-14 21:30:18.110','B' UNION ALL

    SELECT '2009-02-14 21:30:18.347','B' UNION ALL

    SELECT '2009-02-14 21:30:22.533','B' UNION ALL

    SELECT '2009-02-14 21:30:38.897','A' UNION ALL

    SELECT '2009-02-14 21:30:44.870','A' UNION ALL

    SELECT '2009-02-14 21:31:13.933','B'

    --===== Try to calculate the Groups in date order. It fails.

    -- Look at rows that contain 11 and 15-18 in the RowNumSampleDate

    -- column to observe the failure. Obviously, that will break

    -- a sequence by group.

    ;WITH

    cteRowNumItem AS

    (

    SELECT SampleDate,

    Item,

    RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)

    FROM #Sample

    )

    SELECT SampleDate,

    Item,

    GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,

    ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,

    RowNum AS RowNumItem

    FROM cteRowNumItem

    ORDER BY SampleDate

    Man... it's too bad, too... I thought this was a really great method. I don't trust this method because of this breakage proof...

    I dunno... maybe I just didn't go far enough... maybe when I do the sequencing, it'll work if I include both the Item and the GroupNum in the partition.... lemme see.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah... that fixes it... or at least for this particular problem...

    ;WITH

    cteRowNumItem AS

    (

    SELECT SampleDate,

    Item,

    RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)

    FROM #Sample

    )

    ,

    cteGroup AS

    (

    SELECT SampleDate,

    Item,

    GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,

    ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,

    RowNum AS RowNumItem

    FROM cteRowNumItem

    )

    SELECT *,

    ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence

    FROM cteGroup

    ORDER BY SampleDate

    I still don't trust it, yet. I've got some more testing to do with a larger data set.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... now I trust it... just gotta remember to NOT let the fact that GroupNums aren't unique across items but the combination of Item and GroupNum works as expected when partitioning to build a sequence by group...

    If anyone else wants to play with a large data set, here's the code I'm using to do my testing with...

    USE TempDB

    GO

    SELECT TOP 1000000

    RAND(CAST(NEWID() AS VARBINARY))+GETDATE() AS SampleDate,

    CHAR(ABS(CHECKSUM(NEWID()))%2+65) ITEM

    INTO Sample

    FROM Master.sys.SysColumns t1 CROSS JOIN Master.sys.SysColumns t2

    CREATE CLUSTERED INDEX IX_Sample_Item_SampleDate ON dbo.Sample

    (

    [ITEM] ASC,

    [SampleDate] ASC

    )

    GO

    --===== This works if you include both the Item and GroupNum in the partition...

    ;WITH

    cteRowNumItem AS

    (

    SELECT SampleDate,

    Item,

    RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)

    FROM Sample

    )

    ,

    cteGroup AS

    (

    SELECT SampleDate,

    Item,

    GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,

    ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,

    RowNum AS RowNumItem

    FROM cteRowNumItem

    )

    SELECT SampleDate,Item,

    ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence,

    GroupNum,RowNumSampleDate,RowNumItem

    FROM cteGroup

    ORDER BY SampleDate

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/14/2009)


    Ok... now I trust it... just gotta remember to NOT let the fact that GroupNums aren't unique across items but the combination of Item and GroupNum works as expected when partitioning to build a sequence by group...

    I'm still on hiatus, but Yeah, I think that's the key...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff,

    You got me worried with your doubts about this ROW_NUMBER() difference technique.

    I used this technique in a proposed solution to this topic:

    http://www.sqlservercentral.com/Forums/Topic656492-338-1.aspx

    However, I think the proposed solution is valid, because the columns used in the PARTITION BY clause of the ROW_NUMBER() functions are also included in the final GROUP BY clause.

    Do you agree?

    Andrew

  • Looks fine to me, Andrew. Sorry for the panic, but I went into a panic when I saw that the group numbers weren't unique across items when I was setting up an experiment for an article on "data smears"... I just didn't take the code to the next step. The group numbers combined with the item number do make the correct grouping. It's a bit like Cod Liver Oil... tastes like crap but is so good for you. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To add to this discussion, I think that a nested CTE can provide a solution for this. The problem is that nested CTEs are not supported, but there is a trick to work around this. Following your test cases, we can define a simple table as:

    create table result

    (

    id int,

    code char(1)

    )

    with these rows:

    id code

    8A

    6A

    5A

    4B

    3B

    2B

    1A

    Step 1: get a sequential order of the records. This will help us ensure there are no gaps on the IDs. This can be achieved by using row_number and for reusability wrapping it around a CTE

    WITH SEQ_LIST (seq,id,code) as

    (

    select row_number() over (order by id desc) as seq, id,code

    from result (nolock)

    )

    STEP 2: since we would like to reuse this with another CTE we can create a view:

    create view dbo.vwResult

    as

    WITH SEQ_LIST (seq,id,code) as

    (

    select row_number() over (order by id desc) as seq, id,code

    from result (nolock)

    )

    select * from SEQ_LIST

    The view returns this:

    Seq id Code (note how id has a gap from 8 to 6)

    18A

    26A

    35A

    44B (the seq should end here because B is found)

    53B

    62B

    71A

    STEP 3: We are not ready to use the view. The trick now is to find the most recent record. Once that is found, we need to select the records prior to the most recent one. Since the view is already in sequential order, we can do a select top 1 to get the latest record:

    SELECT top 1 seq, code, 1 as cnt

    FROM vwResult

    returns this row:

    seq code cnt

    1A1

    STEP 4: We now need to get the records prior to the last one with a matching code. This can be done by using a recursive CTE.

    with RESULT_CNT (seq, code,cnt) as

    (

    SELECT top 1 seq, code, 1 as cnt --get the most recent row

    FROM vwResult (NOLOCK)

    union all

    SELECT a.seq, a.code, 1 as cnt --gets the next row only if same code

    FROM vwResult (NOLOCK) a

    inner join RESULT_CNT b on a.seq = b.seq+1 and a.code = b.code

    )

    select code, sum(cnt) as Total from RESULT_CNT -- gets the repetition count

    group by code

    The key is to select the records with the same code and with the next sequence number (a.seq = b.seq+1) . The result for this test case is this:

    Seq Code Count

    1A1

    2A1

    3A1

    Seq 4 has code B, so the recursive query only returns three rows.

    We select all those sequential records with the same code. For each record, we provide the instance count of 1. This will help us do an aggregate by code at the end of the query. The aggreagate result is:

    Code Total

    A3

    I hope this helps.

  • Nice job, but recursion is actually slower than a cursor.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 44 (of 44 total)

You must be logged in to reply to this topic. Login to reply