how to get consecutive records grouped

  • hi all,

    i'm trying to group consecutive record, anyone knows how to do this ?

    this is my sample data:

    /* what i have */

    select dt, cat

    from

    (

    select '2009-09-09 14:50:02.000' as dt,'OT' as cat

    union all

    select '2009-09-09 14:55:44.000' as dt,'OT' as cat

    union all

    select '2009-09-09 15:10:55.000' as dt,'OT' as cat

    union all

    select '2009-09-09 15:30:00.000' as dt,'OT' as cat

    union all

    select '2009-09-09 16:16:41.000' as dt,'OD' as cat

    union all

    select '2009-09-09 16:26:37.000' as dt,'OD' as cat

    union all

    select '2009-09-09 16:32:43.000' as dt,'OS' as cat

    union all

    select '2009-09-09 17:56:37.000' as dt,'OT' as cat

    union all

    select '2009-09-09 17:56:48.000' as dt,'OT' as cat

    )

    sub

    /* what i want */

    select dt, cat, gr

    from

    (

    select '2009-09-09 14:50:02.000' as dt,'OT' as cat,1 as gr

    union all

    select '2009-09-09 14:55:44.000' as dt,'OT' as cat,1 as gr

    union all

    select '2009-09-09 15:10:55.000' as dt,'OT' as cat,1 as gr

    union all

    select '2009-09-09 15:30:00.000' as dt,'OT' as cat,1 as gr

    union all

    select '2009-09-09 16:16:41.000' as dt,'OD' as cat,2 as gr

    union all

    select '2009-09-09 16:26:37.000' as dt,'OD' as cat,2 as gr

    union all

    select '2009-09-09 16:32:43.000' as dt,'OS' as cat,3 as gr

    union all

    select '2009-09-09 17:56:37.000' as dt,'OT' as cat,4 as gr

    union all

    select '2009-09-09 17:56:48.000' as dt,'OT' as cat,4 as gr

    )

    sub

    so basically, table 1 is my data, and table 2 is what i want table 1 to produce.

    so far, the best i can come up is

    with table1 as

    (

    select dt, cat

    from

    (

    select '2009-09-09 14:50:02.000' as dt,'OT' as cat

    union all

    select '2009-09-09 14:55:44.000' as dt,'OT' as cat

    union all

    select '2009-09-09 15:10:55.000' as dt,'OT' as cat

    union all

    select '2009-09-09 15:30:00.000' as dt,'OT' as cat

    union all

    select '2009-09-09 16:16:41.000' as dt,'OD' as cat

    union all

    select '2009-09-09 16:26:37.000' as dt,'OD' as cat

    union all

    select '2009-09-09 16:32:43.000' as dt,'OS' as cat

    union all

    select '2009-09-09 17:56:37.000' as dt,'OT' as cat

    union all

    select '2009-09-09 17:56:48.000' as dt,'OT' as cat

    )

    sub

    )

    select

    row_number() over (partition by cat order by dt) as rn1,

    row_number() over (order by dt) as rn2,

    row_number() over (order by dt) - row_number() over (partition by cat order by dt) as rn,

    dt,

    cat

    from

    table1

    then group by rn. it works fine with small sample data, but when data gets bigger, the rn can get wrong result.

    anybody has come across this problem before ? or has a solution for this ?

    thank you

  • Hello,

    I guess what your are looking for is t-sql dense_rank() function

    select

    dense_rank() over (order by cat),

    dt,

    cat

    from

    table1

  • It's quite difficult to give a solution without actually looking at the records which are causing the problem.

    My wild guess would be use a dense_rank instead of row_number.

    select

    dense_rank() over (partition by cat order by dt) as rn1,

    row_number() over (order by dt) as rn2,

    row_number() over (order by dt) - dense_rank() over (partition by cat order by dt) as rn,

    dt,

    cat

    from

    table1

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Eralper (11/22/2010)


    Hello,

    I guess what your are looking for is t-sql dense_rank() function

    select

    dense_rank() over (order by cat),

    dt,

    cat

    from

    table1

    Oops sorry didnt know you already had posted the solution.

    --------------------------------------------------------------------------------------------------
    I am just an another naive wannabe DBA trying to learn SQL Server

  • Sachin Nandanwar (11/22/2010)


    It's quite difficult to give a solution without actually looking at the records which are causing the problem.

    What on Earth are you talking about? The OP gave both the sample data and the desired result. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sometimes happens to me too, but I believe it is better to mention the correct solution more than once

    Of course if it is really correct solution

  • Justin,

    Sachin's code almost did it. It just needed a push over the edge to get it where you wanted it. Here's the push...

    WITH

    cteGroups AS

    (

    SELECT dt,

    cat,

    Sequence = ROW_NUMBER() OVER (ORDER BY dt, cat),

    TheGroup = ROW_NUMBER() OVER (ORDER BY dt, cat)

    - ROW_NUMBER() OVER (PARTITION BY cat ORDER BY dt, cat)

    FROM dbo.Table1

    )

    ,

    cteGrouped AS

    (

    SELECT dt,

    cat,

    OrderedGroup = Sequence - ROW_NUMBER() OVER (PARTITION BY TheGroup ORDER BY Sequence)

    FROM cteGroups

    )

    SELECT dt,

    cat,

    gr = DENSE_RANK() OVER (ORDER BY OrderedGroup)

    FROM cteGrouped

    ORDER BY dt, cat

    And very well done on your original post. It's real handy having BOTH the test data and the expected result in such a readily usable format. Thanks for taking the time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • thank you for the reply guys, however dense_rank does not solve it. i have a better data that shows my problem:

    with table1 as

    (

    select dt, cat

    from

    (

    select '2009-09-09 06:18:16' as dt,'OD' as cat

    union all

    select '2009-09-09 06:18:51' as dt,'OD' as cat

    union all

    select '2009-09-09 06:18:55' as dt,'OD' as cat

    union all

    select '2009-09-09 06:35:31' as dt,'OT' as cat

    union all

    select '2009-09-09 06:45:25' as dt,'OD' as cat

    union all

    select '2009-09-09 06:45:37' as dt,'OT' as cat

    union all

    select '2009-09-09 06:46:35' as dt,'OD' as cat

    union all

    select '2009-09-09 06:46:38' as dt,'OT' as cat

    union all

    select '2009-09-09 06:47:16' as dt,'OD' as cat

    union all

    select '2009-09-09 06:47:20' as dt,'OT' as cat /*rn 6 1st occurrence*/

    union all

    select '2009-09-09 06:57:27' as dt,'OD' as cat

    union all

    select '2009-09-09 06:57:36' as dt,'OT' as cat

    union all

    select '2009-09-09 06:58:53' as dt,'OD' as cat

    union all

    select '2009-09-09 06:58:56' as dt,'OT' as cat

    union all

    select '2009-09-09 06:59:30' as dt,'OD' as cat /*rn 6 2nd occurrence*/

    union all

    select '2009-09-09 06:59:39' as dt,'OT' as cat

    union all

    select '2009-09-09 07:11:10' as dt,'OD' as cat

    union all

    select '2009-09-09 07:11:17' as dt,'OT' as cat

    union all

    select '2009-09-09 07:11:49' as dt,'OD' as cat

    union all

    select '2009-09-09 07:11:53' as dt,'OT' as cat /*rn 11 1st occurrence */

    union all

    select '2009-09-09 07:12:27' as dt,'OD' as cat

    union all

    select '2009-09-09 07:14:31' as dt,'OS' as cat

    union all

    select '2009-09-09 07:26:29' as dt,'OT' as cat

    union all

    select '2009-09-09 07:26:39' as dt,'OD' as cat /*rn 11 2nd occurrence*/

    union all

    select '2009-09-09 07:28:21' as dt,'OT' as cat

    union all

    select '2009-09-09 07:40:17' as dt,'OD' as cat

    union all

    select '2009-09-09 07:40:25' as dt,'OT' as cat

    ) sub

    )

    select

    dense_rank() over (order by cat) as rcat,

    row_number() over (partition by cat order by dt) as rn1,

    row_number() over (order by dt) as rn2,

    row_number() over (order by dt) - row_number() over (partition by cat order by dt) as rn,

    dt,

    cat

    from

    table1

    order by

    dt

    rn 6 and 11 occurs more than once because of my flaw logic of grouping.

    what i want is

    with table1 as

    (

    select dt, cat, ds

    from

    (

    select '2009-09-09 06:18:16' as dt,'OD' as cat, 1 as ds

    union all

    select '2009-09-09 06:18:51' as dt,'OD' as cat, 1 as ds

    union all

    select '2009-09-09 06:18:55' as dt,'OD' as cat, 1 as ds

    union all

    select '2009-09-09 06:35:31' as dt,'OT' as cat, 2 as ds

    union all

    select '2009-09-09 06:45:25' as dt,'OD' as cat, 3 as ds

    union all

    select '2009-09-09 06:45:37' as dt,'OT' as cat, 4 as ds

    union all

    select '2009-09-09 06:46:35' as dt,'OD' as cat, 5 as ds

    union all

    select '2009-09-09 06:46:38' as dt,'OT' as cat, 6 as ds

    union all

    select '2009-09-09 06:47:16' as dt,'OD' as cat, 7 as ds

    union all

    select '2009-09-09 06:47:20' as dt,'OT' as cat, 8 as ds

    union all

    select '2009-09-09 06:57:27' as dt,'OD' as cat, 9 as ds

    union all

    select '2009-09-09 06:57:36' as dt,'OT' as cat, 10 as ds

    union all

    select '2009-09-09 06:58:53' as dt,'OD' as cat, 11 as ds

    union all

    select '2009-09-09 06:58:56' as dt,'OT' as cat, 12 as ds

    union all

    select '2009-09-09 06:59:30' as dt,'OD' as cat, 13 as ds

    union all

    select '2009-09-09 06:59:39' as dt,'OT' as cat, 14 as ds

    union all

    select '2009-09-09 07:11:10' as dt,'OD' as cat, 15 as ds

    union all

    select '2009-09-09 07:11:17' as dt,'OT' as cat, 16 as ds

    union all

    select '2009-09-09 07:11:49' as dt,'OD' as cat, 17 as ds

    union all

    select '2009-09-09 07:11:53' as dt,'OT' as cat, 18 as ds

    union all

    select '2009-09-09 07:12:27' as dt,'OD' as cat, 19 as ds

    union all

    select '2009-09-09 07:14:31' as dt,'OS' as cat, 20 as ds

    union all

    select '2009-09-09 07:26:29' as dt,'OT' as cat, 21 as ds

    union all

    select '2009-09-09 07:26:39' as dt,'OD' as cat, 22 as ds

    union all

    select '2009-09-09 07:28:21' as dt,'OT' as cat, 23 as ds

    union all

    select '2009-09-09 07:40:17' as dt,'OD' as cat, 24 as ds

    union all

    select '2009-09-09 07:40:25' as dt,'OT' as cat, 25 as ds

    ) sub

    )

    select

    ds,

    dt,

    cat

    from

    table1

    order by

    dt

    basically what i want is (when thinking like a cursor), create a new group if current category <> prev category

    thank you

  • thanks Jeff for your script, however, it gives same problem as my first grouping logic.

  • Hi the_justin!

    Maybe I am getting this wrong, but from your second output it seems that you want to group dt and cat with minutes precision on dt.

    So these are in the same group:

    || 2009-09-09 06:18:16 || OD ||

    || 2009-09-09 06:18:51 || OD ||

    || 2009-09-09 06:18:55 || OD ||

    but not these:

    || 2009-09-09 06:46:35 || OD ||

    || 2009-09-09 06:46:38 || OT ||

    Is that right?

    Cheers

  • A minor mod to Jeff's code fixes it

    WITH

    cteGroups AS

    (

    SELECT dt,

    cat,

    Sequence = ROW_NUMBER() OVER (ORDER BY dt, cat),

    TheGroup = ROW_NUMBER() OVER (ORDER BY dt, cat)

    - ROW_NUMBER() OVER (PARTITION BY cat ORDER BY dt, cat)

    FROM dbo.Table1

    )

    ,

    cteGrouped AS

    (

    SELECT dt,

    cat,

    OrderedGroup = Sequence - ROW_NUMBER() OVER (PARTITION BY TheGroup[highlight=""],cat[/highlight] ORDER BY Sequence)

    FROM cteGroups

    )

    SELECT dt,

    cat,

    gr = DENSE_RANK() OVER (ORDER BY OrderedGroup)

    FROM cteGrouped

    ORDER BY dt, cat

    Change PARTITION BY TheGroup

    to PARTITION BY TheGroup,cat

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Hmmm , possibly not the most efficient but does the job i think 😉

    with table1 as

    (

    select dt, cat

    from

    (

    select '2009-09-09 06:18:16' as dt,'OD' as cat

    union all

    select '2009-09-09 06:18:51' as dt,'OD' as cat

    union all

    select '2009-09-09 06:18:55' as dt,'OD' as cat

    union all

    select '2009-09-09 06:35:31' as dt,'OT' as cat

    union all

    select '2009-09-09 06:45:25' as dt,'OD' as cat

    union all

    select '2009-09-09 06:45:37' as dt,'OT' as cat

    union all

    select '2009-09-09 06:46:35' as dt,'OD' as cat

    union all

    select '2009-09-09 06:46:38' as dt,'OT' as cat

    union all

    select '2009-09-09 06:47:16' as dt,'OD' as cat

    union all

    select '2009-09-09 06:47:20' as dt,'OT' as cat /*rn 6 1st occurrence*/

    union all

    select '2009-09-09 06:57:27' as dt,'OD' as cat

    union all

    select '2009-09-09 06:57:36' as dt,'OT' as cat

    union all

    select '2009-09-09 06:58:53' as dt,'OD' as cat

    union all

    select '2009-09-09 06:58:56' as dt,'OT' as cat

    union all

    select '2009-09-09 06:59:30' as dt,'OD' as cat /*rn 6 2nd occurrence*/

    union all

    select '2009-09-09 06:59:39' as dt,'OT' as cat

    union all

    select '2009-09-09 07:11:10' as dt,'OD' as cat

    union all

    select '2009-09-09 07:11:17' as dt,'OT' as cat

    union all

    select '2009-09-09 07:11:49' as dt,'OD' as cat

    union all

    select '2009-09-09 07:11:53' as dt,'OT' as cat /*rn 11 1st occurrence */

    union all

    select '2009-09-09 07:12:27' as dt,'OD' as cat

    union all

    select '2009-09-09 07:14:31' as dt,'OS' as cat

    union all

    select '2009-09-09 07:26:29' as dt,'OT' as cat

    union all

    select '2009-09-09 07:26:39' as dt,'OD' as cat /*rn 11 2nd occurrence*/

    union all

    select '2009-09-09 07:28:21' as dt,'OT' as cat

    union all

    select '2009-09-09 07:40:17' as dt,'OD' as cat

    union all

    select '2009-09-09 07:40:25' as dt,'OT' as cat

    ) sub

    ),

    withRown

    as

    (

    select *, row_number() over (order by dt) as rown,

    row_number() over (order by cat,dt desc) as grp

    from table1

    ),

    withGrouper

    as(

    select *,grp+rown as Grouper from withRown

    ),

    withmin

    as

    (

    select cat,Grouper,min(dt) as mindt,max(dt) as maxdt

    from withGrouper

    group by cat,Grouper

    ),

    withgrouping

    as

    (

    select *,Row_number() over(order by mindt) as Grouping from withmin

    )

    Select table1.cat,table1.dt,Grouping

    from withgrouping

    join table1

    on table1.dt between withgrouping.mindt and withgrouping.maxdt

    and table1.cat = withgrouping.cat

    order by grouping



    Clear Sky SQL
    My Blog[/url]

  • This sounds like it could be solved by the "Quirky Update" method to me - as you need to compare each row to the previous row.

    See Jeff's article about it here : http://www.sqlservercentral.com/articles/T-SQL/68467

    And also see this post about how to make it a touch safer : http://www.sqlservercentral.com/Forums/FindPost981258.aspx

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (11/23/2010)


    This sounds like it could be solved by the "Quirky Update" method to me - as you need to compare each row to the previous row.

    See Jeff's article about it here : http://www.sqlservercentral.com/articles/T-SQL/68467

    And also see this post about how to make it a touch safer : http://www.sqlservercentral.com/Forums/FindPost981258.aspx

    Heh... Ironic... I'm usually the one to push my own articles. I was actually tempted to use the quirky update but thought I'd take a different route because I thought I'd run into problems with the OP on using a Temp Table or modifying the original table. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @justin,

    Are you all set now?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Viewing 15 posts - 1 through 15 (of 16 total)

    You must be logged in to reply to this topic. Login to reply