Ordered group by

  • This is my original table:

    declare @originalTable table (Username varchar(30), Date datetime, Code varchar(10))

    insert into @originalTable

    select 'User3', '2012-06-25', '007' union

    select 'User1', '2012-06-20', '002' union

    select 'User1', '2012-06-28', '010' union

    select 'User1', '2012-06-21', '003' union

    select 'User2', '2012-06-22', '004' union

    select 'User2', '2012-06-23', '005' union

    select 'User1', '2012-06-19', '001' union

    select 'User3', '2012-06-26', '008' union

    select 'User2', '2012-06-29', '011' union

    select 'User3', '2012-06-24', '006' union

    select 'User3', '2012-06-30', '012' union

    select 'User1', '2012-06-27', '009'

    I am looking for a select, which would give me this (in this exact order):

    declare @expectedTable table (Username varchar(30), DateFrom datetime, DateTo datetime, CodeFrom varchar(10), CodeTo varchar(10))

    insert into @expectedTable

    select 'User1', '2012-06-19', '2012-06-21', '001', '003' union

    select 'User2', '2012-06-22', '2012-06-23', '004', '005' union

    select 'User3', '2012-06-24', '2012-06-26', '006', '008' union

    select 'User1', '2012-06-27', '2012-06-28', '009', '010' union

    select 'User2', '2012-06-29', '2012-06-29', '011', '011' union

    select 'User3', '2012-06-30', '2012-06-30', '012', '012'

    First I have to sort the original table by Code, I get this:

    User12012-06-19 00:00:00.000001

    User12012-06-20 00:00:00.000002

    User12012-06-21 00:00:00.000003

    User22012-06-22 00:00:00.000004

    User22012-06-23 00:00:00.000005

    User32012-06-24 00:00:00.000006

    User32012-06-25 00:00:00.000007

    User32012-06-26 00:00:00.000008

    User12012-06-27 00:00:00.000009

    User12012-06-28 00:00:00.000010

    User22012-06-29 00:00:00.000011

    User32012-06-30 00:00:00.000012

    Now I should somehow group by Username and select min(Date), max(Date), min(Code), max(Code). But if I group by Username, then ALL records of an user are grouped together, but I need only those, who are together.

    I guess I would need to add another column, like this:

    1User12012-06-19 00:00:00.000001

    1User12012-06-20 00:00:00.000002

    1User12012-06-21 00:00:00.000003

    2User22012-06-22 00:00:00.000004

    2User22012-06-23 00:00:00.000005

    3User32012-06-24 00:00:00.000006

    3User32012-06-25 00:00:00.000007

    3User32012-06-26 00:00:00.000008

    4User12012-06-27 00:00:00.000009

    4User12012-06-28 00:00:00.000010

    5User22012-06-29 00:00:00.000011

    6User32012-06-30 00:00:00.000012

    Then I could group by this new column and Username and get expected results. How can I add this new column? I tried with ranking functions, but couldn't figure it out...

  • This, perhaps?

    ; with cte as

    (

    select *

    , rn1 = ROW_NUMBER() over(Order by ot.code)

    - ROW_NUMBER() over(partition by ot.username Order by ot.code)

    from @originalTable ot

    )

    select c.Username

    ,DateFrom = MIN (c.Date)

    ,Dateto = max (c.Date)

    ,CodeFrom = MIN (c.Code)

    ,CodeTo = max (c.Code)

    from cte c

    group by c.Username , c.rn1

    order by CodeFrom

  • @ColdCoffee: no, I tried that. The query

    select

    row_number() over (order by Code) RowNumber,

    row_number() over (partition by username order by code) RowNumberWithPartition,

    row_number() over (order by Code) - row_number() over (partition by username order by code) Combined,

    *

    from @originalTable

    order by Code

    yields this:

    110User12012-06-19 00:00:00.000001

    220User12012-06-20 00:00:00.000002

    330User12012-06-21 00:00:00.000003

    413User22012-06-22 00:00:00.000004

    523User22012-06-23 00:00:00.000005

    615User32012-06-24 00:00:00.000006

    725User32012-06-25 00:00:00.000007

    835User32012-06-26 00:00:00.000008

    945User12012-06-27 00:00:00.000009

    1055User12012-06-28 00:00:00.000010

    1138User22012-06-29 00:00:00.000011

    1248User32012-06-30 00:00:00.000012

    The row_number function with partition by is incorrect because it continues with counting instead of starting it over again when the user changes.

  • SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = DENSE_RANK() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • @ChrisM@Work - that's it, thanks πŸ™‚

    Is there any specific reason why you used dense_rank function? I tried with two row_number functions and it still works ok.

  • _simon_ (6/29/2012)


    @ChrisM@Work - that's it, thanks πŸ™‚

    Is there any specific reason why you used dense_rank function? I tried with two row_number functions and it still works ok.

    I'd be interested to see your version with two row_number functions. I used dense_rank because it's easier to work out -I can't remember how to do this stuff.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It is exactly the same as yours, I just replaced dense_rank function with row_number:

    SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = ROW_NUMBER() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    I tested it on large data and the results is the same in both cases.

  • _simon_ (6/29/2012)

    The row_number function with partition by is incorrect because it continues with counting instead of starting it over again when the user changes.

    My query produces the exact same output that you asked for. Did you ask for a extra colum with increasing numbers or did you want you result to be Grouped by a the order of code with recurring Username?

  • I want my results to be grouped, your query is totally correct. I was just curious if there is any reason behind the decision, that you used dense_rank() instead of row_number().

    These two queries returns the same result:

    SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = DENSE_RANK() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = ROW_NUMBER() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    Edit - but it doesn't matter really, main thing is, that it just works πŸ™‚

  • _simon_ (7/1/2012)


    I want my results to be grouped, your query is totally correct. I was just curious if there is any reason behind the decision, that you used dense_rank() instead of row_number().

    These two queries returns the same result:

    SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = DENSE_RANK() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    SELECT Username,

    DateFrom = MIN([Date]),

    DateTo = MAX([Date]),

    CodeFrom = MIN(Code),

    CodeTo = MAX(Code)

    FROM (

    SELECT Username, [Date], Code,

    rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

    dr = ROW_NUMBER() OVER(ORDER BY Code)

    FROM @originalTable

    ) d

    GROUP BY Username, dr-rn

    ORDER BY 2

    Edit - but it doesn't matter really, main thing is, that it just works πŸ™‚

    No other reason than habit πŸ˜‰

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply