Ordered group by

Question

Post reply

Ordered group by

_simon_

SSCrazy

Points: 2329
More actions
June 29, 2012 at 1:18 am

#263089

This is my original table:
declare @originalTable table (Username varchar(30), Date datetime, Code varchar(10))
insert into @originalTable
select 'User3', '2012-06-25', '007' union
select 'User1', '2012-06-20', '002' union
select 'User1', '2012-06-28', '010' union
select 'User1', '2012-06-21', '003' union
select 'User2', '2012-06-22', '004' union
select 'User2', '2012-06-23', '005' union
select 'User1', '2012-06-19', '001' union
select 'User3', '2012-06-26', '008' union
select 'User2', '2012-06-29', '011' union
select 'User3', '2012-06-24', '006' union
select 'User3', '2012-06-30', '012' union
select 'User1', '2012-06-27', '009'
I am looking for a select, which would give me this (in this exact order):
declare @expectedTable table (Username varchar(30), DateFrom datetime, DateTo datetime, CodeFrom varchar(10), CodeTo varchar(10))
insert into @expectedTable
select 'User1', '2012-06-19', '2012-06-21', '001', '003' union
select 'User2', '2012-06-22', '2012-06-23', '004', '005' union
select 'User3', '2012-06-24', '2012-06-26', '006', '008' union
select 'User1', '2012-06-27', '2012-06-28', '009', '010' union
select 'User2', '2012-06-29', '2012-06-29', '011', '011' union
select 'User3', '2012-06-30', '2012-06-30', '012', '012'
First I have to sort the original table by Code, I get this:
User12012-06-19 00:00:00.000001
User12012-06-20 00:00:00.000002
User12012-06-21 00:00:00.000003
User22012-06-22 00:00:00.000004
User22012-06-23 00:00:00.000005
User32012-06-24 00:00:00.000006
User32012-06-25 00:00:00.000007
User32012-06-26 00:00:00.000008
User12012-06-27 00:00:00.000009
User12012-06-28 00:00:00.000010
User22012-06-29 00:00:00.000011
User32012-06-30 00:00:00.000012
Now I should somehow group by Username and select min(Date), max(Date), min(Code), max(Code). But if I group by Username, then ALL records of an user are grouped together, but I need only those, who are together.
I guess I would need to add another column, like this:
1User12012-06-19 00:00:00.000001
1User12012-06-20 00:00:00.000002
1User12012-06-21 00:00:00.000003
2User22012-06-22 00:00:00.000004
2User22012-06-23 00:00:00.000005
3User32012-06-24 00:00:00.000006
3User32012-06-25 00:00:00.000007
3User32012-06-26 00:00:00.000008
4User12012-06-27 00:00:00.000009
4User12012-06-28 00:00:00.000010
5User22012-06-29 00:00:00.000011
6User32012-06-30 00:00:00.000012
Then I could group by this new column and Username and get expected results. How can I add this new column? I tried with ranking functions, but couldn't figure it out...

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 1

This, perhaps?

; with cte as

(

select *

, rn1 = ROW_NUMBER() over(Order by ot.code)

- ROW_NUMBER() over(partition by ot.username Order by ot.code)

from @originalTable ot

)

select c.Username

,DateFrom = MIN (c.Date)

,Dateto = max (c.Date)

,CodeFrom = MIN (c.Code)

,CodeTo = max (c.Code)

from cte c

group by c.Username , c.rn1

order by CodeFrom

_simon_ SSCrazy Points: 2329 More actions · Answer 2

@ColdCoffee: no, I tried that. The query

select

row_number() over (order by Code) RowNumber,

row_number() over (partition by username order by code) RowNumberWithPartition,

row_number() over (order by Code) - row_number() over (partition by username order by code) Combined,

*

from @originalTable

order by Code

yields this:

110User12012-06-19 00:00:00.000001

220User12012-06-20 00:00:00.000002

330User12012-06-21 00:00:00.000003

413User22012-06-22 00:00:00.000004

523User22012-06-23 00:00:00.000005

615User32012-06-24 00:00:00.000006

725User32012-06-25 00:00:00.000007

835User32012-06-26 00:00:00.000008

945User12012-06-27 00:00:00.000009

1055User12012-06-28 00:00:00.000010

1138User22012-06-29 00:00:00.000011

1248User32012-06-30 00:00:00.000012

The row_number function with partition by is incorrect because it continues with counting instead of starting it over again when the user changes.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

SELECT Username,

DateFrom = MIN([Date]),

DateTo = MAX([Date]),

CodeFrom = MIN(Code),

CodeTo = MAX(Code)

FROM (

SELECT Username, [Date], Code,

rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

dr = DENSE_RANK() OVER(ORDER BY Code)

FROM @originalTable

) d

GROUP BY Username, dr-rn

ORDER BY 2

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

_simon_ SSCrazy Points: 2329 More actions · Answer 4

@ChrisM@Work - that's it, thanks 🙂

Is there any specific reason why you used dense_rank function? I tried with two row_number functions and it still works ok.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

_simon_ (6/29/2012)
@ChrisM@Work - that's it, thanks 🙂
Is there any specific reason why you used dense_rank function? I tried with two row_number functions and it still works ok.

I'd be interested to see your version with two row_number functions. I used dense_rank because it's easier to work out -I can't remember how to do this stuff.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

_simon_ SSCrazy Points: 2329 More actions · Answer 6

It is exactly the same as yours, I just replaced dense_rank function with row_number:

SELECT Username,

DateFrom = MIN([Date]),

DateTo = MAX([Date]),

CodeFrom = MIN(Code),

CodeTo = MAX(Code)

FROM (

SELECT Username, [Date], Code,

rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

dr = ROW_NUMBER() OVER(ORDER BY Code)

FROM @originalTable

) d

GROUP BY Username, dr-rn

ORDER BY 2

I tested it on large data and the results is the same in both cases.

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 7

_simon_ (6/29/2012)
The row_number function with partition by is incorrect because it continues with counting instead of starting it over again when the user changes.

My query produces the exact same output that you asked for. Did you ask for a extra colum with increasing numbers or did you want you result to be Grouped by a the order of code with recurring Username?

_simon_ SSCrazy Points: 2329 More actions · Answer 8

I want my results to be grouped, your query is totally correct. I was just curious if there is any reason behind the decision, that you used dense_rank() instead of row_number().

These two queries returns the same result:

SELECT Username,

DateFrom = MIN([Date]),

DateTo = MAX([Date]),

CodeFrom = MIN(Code),

CodeTo = MAX(Code)

FROM (

SELECT Username, [Date], Code,

rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

dr = DENSE_RANK() OVER(ORDER BY Code)

FROM @originalTable

) d

GROUP BY Username, dr-rn

ORDER BY 2

SELECT Username,

DateFrom = MIN([Date]),

DateTo = MAX([Date]),

CodeFrom = MIN(Code),

CodeTo = MAX(Code)

FROM (

SELECT Username, [Date], Code,

rn = ROW_NUMBER() OVER(ORDER BY Username, Code),

dr = ROW_NUMBER() OVER(ORDER BY Code)

FROM @originalTable

) d

GROUP BY Username, dr-rn

ORDER BY 2

Edit - but it doesn't matter really, main thing is, that it just works 🙂

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

_simon_ (7/1/2012)
I want my results to be grouped, your query is totally correct. I was just curious if there is any reason behind the decision, that you used dense_rank() instead of row_number().
These two queries returns the same result:
SELECT Username,
DateFrom = MIN([Date]),
DateTo = MAX([Date]),
CodeFrom = MIN(Code),
CodeTo = MAX(Code)
FROM (
SELECT Username, [Date], Code,
rn = ROW_NUMBER() OVER(ORDER BY Username, Code),
dr = DENSE_RANK() OVER(ORDER BY Code)
FROM @originalTable
) d
GROUP BY Username, dr-rn
ORDER BY 2
SELECT Username,
DateFrom = MIN([Date]),
DateTo = MAX([Date]),
CodeFrom = MIN(Code),
CodeTo = MAX(Code)
FROM (
SELECT Username, [Date], Code,
rn = ROW_NUMBER() OVER(ORDER BY Username, Code),
dr = ROW_NUMBER() OVER(ORDER BY Code)
FROM @originalTable
) d
GROUP BY Username, dr-rn
ORDER BY 2
Edit - but it doesn't matter really, main thing is, that it just works 🙂

No other reason than habit 😉

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden