Help with ranking functions

Question

Post reply

Help with ranking functions

shannonjk

SSCarpal Tunnel

Points: 4286

January 26, 2017 at 12:39 pm

#317963

Greetings all :).

Ranking/Row functions always seem to throw me off. Hoping I can get a hand in what is probably very simple :).

I have a batch of data that I need to have an increasing row number for a set of items that change. For some reason, when I am using the row and rank functions, I always seem to get the opposite.

I have put together some sample data, the first 4 fields are the data fields I have available to me, and this is how the data is characterized. The 5th column "Rank should be" is the result that I am trying to get.

As you can see, the ordering should be by ID and Date, and then when there are changes in field 1 and 2, the number should increase. When field 1 and 2 are the same as the last, they should stay the same number.

ID	date	field 1	field 2	Rank should be
1	2/26/2015 5:58	test1	test1	1
1	2/26/2015 16:40	test2	test1	2
1	3/2/2015 1:58	test2	test2	3
1	3/4/2015 15:19	test2	test2	3
1	3/9/2015 13:20	test1	test2	4
1	3/9/2015 15:33	test2	test2	5
1	3/10/2015 10:23	test2	test2	5
1	3/10/2015 15:18	test3	test2	6
1	3/10/2015 18:05	test3	test2	6
1	3/12/2015 0:13	test3	test2	6
2	2/26/2015 5:58	test1	test1	1
2	2/26/2015 16:40	test2	test1	2
2	3/2/2015 1:58	test2	test2	3
2	3/4/2015 15:19	test2	test2	3
2	3/9/2015 13:20	test1	test2	4
2	3/9/2015 15:33	test2	test2	5
2	3/10/2015 10:23	test2	test2	5
2	3/10/2015 15:18	test3	test2	6
2	3/10/2015 18:05	test3	test2	6
2	3/12/2015 0:13	test3	test2	6

Link to my blog http://notyelf.com/

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Andrew P SSCarpal Tunnel Points: 4649 More actions · Answer 1

Here's a quick attempt at doing what I think you're after - does this do as expected?
select mytable.ID , mytable.date , mytable.[field 1] , mytable.[field 2] , dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be] from mytable

Andrew

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 2

Andrew P - Thursday, January 26, 2017 1:01 PM
Here's a quick attempt at doing what I think you're after - does this do as expected?
select mytable.ID , mytable.date , mytable.[field 1] , mytable.[field 2] , dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be] from mytable
Andrew

I'm thinking we can't include the date column in the partition by because it would reset ranking even if the ID, Field1 and Field2 are the same, maybe:
DENSE_RANK() OVER (PARTITION BY ID ORDER BY Field1, Field2) AS rank_should_be

shannonjk SSCarpal Tunnel Points: 4286 More actions · Answer 3

Thanks for the response so far guys :).

Andrew, that simply returns 1 for everything.

Chris, that gets close, but it is ordered alphabetically by field 1 and 2, but we still need it ranked chronologically.

Link to my blog http://notyelf.com/

drew.allen SSC Guru Points: 76996 More actions · Answer 4

ROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions. That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well. This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function. Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().

If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.

The ranking functions will not give you what you want. You'll need to look at some other function. Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

shannonjk SSCarpal Tunnel Points: 4286 More actions · Answer 5

drew.allen - Thursday, January 26, 2017 1:21 PM
ROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions. That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well. This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function. Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().
If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.
The ranking functions will not give you what you want. You'll need to look at some other function. Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.
Drew

Thank you Drew, I think that concisely explained what I had feared; the fact that it wasn't possible :).

I am not too familiar with the LAG/LEAD functions. Do you know how I would go about using that in order to achieve what I am trying to do here?

Link to my blog http://notyelf.com/

drew.allen SSC Guru Points: 76996 More actions · Answer 6

Here is a solution using LAG.

BEGIN TRY SELECT * INTO #mytable FROM ( VALUES (1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1), (1, '2/26/2015 16:40', 'test2', 'test1', 2), (1, '3/2/2015 1:58', 'test2', 'test2', 3), (1, '3/4/2015 15:19', 'test2', 'test2', 3), (1, '3/9/2015 13:20', 'test1', 'test2', 4), (1, '3/9/2015 15:33', 'test2', 'test2', 5), (1, '3/10/2015 10:23', 'test2', 'test2', 5), (1, '3/10/2015 15:18', 'test3', 'test2', 6), (1, '3/10/2015 18:05', 'test3', 'test2', 6), (1, '3/12/2015 0:13', 'test3', 'test2', 6), (2, '2/26/2015 5:58', 'test1', 'test1', 1), (2, '2/26/2015 16:40', 'test2', 'test1', 2), (2, '3/2/2015 1:58', 'test2', 'test2', 3), (2, '3/4/2015 15:19', 'test2', 'test2', 3), (2, '3/9/2015 13:20', 'test1', 'test2', 4), (2, '3/9/2015 15:33', 'test2', 'test2', 5), (2, '3/10/2015 10:23', 'test2', 'test2', 5), (2, '3/10/2015 15:18', 'test3', 'test2', 6), (2, '3/10/2015 18:05', 'test3', 'test2', 6), (2, '3/12/2015 0:13', 'test3', 'test2', 6) ) mytable(ID, date, field_1, field_2, desired_rank) END TRY BEGIN CATCH END CATCH ; WITH mytable_starts AS ( SELECT *, CASE WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1 WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1 ELSE 0 END AS is_start FROM #mytable ) SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date] ROWS UNBOUNDED PRECEDING) AS calculated_rank FROM mytable_starts ORDER BY id, [date]

Drew

Edited: Added an explicit ROWS UNBOUNDED PRECEDING, because it performs better than the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

shannonjk SSCarpal Tunnel Points: 4286 More actions · Answer 7

drew.allen - Thursday, January 26, 2017 1:33 PM
Here is a solution using LAG.
BEGIN TRY SELECT * INTO #mytable FROM ( VALUES (1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1), (1, '2/26/2015 16:40', 'test2', 'test1', 2), (1, '3/2/2015 1:58', 'test2', 'test2', 3), (1, '3/4/2015 15:19', 'test2', 'test2', 3), (1, '3/9/2015 13:20', 'test1', 'test2', 4), (1, '3/9/2015 15:33', 'test2', 'test2', 5), (1, '3/10/2015 10:23', 'test2', 'test2', 5), (1, '3/10/2015 15:18', 'test3', 'test2', 6), (1, '3/10/2015 18:05', 'test3', 'test2', 6), (1, '3/12/2015 0:13', 'test3', 'test2', 6), (2, '2/26/2015 5:58', 'test1', 'test1', 1), (2, '2/26/2015 16:40', 'test2', 'test1', 2), (2, '3/2/2015 1:58', 'test2', 'test2', 3), (2, '3/4/2015 15:19', 'test2', 'test2', 3), (2, '3/9/2015 13:20', 'test1', 'test2', 4), (2, '3/9/2015 15:33', 'test2', 'test2', 5), (2, '3/10/2015 10:23', 'test2', 'test2', 5), (2, '3/10/2015 15:18', 'test3', 'test2', 6), (2, '3/10/2015 18:05', 'test3', 'test2', 6), (2, '3/12/2015 0:13', 'test3', 'test2', 6) ) mytable(ID, date, field_1, field_2, desired_rank) END TRY BEGIN CATCH END CATCH ; WITH mytable_starts AS ( SELECT *, CASE WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1 WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1 ELSE 0 END AS is_start FROM #mytable ) SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date]) AS calculated_rank FROM mytable_starts ORDER BY id, [date]
Drew

Drew, this is amazing! Thank you very much, this worked as expected!

Link to my blog http://notyelf.com/