January 26, 2017 at 12:39 pm
Greetings all :).
Ranking/Row functions always seem to throw me off. Hoping I can get a hand in what is probably very simple :).
I have a batch of data that I need to have an increasing row number for a set of items that change. For some reason, when I am using the row and rank functions, I always seem to get the opposite.
I have put together some sample data, the first 4 fields are the data fields I have available to me, and this is how the data is characterized. The 5th column "Rank should be" is the result that I am trying to get.
As you can see, the ordering should be by ID and Date, and then when there are changes in field 1 and 2, the number should increase. When field 1 and 2 are the same as the last, they should stay the same number.
ID | date | field 1 | field 2 | Rank should be |
1 | 2/26/2015 5:58 | test1 | test1 | 1 |
1 | 2/26/2015 16:40 | test2 | test1 | 2 |
1 | 3/2/2015 1:58 | test2 | test2 | 3 |
1 | 3/4/2015 15:19 | test2 | test2 | 3 |
1 | 3/9/2015 13:20 | test1 | test2 | 4 |
1 | 3/9/2015 15:33 | test2 | test2 | 5 |
1 | 3/10/2015 10:23 | test2 | test2 | 5 |
1 | 3/10/2015 15:18 | test3 | test2 | 6 |
1 | 3/10/2015 18:05 | test3 | test2 | 6 |
1 | 3/12/2015 0:13 | test3 | test2 | 6 |
2 | 2/26/2015 5:58 | test1 | test1 | 1 |
2 | 2/26/2015 16:40 | test2 | test1 | 2 |
2 | 3/2/2015 1:58 | test2 | test2 | 3 |
2 | 3/4/2015 15:19 | test2 | test2 | 3 |
2 | 3/9/2015 13:20 | test1 | test2 | 4 |
2 | 3/9/2015 15:33 | test2 | test2 | 5 |
2 | 3/10/2015 10:23 | test2 | test2 | 5 |
2 | 3/10/2015 15:18 | test3 | test2 | 6 |
2 | 3/10/2015 18:05 | test3 | test2 | 6 |
2 | 3/12/2015 0:13 | test3 | test2 | 6 |
Link to my blog http://notyelf.com/
January 26, 2017 at 1:01 pm
Here's a quick attempt at doing what I think you're after - does this do as expected?select
mytable.ID
, mytable.date
, mytable.[field 1]
, mytable.[field 2]
, dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be]
from mytable
Andrew
January 26, 2017 at 1:15 pm
Andrew P - Thursday, January 26, 2017 1:01 PMHere's a quick attempt at doing what I think you're after - does this do as expected?select
mytable.ID
, mytable.date
, mytable.[field 1]
, mytable.[field 2]
, dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be]
from mytableAndrew
I'm thinking we can't include the date column in the partition by because it would reset ranking even if the ID, Field1 and Field2 are the same, maybe:DENSE_RANK() OVER (PARTITION BY ID ORDER BY Field1, Field2) AS rank_should_be
January 26, 2017 at 1:20 pm
Thanks for the response so far guys :).
Andrew, that simply returns 1 for everything.
Chris, that gets close, but it is ordered alphabetically by field 1 and 2, but we still need it ranked chronologically.
Link to my blog http://notyelf.com/
January 26, 2017 at 1:21 pm
ROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions. That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well. This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function. Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().
If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.
The ranking functions will not give you what you want. You'll need to look at some other function. Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 26, 2017 at 1:25 pm
drew.allen - Thursday, January 26, 2017 1:21 PMROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions. That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well. This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function. Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.The ranking functions will not give you what you want. You'll need to look at some other function. Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.
Drew
Thank you Drew, I think that concisely explained what I had feared; the fact that it wasn't possible :).
I am not too familiar with the LAG/LEAD functions. Do you know how I would go about using that in order to achieve what I am trying to do here?
Link to my blog http://notyelf.com/
January 26, 2017 at 1:33 pm
Here is a solution using LAG.
BEGIN TRY
SELECT *
INTO #mytable
FROM (
VALUES
(1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1),
(1, '2/26/2015 16:40', 'test2', 'test1', 2),
(1, '3/2/2015 1:58', 'test2', 'test2', 3),
(1, '3/4/2015 15:19', 'test2', 'test2', 3),
(1, '3/9/2015 13:20', 'test1', 'test2', 4),
(1, '3/9/2015 15:33', 'test2', 'test2', 5),
(1, '3/10/2015 10:23', 'test2', 'test2', 5),
(1, '3/10/2015 15:18', 'test3', 'test2', 6),
(1, '3/10/2015 18:05', 'test3', 'test2', 6),
(1, '3/12/2015 0:13', 'test3', 'test2', 6),
(2, '2/26/2015 5:58', 'test1', 'test1', 1),
(2, '2/26/2015 16:40', 'test2', 'test1', 2),
(2, '3/2/2015 1:58', 'test2', 'test2', 3),
(2, '3/4/2015 15:19', 'test2', 'test2', 3),
(2, '3/9/2015 13:20', 'test1', 'test2', 4),
(2, '3/9/2015 15:33', 'test2', 'test2', 5),
(2, '3/10/2015 10:23', 'test2', 'test2', 5),
(2, '3/10/2015 15:18', 'test3', 'test2', 6),
(2, '3/10/2015 18:05', 'test3', 'test2', 6),
(2, '3/12/2015 0:13', 'test3', 'test2', 6)
) mytable(ID, date, field_1, field_2, desired_rank)
END TRY
BEGIN CATCH
END CATCH
;
WITH mytable_starts AS
(
SELECT *,
CASE
WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
ELSE 0
END AS is_start
FROM #mytable
)
SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date] ROWS UNBOUNDED PRECEDING) AS calculated_rank
FROM mytable_starts
ORDER BY id, [date]
Drew
Edited: Added an explicit ROWS UNBOUNDED PRECEDING, because it performs better than the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 26, 2017 at 1:50 pm
drew.allen - Thursday, January 26, 2017 1:33 PMHere is a solution using LAG.
BEGIN TRY
SELECT *
INTO #mytable
FROM (
VALUES
(1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1),
(1, '2/26/2015 16:40', 'test2', 'test1', 2),
(1, '3/2/2015 1:58', 'test2', 'test2', 3),
(1, '3/4/2015 15:19', 'test2', 'test2', 3),
(1, '3/9/2015 13:20', 'test1', 'test2', 4),
(1, '3/9/2015 15:33', 'test2', 'test2', 5),
(1, '3/10/2015 10:23', 'test2', 'test2', 5),
(1, '3/10/2015 15:18', 'test3', 'test2', 6),
(1, '3/10/2015 18:05', 'test3', 'test2', 6),
(1, '3/12/2015 0:13', 'test3', 'test2', 6),
(2, '2/26/2015 5:58', 'test1', 'test1', 1),
(2, '2/26/2015 16:40', 'test2', 'test1', 2),
(2, '3/2/2015 1:58', 'test2', 'test2', 3),
(2, '3/4/2015 15:19', 'test2', 'test2', 3),
(2, '3/9/2015 13:20', 'test1', 'test2', 4),
(2, '3/9/2015 15:33', 'test2', 'test2', 5),
(2, '3/10/2015 10:23', 'test2', 'test2', 5),
(2, '3/10/2015 15:18', 'test3', 'test2', 6),
(2, '3/10/2015 18:05', 'test3', 'test2', 6),
(2, '3/12/2015 0:13', 'test3', 'test2', 6)
) mytable(ID, date, field_1, field_2, desired_rank)
END TRY
BEGIN CATCH
END CATCH
;
WITH mytable_starts AS
(
SELECT *,
CASE
WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
ELSE 0
END AS is_start
FROM #mytable
)
SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date]) AS calculated_rank
FROM mytable_starts
ORDER BY id, [date]Drew
Drew, this is amazing! Thank you very much, this worked as expected!
Link to my blog http://notyelf.com/
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply