Help with ranking functions

  • Greetings all :). 

    Ranking/Row functions always seem to throw me off. Hoping I can get a hand in what is probably very simple :).

    I have a batch of data that I need to have an increasing row number for a set of items that change. For some reason, when I am using the row and rank functions, I always seem to get the opposite. 

    I have put together some sample data, the first 4 fields are the data fields I have available to me, and this is how the data is characterized. The 5th column "Rank should be" is the result that I am trying to get. 

    As you can see, the ordering should be by ID and Date, and then when there are changes in field 1 and 2, the number should increase. When field 1 and 2 are the same as the last, they should stay the same number.

    IDdatefield 1field 2Rank should be 
    12/26/2015 5:58test1test11
    12/26/2015 16:40test2test12
    13/2/2015 1:58test2test23
    13/4/2015 15:19test2test23
    13/9/2015 13:20test1test24
    13/9/2015 15:33test2test25
    13/10/2015 10:23test2test25
    13/10/2015 15:18test3test26
    13/10/2015 18:05test3test26
    13/12/2015 0:13test3test26
    22/26/2015 5:58test1test11
    22/26/2015 16:40test2test12
    23/2/2015 1:58test2test23
    23/4/2015 15:19test2test23
    23/9/2015 13:20test1test24
    23/9/2015 15:33test2test25
    23/10/2015 10:23test2test25
    23/10/2015 15:18test3test26
    23/10/2015 18:05test3test26
    23/12/2015 0:13test3test26

    Link to my blog http://notyelf.com/

  • Here's a quick attempt at doing what I think you're after - does this do as expected?
    select
        mytable.ID
        , mytable.date
        , mytable.[field 1]
        , mytable.[field 2]
        , dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be]
    from mytable

    Andrew

  • Andrew P - Thursday, January 26, 2017 1:01 PM

    Here's a quick attempt at doing what I think you're after - does this do as expected?
    select
        mytable.ID
        , mytable.date
        , mytable.[field 1]
        , mytable.[field 2]
        , dense_rank() over(partition by mytable.ID, mytable.[date] order by mytable.[field 1], mytable.[field 2]) as [Rank should be]
    from mytable

    Andrew

    I'm thinking we can't include the date column in the partition by because it would reset ranking even if the ID, Field1 and Field2 are the same, maybe:
    DENSE_RANK() OVER (PARTITION BY ID ORDER BY Field1, Field2) AS rank_should_be 

  • Thanks for the response so far guys :). 

    Andrew, that simply returns 1 for everything.

    Chris, that gets close, but it is ordered alphabetically by field 1 and 2, but we still need it ranked chronologically.

    Link to my blog http://notyelf.com/

  • ROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions.  That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well.  This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function.  Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().

    If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
    If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
    The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.

    The ranking functions will not give you what you want.  You'll need to look at some other function.  Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, January 26, 2017 1:21 PM

    ROW_NUMBER(), RANK(), and DENSE_RANK() produce a monotonically increasing series within a partition with respect to the ORDER BY expressions.  That is, if your sort by partition expressions and then the ORDER BY expressions, the ROW_NUMBER(), RANK(), and DENSE_RANK() results will be in order as well.  This means that if your produces a result where the desired ROW_NUMBER(), RANK(), or DENSE_RANK() are not in order, then you can't use that order in your ROW_NUMBER(), RANK(), or DENSE_RANK() function.  Since you want repeats and you don't want to skip numbers, you want a DENSE_RANK().

    If you sort your data by ID, field_1, field_2, then your desired DENSE_RANK() is not in order, so your order by cannot be field_1, field_2.
    If you sort your data by ID, field_2, field_1, then your desired DENSE_RANK() is still not in order, so your order by cannot be field_2, field_1.
    The only sort orders where the desired RANK() is in order start with ID, [Date], but you do not want to use [Date] as the leading expression in your rank, because you do not want the rank to increase every time that the date changes.

    The ranking functions will not give you what you want.  You'll need to look at some other function.  Specifically, LAG/LEAD are designed to allow you to access the values from the previous/next records based on the partition and sort order.

    Drew

    Thank you Drew, I think that concisely explained what I had feared; the fact that it wasn't possible :). 

    I am not too familiar with the LAG/LEAD functions. Do you know how I would go about using that in order to achieve what I am trying to do here?

    Link to my blog http://notyelf.com/

  • Here is a solution using LAG.

    BEGIN TRY
      SELECT *
      INTO #mytable
      FROM (
          VALUES
      (1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1),
      (1, '2/26/2015 16:40', 'test2', 'test1', 2),
      (1, '3/2/2015 1:58', 'test2', 'test2', 3),
      (1, '3/4/2015 15:19', 'test2', 'test2', 3),
      (1, '3/9/2015 13:20', 'test1', 'test2', 4),
      (1, '3/9/2015 15:33', 'test2', 'test2', 5),
      (1, '3/10/2015 10:23', 'test2', 'test2', 5),
      (1, '3/10/2015 15:18', 'test3', 'test2', 6),
      (1, '3/10/2015 18:05', 'test3', 'test2', 6),
      (1, '3/12/2015 0:13', 'test3', 'test2', 6),
      (2, '2/26/2015 5:58', 'test1', 'test1', 1),
      (2, '2/26/2015 16:40', 'test2', 'test1', 2),
      (2, '3/2/2015 1:58', 'test2', 'test2', 3),
      (2, '3/4/2015 15:19', 'test2', 'test2', 3),
      (2, '3/9/2015 13:20', 'test1', 'test2', 4),
      (2, '3/9/2015 15:33', 'test2', 'test2', 5),
      (2, '3/10/2015 10:23', 'test2', 'test2', 5),
      (2, '3/10/2015 15:18', 'test3', 'test2', 6),
      (2, '3/10/2015 18:05', 'test3', 'test2', 6),
      (2, '3/12/2015 0:13', 'test3', 'test2', 6)
      ) mytable(ID, date, field_1, field_2, desired_rank)
    END TRY
    BEGIN CATCH
    END CATCH
    ;
    WITH mytable_starts AS
    (
        SELECT *,
            CASE
                WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
                WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
                ELSE 0
            END AS is_start
        FROM #mytable
    )
    SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date] ROWS UNBOUNDED PRECEDING) AS calculated_rank
    FROM mytable_starts
        ORDER BY id, [date]

    Drew

    Edited:  Added an explicit ROWS UNBOUNDED PRECEDING, because it performs better than the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, January 26, 2017 1:33 PM

    Here is a solution using LAG.

    BEGIN TRY
      SELECT *
      INTO #mytable
      FROM (
          VALUES
      (1, CAST('2/26/2015 5:58' AS DATETIME), 'test1', 'test1', 1),
      (1, '2/26/2015 16:40', 'test2', 'test1', 2),
      (1, '3/2/2015 1:58', 'test2', 'test2', 3),
      (1, '3/4/2015 15:19', 'test2', 'test2', 3),
      (1, '3/9/2015 13:20', 'test1', 'test2', 4),
      (1, '3/9/2015 15:33', 'test2', 'test2', 5),
      (1, '3/10/2015 10:23', 'test2', 'test2', 5),
      (1, '3/10/2015 15:18', 'test3', 'test2', 6),
      (1, '3/10/2015 18:05', 'test3', 'test2', 6),
      (1, '3/12/2015 0:13', 'test3', 'test2', 6),
      (2, '2/26/2015 5:58', 'test1', 'test1', 1),
      (2, '2/26/2015 16:40', 'test2', 'test1', 2),
      (2, '3/2/2015 1:58', 'test2', 'test2', 3),
      (2, '3/4/2015 15:19', 'test2', 'test2', 3),
      (2, '3/9/2015 13:20', 'test1', 'test2', 4),
      (2, '3/9/2015 15:33', 'test2', 'test2', 5),
      (2, '3/10/2015 10:23', 'test2', 'test2', 5),
      (2, '3/10/2015 15:18', 'test3', 'test2', 6),
      (2, '3/10/2015 18:05', 'test3', 'test2', 6),
      (2, '3/12/2015 0:13', 'test3', 'test2', 6)
      ) mytable(ID, date, field_1, field_2, desired_rank)
    END TRY
    BEGIN CATCH
    END CATCH
    ;
    WITH mytable_starts AS
    (
        SELECT *,
            CASE
                WHEN field_1 <> LAG(field_1, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
                WHEN field_2 <> LAG(field_2, 1, '') OVER(PARTITION BY ID ORDER BY [date]) THEN 1
                ELSE 0
            END AS is_start
        FROM #mytable
    )
    SELECT ID, [date], field_1, field_2, desired_rank, SUM(is_start) OVER(PARTITION BY ID ORDER BY [date]) AS calculated_rank
    FROM mytable_starts
        ORDER BY id, [date]

    Drew

    Drew, this is amazing! Thank you very much, this worked as expected! 

    Link to my blog http://notyelf.com/

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply