RANK() With NULLS Problem

  • Hi,

    I am trying to figure out a way to ignore NULLS using the RANK() function. According to this article it is the intended behaviour http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=124953

    I am running the following query;

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime

    )

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000',NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000')

    SELECT

    [Name],

    Item,

    Variant,

    RegDate,

    AbsenceDate,

    RANK() OVER (PARTITION BY [Name], Item, Variant

    ORDER BY RegDate ASC) AS ConsecutiveDays

    FROM #Ranking

    Basically I am trying to count the consecutive days absence. In this example John Smith has two days absence, attends the third and is absent again on the fourth day.

    As you can see the rank is displayed as 1,2,3,4 including the NULL value (no value for absence is counted as attendance). What I would like to see is a ranking of 1,2 for the first two records. NULL for the third and for the rank to start again at 1 for the fourth record.

    I am not sure I will be able to achieve this using the rank function, any help would be appreciated!

  • It looks like another running total problem, which has addressed many a times on this forum. And here is one solution by Jeff Moden that performs better than any other solution that I know of.

    Link to Jeff Moden Version:

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/[/url]

    And few other slower versions which I don't recommend (uses cursors, correlated queries etc.)

    http://www.sqlteam.com/article/calculating-running-totals

    Study, Understand, Implement it and if you still have any issues, post back and I would be here to help you out.

    --Ramesh


  • Yes, this looks more like a running totals problem than a ranking problem.

    Unfortunately, Jeff's article is still offline. You can get an idea of the process by looking at my related article. It is referenced below along with Jeff's in my signature block.

  • I changed the structure of #Ranking, adding the ConsecutiveDays when the table is created, hope this helps.

    drop table #Ranking

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    )

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000')

    declare @DaysAbsent int

    update #Ranking

    set

    @DaysAbsent =

    case when AbsenceDate is not null

    then

    case when @DaysAbsent is null

    then 1

    else @DaysAbsent + 1

    end

    else null

    end,

    ConsecutiveDays = @DaysAbsent

    select * from #Ranking

  • Thank you all for your replies, I will take a look at Jeff's article when it comes back online as I am sure I will run into this problem again. For now though Allister your solution works perfectly. Thanks again!

    Edit...

    Actually I spoke a bit too soon, when I add another name into the table the results come back incorrectly, take a look at this example;

    DROP TABLE #Ranking

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    )

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    declare @DaysAbsent int

    update #Ranking

    set

    @DaysAbsent = case when AbsenceDate is not null then

    case when @DaysAbsent is null then 1

    else @DaysAbsent + 1

    end else null

    end,

    ConsecutiveDays = @DaysAbsent

    SELECT * FROM #Ranking

    You can see that lines six and seven has allocated consecutive days absences for two different names.

  • Here is the code you are going to need. Please note the clustered index created on the temporary table. This is required as it controls the order of the update. Notice the use of index hints in the UPDATE statement, also required.

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    );

    create clustered index ix_NameDate on #Ranking (

    [Name] asc,

    RegDate asc

    );

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    declare @DaysAbsent int,

    @Name varchar(50);

    set @Name = '';

    update #Ranking set

    @DaysAbsent = ConsecutiveDays = case when r.AbsenceDate is null

    then 0

    else case when r.[Name] @Name

    then 1

    else @DaysAbsent + 1

    end

    end,

    @Name = r.[Name]

    from

    #Ranking r with (index(ix_NameDate),tablockx)

    SELECT * FROM #Ranking

    DROP TABLE #Ranking

  • So, did the code I provide help you at all?

  • Hi Lynn many thanks for your help, sorry for the delay, didn't get a chance to test until late yesterday.

    I spotted another slight porblem in testing, take a look at this;

    DROP TABLE #Ranking

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    );

    CREATE CLUSTERED INDEX ix_NameDate ON #Ranking

    (

    [Name] asc,

    RegDate asc

    );

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);

    DECLARE @DaysAbsent int,

    @Name varchar(50);

    SET @Name = '';

    UPDATE #Ranking

    SET @DaysAbsent = ConsecutiveDays =

    CASE WHEN r.AbsenceDate IS NULL THEN 0

    ELSE CASE WHEN R.[Name] @Name THEN 1

    ELSE @DaysAbsent + 1

    END

    END,

    @Name = r.[Name]

    FROM

    #Ranking r WITH (INDEX(ix_NameDate),TABLOCKX)

    SELECT * FROM #Ranking

    If you look at row 8 John has been assigned five consecutive days absence but the fifth day is actually a different item, which should begin the count again. Apologies if I did not make this clear previously, I only spotted it in testing!

  • dt (5/29/2009)


    Hi Lynn many thanks for your help, sorry for the delay, didn't get a chance to test until late yesterday.

    I spotted another slight porblem in testing, take a look at this;

    DROP TABLE #Ranking

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    );

    CREATE CLUSTERED INDEX ix_NameDate ON #Ranking

    (

    [Name] asc,

    RegDate asc

    );

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);

    DECLARE @DaysAbsent int,

    @Name varchar(50);

    SET @Name = '';

    UPDATE #Ranking

    SET @DaysAbsent = ConsecutiveDays =

    CASE WHEN r.AbsenceDate IS NULL THEN 0

    ELSE CASE WHEN R.[Name] @Name THEN 1

    ELSE @DaysAbsent + 1

    END

    END,

    @Name = r.[Name]

    FROM

    #Ranking r WITH (INDEX(ix_NameDate),TABLOCKX)

    SELECT * FROM #Ranking

    If you look at row 8 John has been assigned five consecutive days absence but the fifth day is actually a different item, which should begin the count again. Apologies if I did not make this clear previously, I only spotted it in testing!

    First, I have to complain. Why not give us the full requirements upfront instead of this "Oh, by the way...". You would get much better answers the first time if you did this.

    With that, before I start doing anything more, are there any other requirements we should be aware of, or have we finally identified them all?

  • Again apologies, as I say I only realised whilst testing. It's why we test right? Anyway, no there are no other requirements!

  • dt (6/1/2009)


    Again apologies, as I say I only realised whilst testing. It's why we test right? Anyway, no there are no other requirements!

    Actually, not in this case. You really should know what your requirements are before you start working on the code. You know the data, you know what the results should be.

    Any way, here is the code I have come up with, test it to be sure it returns the result you expect. If not, hopefully you should have enough information to figure out what additional tuning is required. If not, just ask.

    CREATE TABLE #Ranking

    (

    [Name] varchar(50),

    [Item] varchar(7),

    Variant varchar(3),

    RegDate datetime,

    AbsenceDate datetime,

    ConsecutiveDays int

    );

    CREATE CLUSTERED INDEX ix_NameDate ON #Ranking

    (

    [Name] asc,

    [Item] asc,

    RegDate asc

    );

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);

    INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)

    VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);

    DECLARE @DaysAbsent int,

    @Name varchar(50),

    @Item varchar(7);

    SET @Name = '';

    SET @Item = '';

    UPDATE #Ranking

    SET @DaysAbsent = ConsecutiveDays =

    CASE WHEN r.AbsenceDate IS NULL THEN 0

    ELSE CASE WHEN (r.[Name] @Name) or (r.[Item] @Item) THEN 1

    ELSE @DaysAbsent + 1

    END

    END,

    @Name = r.[Name],

    @Item = r.[Item]

    FROM

    #Ranking r WITH (INDEX(ix_NameDate),TABLOCKX);

    select * from #Ranking;

    DROP TABLE #Ranking

  • Perfect. Thank you very much for your help and patience 🙂

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply