RANK() With NULLS Problem

Question

dt293

Default port

Points: 1412
May 27, 2009 at 9:52 am

#131949

Hi,
I am trying to figure out a way to ignore NULLs when using the RANK() function. According to this article, counting them is the intended behaviour: http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=124953
I am running the following query:
CREATE TABLE #Ranking
(
    [Name] varchar(50),
    [Item] varchar(7),
    Variant varchar(3),
    RegDate datetime,
    AbsenceDate datetime
);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
SELECT
    [Name],
    Item,
    Variant,
    RegDate,
    AbsenceDate,
    RANK() OVER (PARTITION BY [Name], Item, Variant
                 ORDER BY RegDate ASC) AS ConsecutiveDays
FROM #Ranking;
Basically, I am trying to count consecutive days of absence. In this example John Smith is absent for two days, attends on the third, and is absent again on the fourth day.
As you can see, the rank is displayed as 1, 2, 3, 4, including the row with the NULL value (no value for AbsenceDate counts as attendance). What I would like to see is a ranking of 1, 2 for the first two records, NULL for the third, and the rank starting again at 1 for the fourth record.
I am not sure I will be able to achieve this using the RANK() function. Any help would be appreciated!
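
For illustration, here is a minimal sketch of why RANK() on its own falls short, run against the #Ranking table above. It is not from any poster in the thread; the extra CASE expression in the PARTITION BY is an assumption added purely to separate NULL rows from absence rows. It hides the NULL row, but the rank does not restart after the gap, because every absence row for the same [Name], Item and Variant still sits in one partition.

```sql
-- Sketch only: partitioning on a "has absence" flag skips the NULL row,
-- but the numbering continues across the gap instead of restarting.
SELECT
    [Name],
    Item,
    Variant,
    RegDate,
    AbsenceDate,
    CASE
        WHEN AbsenceDate IS NOT NULL
        THEN RANK() OVER (PARTITION BY [Name], Item, Variant,
                                       CASE WHEN AbsenceDate IS NULL THEN 0 ELSE 1 END
                          ORDER BY RegDate ASC)
    END AS ConsecutiveDays
FROM #Ranking
ORDER BY [Name], RegDate;
```

For the four sample rows this gives 1, 2, NULL, 3 rather than the desired 1, 2, NULL, 1, so some way of identifying each unbroken run of absences is still needed.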


Ramesh Saive SSC-Insane Points: 24275 · Answer 1

This looks like another running total problem, which has been addressed many times on this forum. Here is one solution, by Jeff Moden, that performs better than any other solution I know of.

Link to Jeff Moden Version:

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

And a few other, slower versions, which I don't recommend (they use cursors, correlated subqueries, etc.):

http://www.sqlteam.com/article/calculating-running-totals

Study it, understand it, and implement it. If you still have any issues, post back and I will be here to help you out.

--Ramesh

Lynn Pettis SSC Guru Points: 442467 · Answer 2

Yes, this looks more like a running totals problem than a ranking problem.

Unfortunately, Jeff's article is still offline. You can get an idea of the process by looking at my related article. It is referenced below along with Jeff's in my signature block.

Allister Reid SSCrazy Points: 2666 · Answer 3

I changed the structure of #Ranking, adding a ConsecutiveDays column when the table is created. Hope this helps.

drop table #Ranking

CREATE TABLE #Ranking
(
    [Name] varchar(50),
    [Item] varchar(7),
    Variant varchar(3),
    RegDate datetime,
    AbsenceDate datetime,
    ConsecutiveDays int
)

INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000')

declare @DaysAbsent int

update #Ranking
set
    @DaysAbsent =
        case when AbsenceDate is not null
             then
                 case when @DaysAbsent is null
                      then 1
                      else @DaysAbsent + 1
                 end
             else null
        end,
    ConsecutiveDays = @DaysAbsent

select * from #Ranking

dt293 Default port Points: 1412 · Answer 4

Thank you all for your replies. I will take a look at Jeff's article when it comes back online, as I am sure I will run into this problem again. For now, though, Allister, your solution works perfectly. Thanks again!

Edit...

Actually, I spoke a bit too soon. When I add another name to the table, the results come back incorrect. Take a look at this example:

DROP TABLE #Ranking

CREATE TABLE #Ranking
(
    [Name] varchar(50),
    [Item] varchar(7),
    Variant varchar(3),
    RegDate datetime,
    AbsenceDate datetime,
    ConsecutiveDays int
)

INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');

declare @DaysAbsent int

update #Ranking
set
    @DaysAbsent = case when AbsenceDate is not null then
                      case when @DaysAbsent is null then 1
                           else @DaysAbsent + 1
                      end else null
                  end,
    ConsecutiveDays = @DaysAbsent

SELECT * FROM #Ranking

You can see that rows six and seven have been counted as consecutive days of absence even though they belong to two different names.
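
The carry-over happens because @DaysAbsent is only reset by a NULL AbsenceDate, never by a change of [Name]. As a hedged alternative that avoids the variable altogether, the following is a minimal set-based sketch (a "gaps and islands" approach using ROW_NUMBER()). It is not taken from any reply in this thread; the CTE names Flagged and Grouped and the columns IsAbsent, SeqAll and RunId are hypothetical names chosen for illustration. It assumes one row per [Name], Item, Variant and RegDate, with a row present for every day.

```sql
-- Sketch only. Run as its own batch, or make sure the previous
-- statement ends with a semicolon before the WITH keyword.
WITH Flagged AS
(
    SELECT
        [Name], Item, Variant, RegDate, AbsenceDate,
        -- 1 = absent, 0 = attended (NULL AbsenceDate)
        CASE WHEN AbsenceDate IS NULL THEN 0 ELSE 1 END AS IsAbsent,
        -- position of the row among all rows for this person/item/variant
        ROW_NUMBER() OVER (PARTITION BY [Name], Item, Variant
                           ORDER BY RegDate) AS SeqAll
    FROM #Ranking
),
Grouped AS
(
    SELECT
        *,
        -- rows in the same unbroken run share the same difference
        SeqAll - ROW_NUMBER() OVER (PARTITION BY [Name], Item, Variant, IsAbsent
                                    ORDER BY RegDate) AS RunId
    FROM Flagged
)
SELECT
    [Name], Item, Variant, RegDate, AbsenceDate,
    -- number the rows inside each run of absences; attendance rows stay NULL
    CASE WHEN IsAbsent = 1
         THEN ROW_NUMBER() OVER (PARTITION BY [Name], Item, Variant, IsAbsent, RunId
                                 ORDER BY RegDate)
    END AS ConsecutiveDays
FROM Grouped
ORDER BY [Name], RegDate;
```

With the nine rows inserted above, this should number John Smith's rows 1, 2, 3, 4, NULL, 1 and Jane Smith's rows 1, NULL, 1. IsAbsent is included in the final PARTITION BY because an attendance run and an absence run can end up with the same RunId value.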

Lynn Pettis SSC Guru Points: 442467 · Answer 5

Here is the code you are going to need. Please note the clustered index created on the temporary table. This is required as it controls the order of the update. Notice the use of index hints in the UPDATE statement, also required.

CREATE TABLE #Ranking
(
    [Name] varchar(50),
    [Item] varchar(7),
    Variant varchar(3),
    RegDate datetime,
    AbsenceDate datetime,
    ConsecutiveDays int
);

create clustered index ix_NameDate on #Ranking (
    [Name] asc,
    RegDate asc