May 27, 2009 at 9:52 am
Hi,
I am trying to figure out a way to ignore NULLS using the RANK() function. According to this article it is the intended behaviour http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=124953
I am running the following query;
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime
)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000',NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000')
SELECT
[Name],
Item,
Variant,
RegDate,
AbsenceDate,
RANK() OVER (PARTITION BY [Name], Item, Variant
ORDER BY RegDate ASC) AS ConsecutiveDays
FROM #Ranking
Basically I am trying to count the consecutive days absence. In this example John Smith has two days absence, attends the third and is absent again on the fourth day.
As you can see the rank is displayed as 1,2,3,4 including the NULL value (no value for absence is counted as attendance). What I would like to see is a ranking of 1,2 for the first two records. NULL for the third and for the rank to start again at 1 for the fourth record.
I am not sure I will be able to achieve this using the rank function, any help would be appreciated!
May 27, 2009 at 10:14 am
It looks like another running total problem, which has addressed many a times on this forum. And here is one solution by Jeff Moden that performs better than any other solution that I know of.
Link to Jeff Moden Version:
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/[/url]
And few other slower versions which I don't recommend (uses cursors, correlated queries etc.)
http://www.sqlteam.com/article/calculating-running-totals
Study, Understand, Implement it and if you still have any issues, post back and I would be here to help you out.
--Ramesh
May 27, 2009 at 10:35 am
Yes, this looks more like a running totals problem than a ranking problem.
Unfortunately, Jeff's article is still offline. You can get an idea of the process by looking at my related article. It is referenced below along with Jeff's in my signature block.
May 27, 2009 at 10:44 am
I changed the structure of #Ranking, adding the ConsecutiveDays when the table is created, hope this helps.
drop table #Ranking
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000')
declare @DaysAbsent int
update #Ranking
set
@DaysAbsent =
case when AbsenceDate is not null
then
case when @DaysAbsent is null
then 1
else @DaysAbsent + 1
end
else null
end,
ConsecutiveDays = @DaysAbsent
select * from #Ranking
May 28, 2009 at 1:52 am
Thank you all for your replies, I will take a look at Jeff's article when it comes back online as I am sure I will run into this problem again. For now though Allister your solution works perfectly. Thanks again!
Edit...
Actually I spoke a bit too soon, when I add another name into the table the results come back incorrectly, take a look at this example;
DROP TABLE #Ranking
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
declare @DaysAbsent int
update #Ranking
set
@DaysAbsent = case when AbsenceDate is not null then
case when @DaysAbsent is null then 1
else @DaysAbsent + 1
end else null
end,
ConsecutiveDays = @DaysAbsent
SELECT * FROM #Ranking
You can see that lines six and seven has allocated consecutive days absences for two different names.
May 28, 2009 at 6:59 am
Here is the code you are going to need. Please note the clustered index created on the temporary table. This is required as it controls the order of the update. Notice the use of index hints in the UPDATE statement, also required.
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
);
create clustered index ix_NameDate on #Ranking (
[Name] asc,
RegDate asc
);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
declare @DaysAbsent int,
@Name varchar(50);
set @Name = '';
update #Ranking set
@DaysAbsent = ConsecutiveDays = case when r.AbsenceDate is null
then 0
else case when r.[Name] @Name
then 1
else @DaysAbsent + 1
end
end,
@Name = r.[Name]
from
#Ranking r with (index(ix_NameDate),tablockx)
SELECT * FROM #Ranking
DROP TABLE #Ranking
May 28, 2009 at 9:27 pm
So, did the code I provide help you at all?
May 29, 2009 at 2:18 am
Hi Lynn many thanks for your help, sorry for the delay, didn't get a chance to test until late yesterday.
I spotted another slight porblem in testing, take a look at this;
DROP TABLE #Ranking
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
);
CREATE CLUSTERED INDEX ix_NameDate ON #Ranking
(
[Name] asc,
RegDate asc
);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);
DECLARE @DaysAbsent int,
@Name varchar(50);
SET @Name = '';
UPDATE #Ranking
SET @DaysAbsent = ConsecutiveDays =
CASE WHEN r.AbsenceDate IS NULL THEN 0
ELSE CASE WHEN R.[Name] @Name THEN 1
ELSE @DaysAbsent + 1
END
END,
@Name = r.[Name]
FROM
#Ranking r WITH (INDEX(ix_NameDate),TABLOCKX)
SELECT * FROM #Ranking
If you look at row 8 John has been assigned five consecutive days absence but the fifth day is actually a different item, which should begin the count again. Apologies if I did not make this clear previously, I only spotted it in testing!
May 29, 2009 at 9:39 am
dt (5/29/2009)
Hi Lynn many thanks for your help, sorry for the delay, didn't get a chance to test until late yesterday.I spotted another slight porblem in testing, take a look at this;
DROP TABLE #Ranking
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
);
CREATE CLUSTERED INDEX ix_NameDate ON #Ranking
(
[Name] asc,
RegDate asc
);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);
DECLARE @DaysAbsent int,
@Name varchar(50);
SET @Name = '';
UPDATE #Ranking
SET @DaysAbsent = ConsecutiveDays =
CASE WHEN r.AbsenceDate IS NULL THEN 0
ELSE CASE WHEN R.[Name] @Name THEN 1
ELSE @DaysAbsent + 1
END
END,
@Name = r.[Name]
FROM
#Ranking r WITH (INDEX(ix_NameDate),TABLOCKX)
SELECT * FROM #Ranking
If you look at row 8 John has been assigned five consecutive days absence but the fifth day is actually a different item, which should begin the count again. Apologies if I did not make this clear previously, I only spotted it in testing!
First, I have to complain. Why not give us the full requirements upfront instead of this "Oh, by the way...". You would get much better answers the first time if you did this.
With that, before I start doing anything more, are there any other requirements we should be aware of, or have we finally identified them all?
June 1, 2009 at 2:23 am
Again apologies, as I say I only realised whilst testing. It's why we test right? Anyway, no there are no other requirements!
June 1, 2009 at 7:35 pm
dt (6/1/2009)
Again apologies, as I say I only realised whilst testing. It's why we test right? Anyway, no there are no other requirements!
Actually, not in this case. You really should know what your requirements are before you start working on the code. You know the data, you know what the results should be.
Any way, here is the code I have come up with, test it to be sure it returns the result you expect. If not, hopefully you should have enough information to figure out what additional tuning is required. If not, just ask.
CREATE TABLE #Ranking
(
[Name] varchar(50),
[Item] varchar(7),
Variant varchar(3),
RegDate datetime,
AbsenceDate datetime,
ConsecutiveDays int
);
CREATE CLUSTERED INDEX ix_NameDate ON #Ranking
(
[Name] asc,
[Item] asc,
RegDate asc
);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', '2009-05-19 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', '2009-05-20 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-21 00:00:00.000', '2009-05-21 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040503', 'BZ8', '2009-05-22 00:00:00.000', '2009-05-22 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-22 00:00:00.000', NULL)
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('John Smith', '4040502', 'BI9', '2009-05-23 00:00:00.000', '2009-05-23 00:00:00.000')
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-18 00:00:00.000', '2009-05-18 00:00:00.000');
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-19 00:00:00.000', NULL);
INSERT INTO #Ranking ([Name], [Item], Variant, RegDate, AbsenceDate)
VALUES ('Jane Smith', '4040502', 'BI9', '2009-05-20 00:00:00.000', NULL);
DECLARE @DaysAbsent int,
@Name varchar(50),
@Item varchar(7);
SET @Name = '';
SET @Item = '';
UPDATE #Ranking
SET @DaysAbsent = ConsecutiveDays =
CASE WHEN r.AbsenceDate IS NULL THEN 0
ELSE CASE WHEN (r.[Name] @Name) or (r.[Item] @Item) THEN 1
ELSE @DaysAbsent + 1
END
END,
@Name = r.[Name],
@Item = r.[Item]
FROM
#Ranking r WITH (INDEX(ix_NameDate),TABLOCKX);
select * from #Ranking;
DROP TABLE #Ranking
June 2, 2009 at 2:05 am
Perfect. Thank you very much for your help and patience 🙂
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply