When we need to find the nearest business day, or to calculate number of business days between two dates, and when we need a simple solution, then UDFs using calendar tables fit the bill perfectly. In this article we shall discuss how to:
- Build a calendar table
- Find the nearest business day
- Calculate number of business days between two dates
To ensure good performance and code reuse, our solutions will be wrapped as inline UDFs. Also we shall see a little bit of test-driven development in action, as we shall be developing these solutions in little increments.
Building a calendar table
In this section we shall create and populate a calendar table that stores all days, both business and non-business ones (the reasons why we need to store all days rather than only non-business ones will be discussed later). To populate such a table in a set-based way, in the way we typically do it in real life, we need a numbers table - a table with consecutive numbers. In case you don't have it yet, let us create a numbers table from scratch:
CREATE TABLE dbo.Numbers
(
n INT NOT NULL
CONSTRAINT PK_Numbers PRIMARY KEY
)
GO
INSERT INTO dbo.Numbers
( n )
VALUES ( 1 ) ;
DECLARE @i INT ;
SET @i = 1;
WHILE @i < 1024
BEGIN
INSERT INTO dbo.Numbers
( n )
SELECT n + @i
FROM dbo.Numbers ;
SET @i = @i * 2 ;
END ;
Once we have a numbers table, we can go ahead and build a calendar one, as follows:
CREATE TABLE dbo.Calendar
(
[Date] DATE NOT NULL
CONSTRAINT PK_Calendar PRIMARY KEY ,
IsBusinessDate CHAR(1)
NOT NULL
CONSTRAINT CHK_Calendar_IsBusinessDate
CHECK ( IsBusinessDate IN ( 'Y', 'N' ) )
) ;
GO INSERT INTO dbo.Calendar
( Date ,
IsBusinessDate
)
SELECT DATEADD(DAY, n, '20091231') AS [Date] ,
CASE WHEN DATEPART(weekday, DATEADD(DAY, n, '20091231')) IN (
1, 7 ) THEN 'N'
ELSE 'Y'
END AS IsBusinessDate
FROM dbo.Numbers
Also we need to mark holidays such as Easter and Independence Day in our calendar, as follows:
UPDATE dbo.Calendar
SET IsBusinessDate = 'N'
WHERE [Date] IN ( '20100404', -- Easter
'20100705' ) -- day after Independence
Now that the calendar table is built, let us utilize it.
Finding the business day on or after the given one
To make sure our code is easy to reuse, we shall create a user defined function. To make sure the function performs well, we shall create an inline one. For a detailed explanation of performace differences between scalar and inline UDFs, refer to Chapter 5 of my book.
Before writing any code, let us come up with a few tests:
SELECT '20100827' AS Expected ,
[Date]
FROM dbo.NearestBusinessDayOnOrBefore('20100827') SELECT '20100830' AS Expected ,
[Date]
FROM dbo.NearestBusinessDayOnOrBefore('20100828') SELECT '20100830' AS Expected ,
[Date]
FROM dbo.NearestBusinessDayOnOrBefore('20100829') SELECT '20100830' AS Expected ,
[Date]
FROM dbo.NearestBusinessDayOnOrBefore('20100830')
Of course, these tests are manual, not automated. We shall not discuss automatd and unit tests here. Although some of us already use automated tests, it is not a commonly accepted practice yet. Also at the time of this writing different teams use very different approaches for database testing, and I don't know of any widely accepted and convenient one. Discussing which approach to use for automated tests, and which tests to incorporate in the test harness, may be very interesting, but it is beyond the scope of this article.
Once we have the tests which describe the interface and expected results, let us develop just enough code to pass those tests:
CREATE FUNCTION dbo.NearestBusinessDayOnOrBefore ( @Date DATE )
RETURNS TABLE
AS RETURN
( SELECT TOP 1
[Date]
FROM dbo.Calendar
WHERE [Date] <= @Date
AND IsBusinessDate = 'Y'
ORDER BY [Date] DESC
)
As soon as we have the function, we can run the tests against it. With all the tests passing, we could just move on to another task, but, as good defensive programmers, we need to think of cases of unintended use. We don't need to think too hard to come up with such a case. Note that the earliest date in our calendar table is Jan 1st, 2010, and the last one is Oct 20th, 2012. This means that we don't really know what are the non-business days after that date. If we call our function for any date later than Oct 20th, 2012, what result should it return?
Suppose that we have clarified the requirements, and the expected result is NULL. The following test documents this requirement; and it fails.
DECLARE @d DATE;
SELECT @d = DATEADD(DAY,1,MAX([Date])) FROM dbo.Calendar ;
SELECT NULL AS Expected ,
[Date]
FROM dbo.NearestBusinessDayOnOrBefore(@d)
Let us add some more code, so that this new test passes as well as the old ones:
ALTER FUNCTION dbo.NearestBusinessDayOnOrBefore ( @Date DATE )
RETURNS TABLE
AS RETURN
( SELECT TOP 1
CASE WHEN EXISTS ( SELECT *
FROM dbo.Calendar
WHERE [Date] = @Date ) THEN [Date] END AS [Date]
FROM dbo.Calendar
WHERE [Date] <= @Date
AND IsBusinessDate = 'Y'
ORDER BY [Date] DESC
)
The example that we have just considered demonstrates why we have chosen to store all days in our table, rather than store only holidays, as is frequnelty suggested. The reason is simple: if store only holidays, we cannot distinguish between the situations when we know that there are no holidays and the situations when we don't know what the holidays are.
Note: in real life we should consider incorporating at least some of the tests used during this session of test-driven development in our automated test harness.
Also in real life we may decide that we shall build our calendar table for a really wide range of dates, and we don't care what is returned if the argument it outside that wide range. Under that assumption we may choose to use the previous, the simpler and more performant, version of our function, the one without EXISTS clause. This is what I actually use in my system.
Finding number of business days between two days.
As in the previous section, we shall use test-driven development to come up with a solution. This means that some tests must be developed first, as follows:
-- 0 business days between end of Friday and end of next Sunday
SELECT 0 AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays ('20100827', '20100829') -- 5 business days between two consecutive Wednesdays, without holidays between them
SELECT 5 AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays ('20100818', '20100825') -- 0 business days if both parameters are the same
SELECT 0 AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays ('20100827', '20100827')
Note: the "between ends of days" requirement needs clarification. How many business days are between Friday and the next Saturday? It depends - there is one business day between early Friday morning and early Saturday morning, but there are zero business days between late Friday night and late Saturday night. The "between ends of days" requirement means that we are implementing the second approach, the number of business days between the times late at night on given days.
Let us implement just enough code to meet these requirements, to pass these tests:
CREATE FUNCTION dbo.BusinessDaysBetweenEndsOfDays
(
@DateFrom DATE ,
@DateTo DATE
)
RETURNS TABLE
AS RETURN
( SELECT COUNT(*) AS NumDays
FROM dbo.Calendar
WHERE [Date] > @DateFrom
AND [Date] <= @DateTo
AND IsBusinessDate = 'Y'
)
This code passes our tests, but we have more requirements. As in the previous section, we may need to address the case when the calendar table does not store all the days between DateFrom and DateTo. Of course, we might choose to not handle this case, just like we did in the very end of the previous section. However, let us suppose that we have decided to address this case.
Before adding more tests, and before altering the function to pass them, let me explain why we don't lay out all the tests at once, and why we don't develop all the code at once. In my experience, we are more efficient when we develop code in reasonalbly small increments. If we are trying to implement all the requirements at once, we usually spend more time on development. This is especially true when we have a lot of distractions and need to multitask - in such cases we want to complete a step between distractions and context switching.
Of course, for developers with different skill levels and different experience with the subject area the optimal step size may be very different. So, if these steps are too small for you, just take bigger ones.
Getting back to development, the following tests all expect NULL, because one or both ends of the range are beyond the range of known dates, and these tests do not pass yet:
-- NULL if at least one parameter does not have a row in Calendar table
SELECT NULL AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays('20010827', '20100817') SELECT NULL AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays('20100827', '20990817') SELECT NULL AS Expected ,
NumDays
FROM dbo.BusinessDaysBetweenEndsOfDays('20010827', '20990817')
Modifying the function to pass these tests is easy:
ALTER FUNCTION dbo.BusinessDaysBetweenEndsOfDays
(
@DateFrom DATE ,
@DateTo DATE
)
RETURNS TABLE
AS RETURN
( SELECT CASE WHEN EXISTS ( SELECT *
FROM dbo.Calendar
WHERE [Date] = @DateFrom )
AND EXISTS ( SELECT *
FROM dbo.Calendar
WHERE [Date] = @DateTo )
THEN COUNT(*)
END AS NumDays
FROM dbo.Calendar
WHERE [Date] > @DateFrom
AND [Date] <= @DateTo
AND IsBusinessDate = 'Y'
)
So far so good, but what shall we do if DateTo is earlier than DateFrom? The current version of the function returns either null or a positive number. If the requirements are to return negative number in such cases, we would need one more iteration of adding one more test and altering the function to pass it.
What can go wrong with our solution?
As careful and experienced developers, we must not rush to deploy as soon as all our tests have passed. We need to think of cases of unintended use; we need to spend some time trying to break our own solutions. In this case, breaking our solution is very easy. All our code assumes that our calendar table stores dates without gaps. If someone occasionally deletes rows from the middle of the range of the calendar table, we shall start getting incorrect results.
Can we make sure that our calendar table has no gaps? Definitely: a couple of computed columns and constraint should take care of that. This is doable, not too complex, and we shall discuss it in some later article.
Do we really have to do that? It depends. If we are developing a solution that is going to last, then in the long run it is usually easier to provide a foolproof solution from the very beginning than to go through an expensive cycle of troubleshooting, fixing, testing, and redeploying.