Introduction
In the past, several times I had read about generating missing dates and numbers. But I never paid serious attention to the scripts because I never came across an assignment which needed them. But recently I had to write a few stored procedures which dealt with missing numbers and missing dates.
Most of the stuff I read in the past were suggesting a temp table which holds a series of dates. So the 'Missing Dates' query can use a NOT IN clause with the pre-populated table and retrieve the missing dates.
In this article, I am presenting a different approach to generate Missing Dates and Missing Numbers without using a temp table. No, I am not against the temp table approach. But I am trying to present another way (which looks simpler to me) of doing the same thing.
Find Missing Numbers
It is obvious that, to find missing numbers, we need to have a table or resultset which contains all the numbers. Then we can apply a NOT IN clause and identify the missing numbers.
If you are wondering, why I needed missing numbers, here is the reason. In one of my applications, there is an entity that we call coordinator. Each coordinator has a 4 digit numeric code. The code starts at 1000 and ends at 9999. (There is no chance that we will have more than 9000 coordinators at any point of time, so a 4 digit code is sufficient). When the user creates a new coordinator, the system automatically generates a new coordinator number. However, the user can still edit it. The system will allow the user to edit the coordinator number (during a new entry) as long as it does not produce a duplicate code.
With the above functionality, we needed a way to reuse un-used/missing numbers. When we create a new coordinator, we need to find the lowest unused coordinator
code. For this purpose we needed a way to identify the missing numbers.
As mentioned earlier in this article, I did not want to keep a temp table. I want to generate the numbers on the fly. There are several ways to do this. I like to use a CTE to do this.
In one of my previous articles, I had presented a way to generate a sequence of numbers by using a CTE. It was fast. But a faster code fragment was posted by Kathi Kellenberger in the discussion forum. It was taken from Itzik Ben-Gan's book "Inside Microsoft SQL Server 2005 T-SQL Querying". Here is the code fragment which generates 1 million records in the fraction of a second. [code]
1 WITH
2 L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
3 L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
4 L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
5 L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
6 L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
7 L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
8 num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
9 SELECT N FROM NUM WHERE N <= 1000000;
This code is VERY VERY fast and I find it to be the best candidate for generating a sequence of numbers on the fly. I wanted to re-use this code and hence I created a function that takes a Minimum and Maximum value and returns a resultset containing sequence numbers within the given range. [code]
1 CREATE FUNCTION dbo.GetNumbers
2 (
3 @Start BIGINT,
4 @End BIGINT
5 )
6 RETURNS @ret TABLE(Number BIGINT)
7 AS
8 BEGIN
9 WITH
10 L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
11 L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
12 L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
13 L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
14 L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
15 L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
16 num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
17
18 INSERT INTO @ret(Number)
19 SELECT N FROM NUM WHERE N BETWEEN @Start AND @End
20 RETURN
21 END
Once we have the function to generate the sequence number, we can easily write the code to identify the missing numbers. [code]
1 SELECT MIN(Number)
2 FROM dbo.GetNumbers(1000, 9999)
3 WHERE Number NOT IN (SELECT CoordinatorID FROM Coordinators)
Find Missing Dates
I guess the requirement to find missing dates must be more common than the missing number requirement I mentioned above. To find the missing dates (Dates at which no sales took place, Dates at which a given employ was absent etc) we need to have a table or resultset which contains all the dates within a given range.
The following example shows how to generate a sequence of dates using the same logic we discussed earlier. [code]
1 /*
2 Generate a sequence of all the dates for the month
3 of October, 2007
4 */
5 SELECT
6 CAST('2007-10-01' AS DATETIME) + Number-1
7 FROM dbo.GetNumbers(1, 30)
8
9 /*
10 Generate a sequence of all the dates for
11 year 2007
12 */
13 SELECT
14 CAST('2007-01-01' AS DATETIME) + Number-1
15 FROM dbo.GetNumbers(1, 365)
Once we have a sequence of dates, we can easily apply a NOT IN clause and find the missing dates based on the specific requirement.
Conclusions
This article presents an approach to generate missing numbers and dates without using a temp table. I do not intend to claim that this approach is better than the others. There is no programming approach that suites all requirements. Based on the specific application scenario a certain approach may be found suitable than the others. I guess some of you may find this approach easier to integrate into your application specific requirements.