May 30, 2008 at 6:29 am
I have a table where one of the columns is a timestamp (for an event that happens every five minutes), e.g.
2008-05-30 12:00:00
2008-05-30 12:05:00
2008-05-30 12:10:00
2008-05-30 12:15:00
2008-05-30 12:20:00
2008-05-30 12:25:00
2008-05-30 12:30:00
due to the nature of the source of this data, sometimes there will be missing entries, e.g.
2008-05-30 12:00:00 --> one entry missing
2008-05-30 12:10:00
2008-05-30 12:15:00 --> two entries missing
2008-05-30 12:30:00
I would now like to have a query that gives me a list of all the missing entries, e.g. in this case
2008-05-30 12:05:00
2008-05-30 12:20:00
2008-05-30 12:25:00
(obviously there would also be infinite "missing" entries before 2008-05-30 12:00:00 and again after 2008-05-30 12:30:00, so we should limit the search between the smallest and largest timestamps in the table)
btw, the events happen exactly every full five minutes, i.e. the minutes part is never 01, 02, 03, etc. but only 00, 05, 10, etc. and the seconds part is always 00. I assume we can use that somehow...
What would be the best approach to get a list of the missing values, if possible without using cursors?
thanks in advance 🙂
May 30, 2008 at 7:06 am
You could create the date list with a tally table to improve the performance, but an efficient way of doing what you want is to compare your list of dates to a complete set and return the missing ones.
[font="Courier New"]CREATE TABLE #tmpDates (MyDate DATETIME)
INSERT #tmpDates VALUES ('2008-05-30 12:00:00')
INSERT #tmpDates VALUES ('2008-05-30 12:05:00')
INSERT #tmpDates VALUES ('2008-05-30 12:10:00')
INSERT #tmpDates VALUES ('2008-05-30 12:15:00')
INSERT #tmpDates VALUES ('2008-05-30 12:20:00')
INSERT #tmpDates VALUES ('2008-05-30 12:25:00')
INSERT #tmpDates VALUES ('2008-05-30 12:30:00')
INSERT #tmpDates VALUES ('2008-05-30 12:45:00')
INSERT #tmpDates VALUES ('2008-05-30 12:50:00')
INSERT #tmpDates VALUES ('2008-05-30 12:55:00')
INSERT #tmpDates VALUES ('2008-05-30 13:05:00')
INSERT #tmpDates VALUES ('2008-05-30 13:10:00')
INSERT #tmpDates VALUES ('2008-05-30 13:15:00')
GO
DECLARE @MaxDate DATETIME
SET @MaxDate = (SELECT MAX(MyDate) FROM #tmpDates)
; WITH FullDateList (DateVal)
AS (
SELECT MIN(MyDate) FROM #tmpDates
UNION ALL
SELECT DATEADD(MINUTE,5,DateVal) FROM FullDateList WHERE
DATEADD(MINUTE,5,DateVal) <= @MaxDate
)
SELECT
*
FROM
FullDateList F
LEFT JOIN #tmpDates D ON F.DateVal = D.MyDate
WHERE
D.MyDate IS NULL
OPTION(MAXRECURSION 10000)[/font]
May 30, 2008 at 7:31 am
wow, thanks Michael, that works perfectly! 😀
I should really learn more about those Common Table Expressions, they are quite handy at times...
May 30, 2008 at 7:37 am
stk (5/30/2008)
wow, thanks Michael, that works perfectly! 😀I should really learn more about those Common Table Expressions, they are quite handy at times...
CTE's aren't nearly as handy as a tally table like Michael alluded to. Take a minute and read this excellent article by Jeff Moden and you'll be quite surprised by their power. 😀
http://www.sqlservercentral.com/articles/TSQL/62867/
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJune 4, 2008 at 6:20 am
Heh... thanks for the plug, Jason.
Stk, if you can, you should probably avoid the many forms of recursion. Recursion is a form of hidden RBAR that's not much better than a cursor... like the other folks have suggested, a Tally table solution would be much more efficient and it can be used in many ways... here's some examples using the data setup that Michael was kind enough to provide...
--===== Create and populate a test table (Thanks to Michael)
CREATE TABLE #tmpDates (MyDate DATETIME)
INSERT #tmpDates VALUES ('2008-05-30 12:00:00')
INSERT #tmpDates VALUES ('2008-05-30 12:05:00')
INSERT #tmpDates VALUES ('2008-05-30 12:10:00')
INSERT #tmpDates VALUES ('2008-05-30 12:15:00')
INSERT #tmpDates VALUES ('2008-05-30 12:20:00')
INSERT #tmpDates VALUES ('2008-05-30 12:25:00')
INSERT #tmpDates VALUES ('2008-05-30 12:30:00')
INSERT #tmpDates VALUES ('2008-05-30 12:45:00')
INSERT #tmpDates VALUES ('2008-05-30 12:50:00')
INSERT #tmpDates VALUES ('2008-05-30 12:55:00')
INSERT #tmpDates VALUES ('2008-05-30 13:05:00')
INSERT #tmpDates VALUES ('2008-05-30 13:10:00')
INSERT #tmpDates VALUES ('2008-05-30 13:15:00')
GO
-----------------------------------------------------------------
DECLARE @DateMin DATETIME
DECLARE @TMax INT
SELECT @DateMin = MIN(DATEADD(mi,-5,MyDate)),
@TMax = DATEDIFF(mi,@DateMin,MAX(MyDate))/5
FROM #tmpDates
--===== Demo a Tally table solution that works in 2k and 2k5
-- that's limited to about 5 weeks
SELECT dg.DateGen AS DateMissing
FROM #tmpDates td
RIGHT OUTER JOIN
(--==== Find all 5 minute intervals in the date range
SELECT DATEADD(mi,t.N*5,@DateMin) AS DateGen
FROM dbo.Tally t
WHERE t.N <= @TMax) dg
ON dg.DateGen = td.MyDate
WHERE td.MyDate IS NULL
--===== Demo a Tally solution that works 2k5 for virtually unlimited
-- date ranges
SELECT dg.DateGen AS DateMissing
FROM #tmpDates td
RIGHT OUTER JOIN
(--==== Find all 5 minute intervals in the date range
SELECT TOP (@TMax)
DATEADD(mi,ROW_NUMBER() OVER (ORDER BY t1.N)*5,@DateMin) AS DateGen
FROM dbo.Tally t1
CROSS JOIN
dbo.Tally t2) dg
ON dg.DateGen = td.MyDate
WHERE td.MyDate IS NULL
--===== Or, if you'd rather see it as a cte...
;WITH
cteDateGen AS
(--==== Find all 5 minute intervals in the date range
SELECT TOP (@TMax)
DATEADD(mi,ROW_NUMBER() OVER (ORDER BY t1.N)*5,@DateMin) AS DateGen
FROM dbo.Tally t1
CROSS JOIN
dbo.Tally t2
)
SELECT dg.DateGen AS DateMissing
FROM #tmpDates td
RIGHT OUTER JOIN
cteDateGen dg
ON dg.DateGen = td.MyDate
WHERE td.MyDate IS NULL
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply