find missing entries in series of datetimes

  • I have a table where one of the columns is a timestamp (for an event that happens every five minutes), e.g.

    2008-05-30 12:00:00

    2008-05-30 12:05:00

    2008-05-30 12:10:00

    2008-05-30 12:15:00

    2008-05-30 12:20:00

    2008-05-30 12:25:00

    2008-05-30 12:30:00

    due to the nature of the source of this data, sometimes there will be missing entries, e.g.

    2008-05-30 12:00:00 --> one entry missing

    2008-05-30 12:10:00

    2008-05-30 12:15:00 --> two entries missing

    2008-05-30 12:30:00

    I would now like to have a query that gives me a list of all the missing entries, e.g. in this case

    2008-05-30 12:05:00

    2008-05-30 12:20:00

    2008-05-30 12:25:00

    (obviously there would also be infinite "missing" entries before 2008-05-30 12:00:00 and again after 2008-05-30 12:30:00, so we should limit the search between the smallest and largest timestamps in the table)

    btw, the events happen exactly every full five minutes, i.e. the minutes part is never 01, 02, 03, etc. but only 00, 05, 10, etc. and the seconds part is always 00. I assume we can use that somehow...

    What would be the best approach to get a list of the missing values, if possible without using cursors?

    thanks in advance 🙂

  • You could create the date list with a tally table to improve the performance, but an efficient way of doing what you want is to compare your list of dates to a complete set and return the missing ones.

    [font="Courier New"]CREATE TABLE #tmpDates (MyDate DATETIME)

    INSERT #tmpDates VALUES ('2008-05-30 12:00:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:05:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:10:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:15:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:20:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:25:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:30:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:45:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:50:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:55:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:05:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:10:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:15:00')

    GO

    DECLARE @MaxDate DATETIME

    SET @MaxDate = (SELECT MAX(MyDate) FROM #tmpDates)

    ; WITH FullDateList (DateVal)

    AS (

    SELECT MIN(MyDate) FROM #tmpDates

    UNION ALL

    SELECT DATEADD(MINUTE,5,DateVal) FROM FullDateList WHERE

    DATEADD(MINUTE,5,DateVal) <= @MaxDate

    )

    SELECT

    *

    FROM

    FullDateList F

    LEFT JOIN #tmpDates D ON F.DateVal = D.MyDate

    WHERE

    D.MyDate IS NULL

    OPTION(MAXRECURSION 10000)[/font]

  • wow, thanks Michael, that works perfectly! 😀

    I should really learn more about those Common Table Expressions, they are quite handy at times...

  • stk (5/30/2008)


    wow, thanks Michael, that works perfectly! 😀

    I should really learn more about those Common Table Expressions, they are quite handy at times...

    CTE's aren't nearly as handy as a tally table like Michael alluded to. Take a minute and read this excellent article by Jeff Moden and you'll be quite surprised by their power. 😀

    http://www.sqlservercentral.com/articles/TSQL/62867/

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Heh... thanks for the plug, Jason.

    Stk, if you can, you should probably avoid the many forms of recursion. Recursion is a form of hidden RBAR that's not much better than a cursor... like the other folks have suggested, a Tally table solution would be much more efficient and it can be used in many ways... here's some examples using the data setup that Michael was kind enough to provide...

    --===== Create and populate a test table (Thanks to Michael)

    CREATE TABLE #tmpDates (MyDate DATETIME)

    INSERT #tmpDates VALUES ('2008-05-30 12:00:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:05:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:10:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:15:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:20:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:25:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:30:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:45:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:50:00')

    INSERT #tmpDates VALUES ('2008-05-30 12:55:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:05:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:10:00')

    INSERT #tmpDates VALUES ('2008-05-30 13:15:00')

    GO

    -----------------------------------------------------------------

    DECLARE @DateMin DATETIME

    DECLARE @TMax INT

    SELECT @DateMin = MIN(DATEADD(mi,-5,MyDate)),

    @TMax = DATEDIFF(mi,@DateMin,MAX(MyDate))/5

    FROM #tmpDates

    --===== Demo a Tally table solution that works in 2k and 2k5

    -- that's limited to about 5 weeks

    SELECT dg.DateGen AS DateMissing

    FROM #tmpDates td

    RIGHT OUTER JOIN

    (--==== Find all 5 minute intervals in the date range

    SELECT DATEADD(mi,t.N*5,@DateMin) AS DateGen

    FROM dbo.Tally t

    WHERE t.N <= @TMax) dg

    ON dg.DateGen = td.MyDate

    WHERE td.MyDate IS NULL

    --===== Demo a Tally solution that works 2k5 for virtually unlimited

    -- date ranges

    SELECT dg.DateGen AS DateMissing

    FROM #tmpDates td

    RIGHT OUTER JOIN

    (--==== Find all 5 minute intervals in the date range

    SELECT TOP (@TMax)

    DATEADD(mi,ROW_NUMBER() OVER (ORDER BY t1.N)*5,@DateMin) AS DateGen

    FROM dbo.Tally t1

    CROSS JOIN

    dbo.Tally t2) dg

    ON dg.DateGen = td.MyDate

    WHERE td.MyDate IS NULL

    --===== Or, if you'd rather see it as a cte...

    ;WITH

    cteDateGen AS

    (--==== Find all 5 minute intervals in the date range

    SELECT TOP (@TMax)

    DATEADD(mi,ROW_NUMBER() OVER (ORDER BY t1.N)*5,@DateMin) AS DateGen

    FROM dbo.Tally t1

    CROSS JOIN

    dbo.Tally t2

    )

    SELECT dg.DateGen AS DateMissing

    FROM #tmpDates td

    RIGHT OUTER JOIN

    cteDateGen dg

    ON dg.DateGen = td.MyDate

    WHERE td.MyDate IS NULL

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply