Return records at least 10 minutes apart from last valid record without CURSOR

  • I know this is probably pretty easy but it's late and my head is frazzled and I have a desire to go to bed. So I am hoping that by the time I get into work tomorrow someone who is wide awake from the USA has read this and given me an answer 😀

    I can already do this with a cursor OR while loop but I want to do it with one SELECT if possible. The database is currently SQL 2005 so I need a solution that works with that system.

    The Problem

    I have a table of horse races for a given day. I am trying to return the first X or last X records in the table that are at least 10 minutes apart (could be more than 10 minutes but definitely not less). There are lots of races that are only 5 mins apart from the next one which means I cannot just do a DATEDIFF between the current record and the last one as it might miss out records.

    You can create some test data with this (I know its a SQL 2008 insert but the solution needs to be SQL 2005)

    declare @races table(raceno int identity(1,1),racedatetime datetime,diff int)

    insert into @races

    (racedatetime)

    values

    ( '2011-02-20 13:50:00'),('2011-02-20 13:55:00'),('2011-02-20 14:00:00'),('2011-02-20 14:10:00'),

    ('2011-02-20 14:20:00'),('2011-02-20 14:25:00')

    so from that data set I need the following rows returned when doing a TOP 4

    RaceNo Racedatetime

    12011-02-20 13:50:00

    32011-02-20 14:00:00

    42011-02-20 14:10:00

    52011-02-20 14:20:00

    Doing something like this (which is as far as I have got at the moment)

    SELECTTOP 4 *

    FROM(

    SELECTTOP 6

    raceno,Racedatetime,

    DATEDIFF(mi,

    (SELECT MAX(Racedatetime)

    FROM@RACES

    WHERERacedatetime < R.Racedatetime), Racedatetime) as Diff

    FROM@RACES as R

    ORDER BY RaceDateTime

    ) as t

    WHEREDiff IS NULL OR Diff >= 10

    returns this

    RaceNo Racedatetime Diff

    12011-02-20 13:50:00.000NULL

    42011-02-20 14:10:00.00010

    52011-02-20 14:20:00.00010

    Which obviously misses out the race no 3 at 14:00

    I am sure I can accomplish it with some more sub selects and some ROW_NUMBER() type function but I don't want to be replacing a working cursor that just checks the DATEDIFF between the current record and the last accepted record with something convoluted so I am hoping for an elegant and simple solution if possible and I am sure there is one but I just cannot get my head in gear at the moment :w00t:

    Thanks for any help received in advance.

  • Try something like this

    ;With RaceRows AS

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,

    racedatetime

    FROM

    @races

    ),

    RaceIntervals AS

    (

    SELECT

    rr1.row AS row1,

    rr2.row AS row2,

    rr1.racedatetime AS racedatetime1,

    rr2.racedatetime AS racedatetime2,

    DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) AS raceinterval

    FROM

    RaceRows rr1

    INNER JOIN

    RaceRows rr2

    ON rr1.row = rr2.row - 1

    )

    SELECT

    *

    FROM

    RaceIntervals

    WHERE

    raceinterval >= 600

    I didn't actually run the query above so I might have a slight syntax error but you should at least get the concept of the query. The semicolon at the beginning is important and must be used if any other code precedes the "WITH" statement.

  • I re-read your original question. The query I provided doesn't directly address your issue. I apologize. I should have taken more time to interpret you question. I'm experimenting and hope to get back to you soon with a real solution...

  • Ok, let's give it another try...

    declare @races table(raceno int identity(1,1),racedatetime datetime)

    insert into @races

    (racedatetime)

    SELECT '2011-02-20 13:50:00'

    UNION ALL

    SELECT '2011-02-20 13:55:00'

    UNION ALL

    SELECT '2011-02-20 14:00:00'

    UNION ALL

    SELECT '2011-02-20 14:10:00'

    UNION ALL

    SELECT '2011-02-20 14:20:00'

    UNION ALL

    SELECT '2011-02-20 14:25:00'

    SELECT

    *

    FROM

    @races

    ORDER BY

    racedatetime

    ;WITH RaceRows AS

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,

    racedatetime

    FROM

    @races

    ),

    RaceIntervals AS

    (

    SELECT

    rr1.row

    FROM

    RaceRows rr1

    INNER JOIN

    RaceRows rr2

    ON rr1.row = rr2.row - 1

    WHERE

    DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) < 600 AND rr1.row <> 1

    )

    SELECT

    *

    FROM

    RaceRows rr

    WHERE

    NOT EXISTS (SELECT * FROM RaceIntervals ri WHERE rr.row = ri.row)

    The basic idea is to exclude all rows from @races that are not greater than 600 seconds after the preceding race. Sometimes, it just depends on how the question is asked....

    If your @races table is large, it may takes some good indexing to make this solution efficient.

    If you are sure that there are no gaps in raceno and that racedatetimes are inserted in chronological order, you don't have to use the ROW_NUMBER() function.

  • Thanks for replying but that second solution still doesn't work.

    If you add a few more records into the race table e.g

    insert into @races

    (racedatetime)

    SELECT '2011-02-20 13:50:00'

    UNION ALL

    SELECT '2011-02-20 13:55:00'

    UNION ALL

    SELECT '2011-02-20 14:00:00'

    UNION ALL

    SELECT '2011-02-20 14:10:00'

    UNION ALL

    SELECT '2011-02-20 14:20:00'

    UNION ALL

    SELECT '2011-02-20 14:25:00'

    UNION ALL

    SELECT '2011-02-20 14:26:00'

    UNION ALL

    SELECT '2011-02-20 14:27:00'

    UNION ALL

    SELECT '2011-02-20 14:30:00'

    UNION ALL

    SELECT '2011-02-20 14:33:00'

    UNION ALL

    SELECT '2011-02-20 14:36:00'

    UNION ALL

    SELECT '2011-02-20 14:45:00'

    UNION ALL

    SELECT '2011-02-20 14:46:00'

    UNION ALL

    SELECT '2011-02-20 14:56:01'

    Then you get this back

    rowracedatetime

    12011-02-20 13:50:00.000

    32011-02-20 14:00:00.000

    42011-02-20 14:10:00.000

    132011-02-20 14:46:00.000

    142011-02-20 14:56:01.000

    when you really should get this

    racenoracedatetime

    12011-02-20 13:50:00.000

    32011-02-20 14:00:00.000

    42011-02-20 14:10:00.000

    52011-02-20 14:20:00.000

    92011-02-20 14:30:00.000

    122011-02-20 14:45:00.000

    142011-02-20 14:56:01.000

    So what I require is a non cursor solution that will return records that are at least 10 minutes apart from the last matching record. The gap between records may be one or it may be twenty other records but as soon as one record matches the next subsequent matching record must take its time difference from that date stamp.

    Using a cursor/loop is simple as you just log the date stamp of the last matching record and use that for the comparison with the next record (starting with a null so the first record always matches). Once a record is found that is at least 10 minutes apart from the last match the date stamp is updated to that records date.

    Hopefully that question is posed more to your liking...

    Thanks for your help.

  • Hey Rob,

    You have a row's dependency based upon the success/fail of another row's dependency. You're pretty much inside that 1% of necessary RBAR.

    You're looking at a manual loop, or perhaps working with the "Quirky Update" in a temp table, to find your results. You could speed up your iteration by doing an initial find of valids and then looping to find any additional valids, but depending on depth that could even be worse then a single pass method.

    If you're interested in Quirky Update or Serial Updating, check out the search for articles here for "Quirky Update" and "Jeff Moden". The article is pretty comprehensive on what you'll need.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Check out LutzM's answer in this thread: http://www.sqlservercentral.com/Forums/Topic1067681-338-1.aspx

    It's practically the same problem, but with different value differences. It uses the Quirky Update method to arrive at the solution.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This thread has a similar requirement.

    I posted a "quirky update" over there you might be able to modify.

    Edit: Craig, you beat me to it. Again.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • You're right of course. My solution only works for the specific case where the time difference is consistent and even multiples of 5 minutes.

  • Thanks for the replies I will try and check the article when I can but I am assuming from your definition of Quirky Update you mean something like

    UPDATE @RACES

    SET @x = [something, maybe a CASE or sub select looking at other rows in the table using the row id]

    ,@y = [something else]

    ,@x = [maybe do something else to this]

    column = @x

    WHERE blah

    Where I could store the difference between the last race I want to measure from in a variable and then do checks along the way by updating various variables in the UPDATE before setting the value.

    Whilst I agree that this could be a viable solution and that RBAR is evil and that SELECTS are good as long as they don't just replace the CURSOR with a pseudo cursor:

    a) I already have a viable solution that uses a while loop OR cursor to achieve the desired outcome BUT

    b) The primary reason that I wanted (if possible) to not use a cursor and accomplish the task with one select statement is that..

    I already have a large number of stored procedures that contain many multitudes of SELECT statements from the table in question (for reporting and for analysis of bet placements) BUT they don't currently take into account the minimum time gap between races.

    Therefore they are returning bet sequences (For accumulator bets) that contain races that I cannot possibly place bets on. It usually takes Betfair around 7 minutes to settle a bet) therefore for an accumulator of 4 bets (4 races where the profit from one race gets staked on the next) I am actually only able to place 2 of the 4 when the races are only 5 minutes apart.

    So..

    I wanted to be able to quickly update all these existing SELECT statments with a regular expression OR find/replace statement to add in any new code (add extra WHERE statements etc) so that I didn't have to manually edit each one.

    I know this might sound like pure laziness on my behalf but

    a) this is not a mission critical OR customer paid for system - it is something I have created for my own personal use

    b) I have very little time to spend on it so what time I do have I need to spend on other parts of the system

    c) how the task is acomplished and why the solution is bad, evil and all things never to teach a newbie is not my main priority as long as it works

    and can be easily inserted into my existing code 😀

    I know that might be asking a lot but as I have said I have already found a working RBAR solution that does use a cursor/while loop but the only way I would be able to replace all my existing reports with this solution is by going through them one by one. I have one report that contains over 200+ SELECT statements that would need replacing and I would need to update a good dozen or so stored procs.

    So the primary reason for asking for a sub select solution (if possible) is so I can easily update existing code.

    The table in question already has a unique RaceID integer primary key, the racedatetime cannot be guaranteed to be unique (multiple courses may have a race at the same time) and I can add new columns into the table if that would help.

    Thanks for your help and advice.

  • Pseudo-Cursors are great for replacing cursors. It's recursion and triangular joins you usually want to avoid as a replacement.

    So far as a replacement so you don't have to redact a bunch of code, consider renaming the base table to something else, using your cursor or a Quirky Update to quickly create a new table from the renamed table, and using a synonym with the orginal name of the original table to point to the new table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.

  • SQLkiwi (2/24/2011)


    The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.

    Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @rob-2,

    Ok, considering the Quirky Update can do this on a million rows in somewhere between 2 and 7 seconds, I have to ask... are you all set or do you still need some help?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/24/2011)


    Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?

    Naturally, there would need to be some way to break the task in multiple pieces. Perhaps the real work applied to multiple race courses, or multiple countries, or something like that - examples are frequently simplified for the forum. I'm not saying one shouldn't use the Quirky Update here (though some would be uncomfortable with it, even if the safety mechanism were employed) so I'm just presenting an alternative 🙂

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply