Return records at least 10 minutes apart from last valid record without CURSOR

Question

Post reply

Return records at least 10 minutes apart from last valid record without CURSOR

Rob Reid-246754

SSCarpal Tunnel

Points: 4183
More actions
February 20, 2011 at 6:48 pm

#99185

I know this is probably pretty easy but it's late and my head is frazzled and I have a desire to go to bed. So I am hoping that by the time I get into work tomorrow someone who is wide awake from the USA has read this and given me an answer 😀
I can already do this with a cursor OR while loop but I want to do it with one SELECT if possible. The database is currently SQL 2005 so I need a solution that works with that system.
The Problem
I have a table of horse races for a given day. I am trying to return the first X or last X records in the table that are at least 10 minutes apart (could be more than 10 minutes but definitely not less). There are lots of races that are only 5 mins apart from the next one which means I cannot just do a DATEDIFF between the current record and the last one as it might miss out records.
You can create some test data with this (I know its a SQL 2008 insert but the solution needs to be SQL 2005)
declare @races table(raceno int identity(1,1),racedatetime datetime,diff int)
insert into @races
(racedatetime)
values
( '2011-02-20 13:50:00'),('2011-02-20 13:55:00'),('2011-02-20 14:00:00'),('2011-02-20 14:10:00'),
('2011-02-20 14:20:00'),('2011-02-20 14:25:00')
so from that data set I need the following rows returned when doing a TOP 4
RaceNo Racedatetime
12011-02-20 13:50:00
32011-02-20 14:00:00
42011-02-20 14:10:00
52011-02-20 14:20:00
Doing something like this (which is as far as I have got at the moment)
SELECTTOP 4 *
FROM(
SELECTTOP 6
raceno,Racedatetime,
DATEDIFF(mi,
(SELECT MAX(Racedatetime)
FROM@RACES
WHERERacedatetime < R.Racedatetime), Racedatetime) as Diff
FROM@RACES as R
ORDER BY RaceDateTime
) as t
WHEREDiff IS NULL OR Diff >= 10
returns this
RaceNo Racedatetime Diff
12011-02-20 13:50:00.000NULL
42011-02-20 14:10:00.00010
52011-02-20 14:20:00.00010
Which obviously misses out the race no 3 at 14:00
I am sure I can accomplish it with some more sub selects and some ROW_NUMBER() type function but I don't want to be replacing a working cursor that just checks the DATEDIFF between the current record and the last accepted record with something convoluted so I am hoping for an elegant and simple solution if possible and I am sure there is one but I just cannot get my head in gear at the moment :w00t:
Thanks for any help received in advance.

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

bobhines Old Hand Points: 366 More actions · Answer 1

Try something like this

;With RaceRows AS

(

SELECT

ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,

racedatetime

FROM

@races

),

RaceIntervals AS

(

SELECT

rr1.row AS row1,

rr2.row AS row2,

rr1.racedatetime AS racedatetime1,

rr2.racedatetime AS racedatetime2,

DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) AS raceinterval

FROM

RaceRows rr1

INNER JOIN

RaceRows rr2

ON rr1.row = rr2.row - 1

)

SELECT

*

FROM

RaceIntervals

WHERE

raceinterval >= 600

I didn't actually run the query above so I might have a slight syntax error but you should at least get the concept of the query. The semicolon at the beginning is important and must be used if any other code precedes the "WITH" statement.

bobhines Old Hand Points: 366 More actions · Answer 2

I re-read your original question. The query I provided doesn't directly address your issue. I apologize. I should have taken more time to interpret you question. I'm experimenting and hope to get back to you soon with a real solution...

bobhines Old Hand Points: 366 More actions · Answer 3

Ok, let's give it another try...

declare @races table(raceno int identity(1,1),racedatetime datetime)

insert into @races

(racedatetime)

SELECT '2011-02-20 13:50:00'

UNION ALL

SELECT '2011-02-20 13:55:00'

UNION ALL

SELECT '2011-02-20 14:00:00'

UNION ALL

SELECT '2011-02-20 14:10:00'

UNION ALL

SELECT '2011-02-20 14:20:00'

UNION ALL

SELECT '2011-02-20 14:25:00'

SELECT

*

FROM

@races

ORDER BY

racedatetime

;WITH RaceRows AS

(

SELECT

ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,

racedatetime

FROM

@races

),

RaceIntervals AS

(

SELECT

rr1.row

FROM

RaceRows rr1

INNER JOIN

RaceRows rr2

ON rr1.row = rr2.row - 1

WHERE

DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) < 600 AND rr1.row <> 1

)

SELECT

*

FROM

RaceRows rr

WHERE

NOT EXISTS (SELECT * FROM RaceIntervals ri WHERE rr.row = ri.row)

The basic idea is to exclude all rows from @races that are not greater than 600 seconds after the preceding race. Sometimes, it just depends on how the question is asked....

If your @races table is large, it may takes some good indexing to make this solution efficient.

If you are sure that there are no gaps in raceno and that racedatetimes are inserted in chronological order, you don't have to use the ROW_NUMBER() function.

Rob Reid-246754 SSCarpal Tunnel Points: 4183 More actions · Answer 4

Thanks for replying but that second solution still doesn't work.

If you add a few more records into the race table e.g

insert into @races

(racedatetime)

SELECT '2011-02-20 13:50:00'

UNION ALL

SELECT '2011-02-20 13:55:00'

UNION ALL

SELECT '2011-02-20 14:00:00'

UNION ALL

SELECT '2011-02-20 14:10:00'

UNION ALL

SELECT '2011-02-20 14:20:00'

UNION ALL

SELECT '2011-02-20 14:25:00'

UNION ALL

SELECT '2011-02-20 14:26:00'

UNION ALL

SELECT '2011-02-20 14:27:00'

UNION ALL

SELECT '2011-02-20 14:30:00'

UNION ALL

SELECT '2011-02-20 14:33:00'

UNION ALL

SELECT '2011-02-20 14:36:00'

UNION ALL

SELECT '2011-02-20 14:45:00'

UNION ALL

SELECT '2011-02-20 14:46:00'

UNION ALL

SELECT '2011-02-20 14:56:01'

Then you get this back

rowracedatetime

12011-02-20 13:50:00.000

32011-02-20 14:00:00.000

42011-02-20 14:10:00.000

132011-02-20 14:46:00.000

142011-02-20 14:56:01.000

when you really should get this

racenoracedatetime

12011-02-20 13:50:00.000

32011-02-20 14:00:00.000

42011-02-20 14:10:00.000

52011-02-20 14:20:00.000

92011-02-20 14:30:00.000

122011-02-20 14:45:00.000

142011-02-20 14:56:01.000

So what I require is a non cursor solution that will return records that are at least 10 minutes apart from the last matching record. The gap between records may be one or it may be twenty other records but as soon as one record matches the next subsequent matching record must take its time difference from that date stamp.

Using a cursor/loop is simple as you just log the date stamp of the last matching record and use that for the comparison with the next record (starting with a null so the first record always matches). Once a record is found that is at least 10 minutes apart from the last match the date stamp is updated to that records date.

Hopefully that question is posed more to your liking...

Thanks for your help.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

Hey Rob,

You have a row's dependency based upon the success/fail of another row's dependency. You're pretty much inside that 1% of necessary RBAR.

You're looking at a manual loop, or perhaps working with the "Quirky Update" in a temp table, to find your results. You could speed up your iteration by doing an initial find of valids and then looping to find any additional valids, but depending on depth that could even be worse then a single pass method.

If you're interested in Quirky Update or Serial Updating, check out the search for articles here for "Quirky Update" and "Jeff Moden". The article is pretty comprehensive on what you'll need.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

Check out LutzM's answer in this thread: http://www.sqlservercentral.com/Forums/Topic1067681-338-1.aspx

It's practically the same problem, but with different value differences. It uses the Quirky Update method to arrive at the solution.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

LutzM SSC Guru Points: 107049 More actions · Answer 7

This thread has a similar requirement.

I posted a "quirky update" over there you might be able to modify.

Edit: Craig, you beat me to it. Again.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

bobhines Old Hand Points: 366 More actions · Answer 8

You're right of course. My solution only works for the specific case where the time difference is consistent and even multiples of 5 minutes.

Rob Reid-246754 SSCarpal Tunnel Points: 4183 More actions · Answer 9

Thanks for the replies I will try and check the article when I can but I am assuming from your definition of Quirky Update you mean something like

UPDATE @RACES

SET @x = [something, maybe a CASE or sub select looking at other rows in the table using the row id]

,@y = [something else]

,@x = [maybe do something else to this]

column = @x

WHERE blah

Where I could store the difference between the last race I want to measure from in a variable and then do checks along the way by updating various variables in the UPDATE before setting the value.

Whilst I agree that this could be a viable solution and that RBAR is evil and that SELECTS are good as long as they don't just replace the CURSOR with a pseudo cursor:

a) I already have a viable solution that uses a while loop OR cursor to achieve the desired outcome BUT

b) The primary reason that I wanted (if possible) to not use a cursor and accomplish the task with one select statement is that..

I already have a large number of stored procedures that contain many multitudes of SELECT statements from the table in question (for reporting and for analysis of bet placements) BUT they don't currently take into account the minimum time gap between races.

Therefore they are returning bet sequences (For accumulator bets) that contain races that I cannot possibly place bets on. It usually takes Betfair around 7 minutes to settle a bet) therefore for an accumulator of 4 bets (4 races where the profit from one race gets staked on the next) I am actually only able to place 2 of the 4 when the races are only 5 minutes apart.

So..

I wanted to be able to quickly update all these existing SELECT statments with a regular expression OR find/replace statement to add in any new code (add extra WHERE statements etc) so that I didn't have to manually edit each one.

I know this might sound like pure laziness on my behalf but

a) this is not a mission critical OR customer paid for system - it is something I have created for my own personal use

b) I have very little time to spend on it so what time I do have I need to spend on other parts of the system

c) how the task is acomplished and why the solution is bad, evil and all things never to teach a newbie is not my main priority as long as it works

and can be easily inserted into my existing code 😀

I know that might be asking a lot but as I have said I have already found a working RBAR solution that does use a cursor/while loop but the only way I would be able to replace all my existing reports with this solution is by going through them one by one. I have one report that contains over 200+ SELECT statements that would need replacing and I would need to update a good dozen or so stored procs.

So the primary reason for asking for a sub select solution (if possible) is so I can easily update existing code.

The table in question already has a unique RaceID integer primary key, the racedatetime cannot be guaranteed to be unique (multiple courses may have a race at the same time) and I can add new columns into the table if that would help.

Thanks for your help and advice.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

Pseudo-Cursors are great for replacing cursors. It's recursion and triangular joins you usually want to avoid as a replacement.

So far as a replacement so you don't have to redact a bunch of code, consider renaming the base table to something else, using your cursor or a Quirky Update to quickly create a new table from the renamed table, and using a synonym with the orginal name of the original table to point to the new table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 11

The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

SQLkiwi (2/24/2011)
The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.

Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

@rob-2,

Ok, considering the Quirky Update can do this on a million rows in somewhere between 2 and 7 seconds, I have to ask... are you all set or do you still need some help?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 14

Jeff Moden (2/24/2011)
Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?

Naturally, there would need to be some way to break the task in multiple pieces. Perhaps the real work applied to multiple race courses, or multiple countries, or something like that - examples are frequently simplified for the forum. I'm not saying one shouldn't use the Quirky Update here (though some would be uncomfortable with it, even if the safety mechanism were employed) so I'm just presenting an alternative 🙂

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi