February 20, 2011 at 6:48 pm
I know this is probably pretty easy but it's late and my head is frazzled and I have a desire to go to bed. So I am hoping that by the time I get into work tomorrow someone who is wide awake from the USA has read this and given me an answer 😀
I can already do this with a cursor OR while loop but I want to do it with one SELECT if possible. The database is currently SQL 2005 so I need a solution that works with that system.
The Problem
I have a table of horse races for a given day. I am trying to return the first X or last X records in the table that are at least 10 minutes apart (could be more than 10 minutes but definitely not less). There are lots of races that are only 5 mins apart from the next one which means I cannot just do a DATEDIFF between the current record and the last one as it might miss out records.
You can create some test data with this (I know its a SQL 2008 insert but the solution needs to be SQL 2005)
declare @races table(raceno int identity(1,1),racedatetime datetime,diff int)
insert into @races
(racedatetime)
values
( '2011-02-20 13:50:00'),('2011-02-20 13:55:00'),('2011-02-20 14:00:00'),('2011-02-20 14:10:00'),
('2011-02-20 14:20:00'),('2011-02-20 14:25:00')
so from that data set I need the following rows returned when doing a TOP 4
RaceNo Racedatetime
12011-02-20 13:50:00
32011-02-20 14:00:00
42011-02-20 14:10:00
52011-02-20 14:20:00
Doing something like this (which is as far as I have got at the moment)
SELECTTOP 4 *
FROM(
SELECTTOP 6
raceno,Racedatetime,
DATEDIFF(mi,
(SELECT MAX(Racedatetime)
FROM@RACES
WHERERacedatetime < R.Racedatetime), Racedatetime) as Diff
FROM@RACES as R
ORDER BY RaceDateTime
) as t
WHEREDiff IS NULL OR Diff >= 10
returns this
RaceNo Racedatetime Diff
12011-02-20 13:50:00.000NULL
42011-02-20 14:10:00.00010
52011-02-20 14:20:00.00010
Which obviously misses out the race no 3 at 14:00
I am sure I can accomplish it with some more sub selects and some ROW_NUMBER() type function but I don't want to be replacing a working cursor that just checks the DATEDIFF between the current record and the last accepted record with something convoluted so I am hoping for an elegant and simple solution if possible and I am sure there is one but I just cannot get my head in gear at the moment :w00t:
Thanks for any help received in advance.
February 21, 2011 at 7:29 am
Try something like this
;With RaceRows AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,
racedatetime
FROM
@races
),
RaceIntervals AS
(
SELECT
rr1.row AS row1,
rr2.row AS row2,
rr1.racedatetime AS racedatetime1,
rr2.racedatetime AS racedatetime2,
DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) AS raceinterval
FROM
RaceRows rr1
INNER JOIN
RaceRows rr2
ON rr1.row = rr2.row - 1
)
SELECT
*
FROM
RaceIntervals
WHERE
raceinterval >= 600
I didn't actually run the query above so I might have a slight syntax error but you should at least get the concept of the query. The semicolon at the beginning is important and must be used if any other code precedes the "WITH" statement.
February 21, 2011 at 7:58 am
I re-read your original question. The query I provided doesn't directly address your issue. I apologize. I should have taken more time to interpret you question. I'm experimenting and hope to get back to you soon with a real solution...
February 21, 2011 at 10:04 am
Ok, let's give it another try...
declare @races table(raceno int identity(1,1),racedatetime datetime)
insert into @races
(racedatetime)
SELECT '2011-02-20 13:50:00'
UNION ALL
SELECT '2011-02-20 13:55:00'
UNION ALL
SELECT '2011-02-20 14:00:00'
UNION ALL
SELECT '2011-02-20 14:10:00'
UNION ALL
SELECT '2011-02-20 14:20:00'
UNION ALL
SELECT '2011-02-20 14:25:00'
SELECT
*
FROM
@races
ORDER BY
racedatetime
;WITH RaceRows AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY racedatetime) AS row,
racedatetime
FROM
@races
),
RaceIntervals AS
(
SELECT
rr1.row
FROM
RaceRows rr1
INNER JOIN
RaceRows rr2
ON rr1.row = rr2.row - 1
WHERE
DATEDIFF(ss, rr1.racedatetime, rr2.racedatetime) < 600 AND rr1.row <> 1
)
SELECT
*
FROM
RaceRows rr
WHERE
NOT EXISTS (SELECT * FROM RaceIntervals ri WHERE rr.row = ri.row)
The basic idea is to exclude all rows from @races that are not greater than 600 seconds after the preceding race. Sometimes, it just depends on how the question is asked....
If your @races table is large, it may takes some good indexing to make this solution efficient.
If you are sure that there are no gaps in raceno and that racedatetimes are inserted in chronological order, you don't have to use the ROW_NUMBER() function.
February 22, 2011 at 3:01 pm
Thanks for replying but that second solution still doesn't work.
If you add a few more records into the race table e.g
insert into @races
(racedatetime)
SELECT '2011-02-20 13:50:00'
UNION ALL
SELECT '2011-02-20 13:55:00'
UNION ALL
SELECT '2011-02-20 14:00:00'
UNION ALL
SELECT '2011-02-20 14:10:00'
UNION ALL
SELECT '2011-02-20 14:20:00'
UNION ALL
SELECT '2011-02-20 14:25:00'
UNION ALL
SELECT '2011-02-20 14:26:00'
UNION ALL
SELECT '2011-02-20 14:27:00'
UNION ALL
SELECT '2011-02-20 14:30:00'
UNION ALL
SELECT '2011-02-20 14:33:00'
UNION ALL
SELECT '2011-02-20 14:36:00'
UNION ALL
SELECT '2011-02-20 14:45:00'
UNION ALL
SELECT '2011-02-20 14:46:00'
UNION ALL
SELECT '2011-02-20 14:56:01'
Then you get this back
rowracedatetime
12011-02-20 13:50:00.000
32011-02-20 14:00:00.000
42011-02-20 14:10:00.000
132011-02-20 14:46:00.000
142011-02-20 14:56:01.000
when you really should get this
racenoracedatetime
12011-02-20 13:50:00.000
32011-02-20 14:00:00.000
42011-02-20 14:10:00.000
52011-02-20 14:20:00.000
92011-02-20 14:30:00.000
122011-02-20 14:45:00.000
142011-02-20 14:56:01.000
So what I require is a non cursor solution that will return records that are at least 10 minutes apart from the last matching record. The gap between records may be one or it may be twenty other records but as soon as one record matches the next subsequent matching record must take its time difference from that date stamp.
Using a cursor/loop is simple as you just log the date stamp of the last matching record and use that for the comparison with the next record (starting with a null so the first record always matches). Once a record is found that is at least 10 minutes apart from the last match the date stamp is updated to that records date.
Hopefully that question is posed more to your liking...
Thanks for your help.
February 22, 2011 at 3:14 pm
Hey Rob,
You have a row's dependency based upon the success/fail of another row's dependency. You're pretty much inside that 1% of necessary RBAR.
You're looking at a manual loop, or perhaps working with the "Quirky Update" in a temp table, to find your results. You could speed up your iteration by doing an initial find of valids and then looping to find any additional valids, but depending on depth that could even be worse then a single pass method.
If you're interested in Quirky Update or Serial Updating, check out the search for articles here for "Quirky Update" and "Jeff Moden". The article is pretty comprehensive on what you'll need.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 22, 2011 at 3:31 pm
Check out LutzM's answer in this thread: http://www.sqlservercentral.com/Forums/Topic1067681-338-1.aspx
It's practically the same problem, but with different value differences. It uses the Quirky Update method to arrive at the solution.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 22, 2011 at 3:36 pm
This thread has a similar requirement.
I posted a "quirky update" over there you might be able to modify.
Edit: Craig, you beat me to it. Again.
February 22, 2011 at 7:43 pm
You're right of course. My solution only works for the specific case where the time difference is consistent and even multiples of 5 minutes.
February 24, 2011 at 4:29 am
Thanks for the replies I will try and check the article when I can but I am assuming from your definition of Quirky Update you mean something like
UPDATE @RACES
SET @x = [something, maybe a CASE or sub select looking at other rows in the table using the row id]
,@y = [something else]
,@x = [maybe do something else to this]
column = @x
WHERE blah
Where I could store the difference between the last race I want to measure from in a variable and then do checks along the way by updating various variables in the UPDATE before setting the value.
Whilst I agree that this could be a viable solution and that RBAR is evil and that SELECTS are good as long as they don't just replace the CURSOR with a pseudo cursor:
a) I already have a viable solution that uses a while loop OR cursor to achieve the desired outcome BUT
b) The primary reason that I wanted (if possible) to not use a cursor and accomplish the task with one select statement is that..
I already have a large number of stored procedures that contain many multitudes of SELECT statements from the table in question (for reporting and for analysis of bet placements) BUT they don't currently take into account the minimum time gap between races.
Therefore they are returning bet sequences (For accumulator bets) that contain races that I cannot possibly place bets on. It usually takes Betfair around 7 minutes to settle a bet) therefore for an accumulator of 4 bets (4 races where the profit from one race gets staked on the next) I am actually only able to place 2 of the 4 when the races are only 5 minutes apart.
So..
I wanted to be able to quickly update all these existing SELECT statments with a regular expression OR find/replace statement to add in any new code (add extra WHERE statements etc) so that I didn't have to manually edit each one.
I know this might sound like pure laziness on my behalf but
a) this is not a mission critical OR customer paid for system - it is something I have created for my own personal use
b) I have very little time to spend on it so what time I do have I need to spend on other parts of the system
c) how the task is acomplished and why the solution is bad, evil and all things never to teach a newbie is not my main priority as long as it works
and can be easily inserted into my existing code 😀
I know that might be asking a lot but as I have said I have already found a working RBAR solution that does use a cursor/while loop but the only way I would be able to replace all my existing reports with this solution is by going through them one by one. I have one report that contains over 200+ SELECT statements that would need replacing and I would need to update a good dozen or so stored procs.
So the primary reason for asking for a sub select solution (if possible) is so I can easily update existing code.
The table in question already has a unique RaceID integer primary key, the racedatetime cannot be guaranteed to be unique (multiple courses may have a race at the same time) and I can add new columns into the table if that would help.
Thanks for your help and advice.
February 24, 2011 at 7:11 am
Pseudo-Cursors are great for replacing cursors. It's recursion and triangular joins you usually want to avoid as a replacement.
So far as a replacement so you don't have to redact a bunch of code, consider renaming the base table to something else, using your cursor or a Quirky Update to quickly create a new table from the renamed table, and using a synonym with the orginal name of the original table to point to the new table.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2011 at 7:24 am
The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 24, 2011 at 7:15 pm
SQLkiwi (2/24/2011)
The logic is easier to express in loop form, so if you need speed and elegance, I would transfer the logic to your favourite .NET language (I would use C#) and call it via a SQLCLR TVF (streaming table-valued function). If the task is truly huge, the CLR solution can take advantage of multiple processors using parallelism - something even the Quirky Update cannot do.
Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2011 at 7:18 pm
Ok, considering the Quirky Update can do this on a million rows in somewhere between 2 and 7 seconds, I have to ask... are you all set or do you still need some help?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 24, 2011 at 9:30 pm
Jeff Moden (2/24/2011)
Hmmm... how would parallelism work correctly on this when you cannot know the answers in any manner other than a serial manner to do the calculations correctly?
Naturally, there would need to be some way to break the task in multiple pieces. Perhaps the real work applied to multiple race courses, or multiple countries, or something like that - examples are frequently simplified for the forum. I'm not saying one shouldn't use the Quirky Update here (though some would be uncomfortable with it, even if the safety mechanism were employed) so I'm just presenting an alternative 🙂
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 1 through 15 (of 28 total)
You must be logged in to reply to this topic. Login to reply