"SQL Spackle" is a collection of short articles written based on multiple requests for similar code. These short articles are NOT meant to be complete solutions. Rather, they are meant to "fill in the cracks."
--Phil McCracken
The calculation of elapsed times in a runner’s race can be an interesting problem, so let’s explore that today. Given a set of start and stop times for runners in a race, we can construct a table containing some sample data as follows.
CREATE TABLE #Runners ( TimeID INT IDENTITY PRIMARY KEY ,Runnerid INT ,Timemark TIME ); INSERT #Runners (Runnerid, Timemark) VALUES (1, '12:00'), (2, '12:00'), (3, '12:00'), (4, '12:00') ,(1, '12:15:01'), (2, '12:17:42'), (3, '12:16:19'), (4, '12:15:34');
We’ve included a IDENTITY column because we’ll use it in some later examples.
Since each runner has a single start and a stop time (i.e., each runner has precisely two time marks), the SQL to calculate the elapsed time for each is quite simple.
SELECT RunnerID ,ElapsedSecs=DATEDIFF(second, MIN(Timemark), MAX(TimeMark)) ,ElapsedTime=CAST(DATEADD(second, DATEDIFF(second, MIN(Timemark), MAX(TimeMark)), 0) AS TIME) FROM #Runners GROUP BY RunnerID;
By grouping on each runner’s ID, we can use the MAX and MIN values in a DATEDIFF calculation to calculate how many minutes has elapsed for each runner’s time (ElapsedSecs). The SQL TIME data type is meant to store a time of day, however if we are certain our elapsed times are less than 24 hours, it can also be used to store a time period. This is what is shown in ElapsedTime (as HH:MM:SS), which is calculated as by adding the number of elapsed seconds to a base date (0) using DATEADD. The results for this query are shown below.
RunnerID ElapsedSecs ElapsedTime 1 901 00:15:01.0000000 2 1062 00:17:42.0000000 3 979 00:16:19.0000000 4 934 00:15:34.0000000
Now we’re going to make this problem a bit more interesting by introducing some data anomalies; because let’s face it, data stored in SQL tables is often not as clean as we’d like it to be.
We’ll start by adding an open ended start time for a couple of runners, without also including an associated end time. Runner number 5 is new, but runner number 1 already has 2 entries so perhaps this time represents the time of a second heat. Now is when we’ll use the IDENTITY column to help us.
INSERT #Runners (Runnerid, Timemark) VALUES (1, '13:00'), (5, '12:00');
Let’s add a ROW_NUMBER() to our data based on the IDENTITY column within a Common Table Expression (CTE) and see if we can play around with it a bit to help us in eliminating the newly inserted values.
WITH RaceTimes AS ( SELECT RunnerID, Timemark ,rn=ROW_NUMBER() OVER (PARTITION BY Runnerid ORDER BY TimeID) FROM #Runners) SELECT RunnerID, Timemark, rn, (rn+1)/2 FROM RaceTimes;
The results are:
RunnerID Timemark rn (No column name) 1 12:00:00.0000000 1 1 1 12:15:01.0000000 2 1 1 13:00:00.0000000 3 2 2 12:00:00.0000000 1 1 2 12:17:42.0000000 2 1 3 12:00:00.0000000 1 1 3 12:16:19.0000000 2 1 4 12:00:00.0000000 1 1 4 12:15:34.0000000 2 1 5 12:00:00.0000000 1 1
By using the column (rn+1)/2 we have created a calculated grouping column that represents each heat of each race. Incorporating this into our query and only selecting completed races (HAVING COUNT(*) = 3) we have solved the problem of only showing completed races.
WITH RaceTimes AS ( SELECT RunnerID, Timemark ,rn=ROW_NUMBER() OVER (PARTITION BY Runnerid ORDER BY TimeID) FROM #Runners) SELECT RunnerID ,ElapsedSecs=DATEDIFF(second, MIN(Timemark), MAX(TimeMark)) ,ElapsedTime=CAST(DATEADD(second, DATEDIFF(second, MIN(Timemark), MAX(TimeMark)), 0) AS TIME) FROM RaceTimes GROUP BY RunnerID, (rn+1)/2 HAVING COUNT(*) = 2;
The results of this query should appear exactly the same as for the first.
Let’s look at another possible data anomaly. Suppose a race ends at a time that appears to be less than its start time, meaning that the race actually spanned the day past midnight. For example:
INSERT #Runners (Runnerid, Timemark) VALUES (1, '02:24'), (5, '00:15');
In this case, the second race for runner number 1 starts at 13:00 and ending at 02:24, while the first race for runner number 5 starts at 12:00 and ends just past midnight at 00:15. We can no longer use MIN and MAX to distinguish the start and end times. Instead we must use the record ordering to determine this. Note that it would normally be best to handle each race with a different race ID; however we can still solve this problem with our current data structure using something similar to a cross tab query.
WITH RaceTimes AS ( SELECT RunnerID, Timemark ,rn=ROW_NUMBER() OVER (PARTITION BY Runnerid ORDER BY TimeID) FROM #Runners) SELECT RunnerID, ElapsedSecs ,ElapsedTime=CAST(DATEADD(second, ElapsedSecs, 0) AS TIME) FROM ( SELECT RunnerID ,ElapsedSecs=CASE WHEN MAX(StartTime) < MAX(EndTime) THEN DATEDIFF(second, MAX(StartTime), MAX(EndTime)) ELSE 86400-DATEDIFF(second, MAX(EndTime), MAX(StartTime)) END FROM ( SELECT RunnerID, rn ,StartTime = CASE rn%2 WHEN 1 THEN Timemark END -- Odd rn = start time ,EndTime = CASE rn%2 WHEN 0 THEN TimeMark END -- Even rn = end time FROM RaceTimes ) a GROUP BY RunnerID, (rn+1)/2 HAVING COUNT(*) = 2 ) b;
The results for this query are:
RunnerID ElapsedSecs ElapsedTime 1 901 00:15:01.0000000 2 1062 00:17:42.0000000 3 979 00:16:19.0000000 4 934 00:15:34.0000000 5 44100 12:15:00.0000000 1 48240 13:24:00.0000000
To somewhat simplify the steps to produce our results, we have abstracted the 3 solution steps into separate derived tables:
- The innermost derived table (a) segregates start from end times based on our row number, where odd row numbers represent start times and even row numbers are end times.
- In the middle-most derived table (b), we have calculated the elapsed time. This part of the query, basically the same grouping query we constructed prior, also collapses start/end times using MAX to eliminate the NULL entries resulting from the innermost cross tab. Cases where the StartTime is greater than the end time are inverted to the other portion of the 24 hour day by subtracting from the number of seconds in a day (86400).
- The outermost query simply displays the calculated elapsed time and converts it to the HH:MM:DD format.
The very simple table structure proposed for this problem offers a number of challenging possibilities for anomalous data to be included, but one can hope that most of these will be eliminated by a robust front end application.
Regardless, we were able to handle a couple of simple anomalies, and calculate elapsed time between start/end time entries, within the SQL.
We hope that you found this bit of SQL Spackle to fill a gap. Thanks for listening once again, my valued readers.
Dwain Camps
SQL Enthusiast
Skype: dwaincamps