May 2, 2014 at 6:30 am
I need to calculate the running total for each individual in a dataset similar to the one below.
create table #timeSpent (PirateID varchar(6),StartEvent datetime, EndEvent datetime)
insert into #timeSpent
select1, '2014-04-25 09:00:00', '2014-04-25 09:15:00' union all
select1, '2014-04-25 09:18:00', '2014-04-25 09:30:00' union all
select1, '2014-04-26 09:17:00', '2014-04-26 09:41:00' union all
select2, '2014-04-25 09:29:00', '2014-04-25 09:45:00' union all
select2, '2014-04-27 10:59:00', '2014-04-27 11:05:00' union all
select2, '2014-04-28 13:29:00', '2014-04-28 15:01:00' union all
select2, '2014-04-25 09:59:00', '2014-04-25 10:45:00' union all
select3, '2014-04-25 10:20:00', '2014-04-25 11:45:00'
I need to work out the individual running totals for each ID. I'm using this code at the moment.
create table #runTot (PirateID varchar(6),StartEvent datetime, EndEvent datetime, duration int, running int)
insert into #runTot
select
ts.PirateID
,ts.StartEvent
,ts.EndEvent
,duration = DATEDIFF(second,ts.StartEvent,ts.EndEvent)
,null
from #timeSpent ts
update #runTot
set running = duration
declare @runTot int = 0
update #runTot
set @runTot = running = (@runTot + running)
select * from #runTot
drop table #timeSpent,#runTot
As you can see it gives a running total but it's for all ID's where I need the running total for each one, like this
create table #results
(PirateID varchar(6),StartEvent datetime, EndEvent datetime, duration int, running int)
insert into #results
select '1','2014-4-25 09:00:00','2014-4-25 09:15:00','900','900' union all
select '1','2014-4-25 09:18:00','2014-4-25 09:30:00','720','1620'union all
select '1','2014-4-26 09:17:00','2014-4-26 09:41:00','1440','3060'union all
select '2','2014-4-25 09:29:00','2014-4-25 09:45:00','960','960'union all
select '2','2014-4-27 10:59:00','2014-4-27 11:05:00','360','1320'union all
select '2','2014-4-28 13:29:00','2014-4-28 15:01:00','5520','6840'union all
select '2','2014-4-25 09:59:00','2014-4-25 10:45:00','2760','9600'union all
select '3','2014-4-25 10:20:00','2014-4-25 11:45:00','5100','5100'
select * from #results
The number of rows expected is in the hundreds and the results are being fed into an SSRS report where they're aggregated. I need the running total here to establish which event took the total duration over two hours so the end event can be displayed on the report.
Does anybody have any suggestions? I'm thinking along the lines of a window function somewhere, maybe partitioning by the id but I'm not sure where.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
May 2, 2014 at 6:48 am
Jeff Moden's paper here [/url]will show you how to correctly set up a Quirky Update, included iirc a partitioned update, which requires an extra variable. There are a few rules which you must follow. You can test the results using this simple query
;With Precalc AS(
SELECT
PirateID, StartEvent, EndEvent,
duration = DATEDIFF(second, StartEvent, EndEvent),
rn = ROW_NUMBER() OVER(PARTITION BY PirateID ORDER BY StartEvent) FROM #timeSpent
)
SELECT PirateID, StartEvent, EndEvent, x.running
FROM Precalc p
CROSS APPLY (
SELECT running = SUM(duration)
FROM Precalc i
WHERE i.PirateID = p.PirateID AND i.rn <= p.rn
) x
- which employs a triangular join. Read the article to understand what the performance implications are of this, and judge for yourself if your partition size relative to the rowcount of the table, makes it worthwhile to pursue a TJ over a QU.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 2, 2014 at 8:09 am
Non Quirky Update solution (uses Tally table)
;WITH d (RowNo,PirateID,StartEvent,EndEvent,Duration) AS (
SELECT ROW_NUMBER() OVER (PARTITION BY PirateID ORDER BY StartEvent),
PirateID,StartEvent,EndEvent,DATEDIFF(second,StartEvent,EndEvent)
FROM timeSpent
),
r (RowNo,PirateID,N) AS (
SELECT d.RowNo,d.PirateID,t.N
FROM d
JOIN dbo.Tally t ON t.N BETWEEN 1 AND d.RowNo
),
s (RowNo,PirateID,DurationTotal) AS (
SELECT r.RowNo,r.PirateID,SUM(d.Duration)
FROM r
JOIN d ON d.PirateID = r.PirateID AND d.RowNo = r.N
GROUP BY r.PirateID,r.RowNo
)
SELECT d.PirateID,d.StartEvent,d.EndEvent,d.Duration,s.DurationTotal
FROM d
JOIN s ON s.PirateID = d.PirateID AND s.RowNo = d.RowNo
ORDER BY d.PirateID,d.StartEvent
Far away is close at hand in the images of elsewhere.
Anon.
May 2, 2014 at 8:29 am
Thanks Chris, that worked a treat. I'd seen the Quirky Update article yesterday but got scared off by the length of the article and the proximity to four o'clock. It's done exactly what I needed it to do and was much more concise codewise than I originally thought.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
May 2, 2014 at 8:43 am
You're welcome. Bear in mind that the triangular join method should only be applied when the partitions are many and small relative to the whole affected set, and you should make a note (better still a warning) of that in the code.
Also, try David's method too. It's not quite as compact and easy to digest as the simple TJ solution but might perform better against your data.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 2, 2014 at 10:09 am
ChrisM@Work (5/2/2014)
... might perform better against your data.
Not sure about that, my query took 3 secs with 80K rows on my test server.
Far away is close at hand in the images of elsewhere.
Anon.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply