Running Totals Problem

  • I need to calculate the running total for each individual in a dataset similar to the one below.

    create table #timeSpent (PirateID varchar(6),StartEvent datetime, EndEvent datetime)

    insert into #timeSpent

    select1, '2014-04-25 09:00:00', '2014-04-25 09:15:00' union all

    select1, '2014-04-25 09:18:00', '2014-04-25 09:30:00' union all

    select1, '2014-04-26 09:17:00', '2014-04-26 09:41:00' union all

    select2, '2014-04-25 09:29:00', '2014-04-25 09:45:00' union all

    select2, '2014-04-27 10:59:00', '2014-04-27 11:05:00' union all

    select2, '2014-04-28 13:29:00', '2014-04-28 15:01:00' union all

    select2, '2014-04-25 09:59:00', '2014-04-25 10:45:00' union all

    select3, '2014-04-25 10:20:00', '2014-04-25 11:45:00'

    I need to work out the individual running totals for each ID. I'm using this code at the moment.

    create table #runTot (PirateID varchar(6),StartEvent datetime, EndEvent datetime, duration int, running int)

    insert into #runTot

    select

    ts.PirateID

    ,ts.StartEvent

    ,ts.EndEvent

    ,duration = DATEDIFF(second,ts.StartEvent,ts.EndEvent)

    ,null

    from #timeSpent ts

    update #runTot

    set running = duration

    declare @runTot int = 0

    update #runTot

    set @runTot = running = (@runTot + running)

    select * from #runTot

    drop table #timeSpent,#runTot

    As you can see it gives a running total but it's for all ID's where I need the running total for each one, like this

    create table #results

    (PirateID varchar(6),StartEvent datetime, EndEvent datetime, duration int, running int)

    insert into #results

    select '1','2014-4-25 09:00:00','2014-4-25 09:15:00','900','900' union all

    select '1','2014-4-25 09:18:00','2014-4-25 09:30:00','720','1620'union all

    select '1','2014-4-26 09:17:00','2014-4-26 09:41:00','1440','3060'union all

    select '2','2014-4-25 09:29:00','2014-4-25 09:45:00','960','960'union all

    select '2','2014-4-27 10:59:00','2014-4-27 11:05:00','360','1320'union all

    select '2','2014-4-28 13:29:00','2014-4-28 15:01:00','5520','6840'union all

    select '2','2014-4-25 09:59:00','2014-4-25 10:45:00','2760','9600'union all

    select '3','2014-4-25 10:20:00','2014-4-25 11:45:00','5100','5100'

    select * from #results

    The number of rows expected is in the hundreds and the results are being fed into an SSRS report where they're aggregated. I need the running total here to establish which event took the total duration over two hours so the end event can be displayed on the report.

    Does anybody have any suggestions? I'm thinking along the lines of a window function somewhere, maybe partitioning by the id but I'm not sure where.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Jeff Moden's paper here [/url]will show you how to correctly set up a Quirky Update, included iirc a partitioned update, which requires an extra variable. There are a few rules which you must follow. You can test the results using this simple query

    ;With Precalc AS(

    SELECT

    PirateID, StartEvent, EndEvent,

    duration = DATEDIFF(second, StartEvent, EndEvent),

    rn = ROW_NUMBER() OVER(PARTITION BY PirateID ORDER BY StartEvent) FROM #timeSpent

    )

    SELECT PirateID, StartEvent, EndEvent, x.running

    FROM Precalc p

    CROSS APPLY (

    SELECT running = SUM(duration)

    FROM Precalc i

    WHERE i.PirateID = p.PirateID AND i.rn <= p.rn

    ) x

    - which employs a triangular join. Read the article to understand what the performance implications are of this, and judge for yourself if your partition size relative to the rowcount of the table, makes it worthwhile to pursue a TJ over a QU.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Non Quirky Update solution (uses Tally table)

    ;WITH d (RowNo,PirateID,StartEvent,EndEvent,Duration) AS (

    SELECT ROW_NUMBER() OVER (PARTITION BY PirateID ORDER BY StartEvent),

    PirateID,StartEvent,EndEvent,DATEDIFF(second,StartEvent,EndEvent)

    FROM timeSpent

    ),

    r (RowNo,PirateID,N) AS (

    SELECT d.RowNo,d.PirateID,t.N

    FROM d

    JOIN dbo.Tally t ON t.N BETWEEN 1 AND d.RowNo

    ),

    s (RowNo,PirateID,DurationTotal) AS (

    SELECT r.RowNo,r.PirateID,SUM(d.Duration)

    FROM r

    JOIN d ON d.PirateID = r.PirateID AND d.RowNo = r.N

    GROUP BY r.PirateID,r.RowNo

    )

    SELECT d.PirateID,d.StartEvent,d.EndEvent,d.Duration,s.DurationTotal

    FROM d

    JOIN s ON s.PirateID = d.PirateID AND s.RowNo = d.RowNo

    ORDER BY d.PirateID,d.StartEvent

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thanks Chris, that worked a treat. I'd seen the Quirky Update article yesterday but got scared off by the length of the article and the proximity to four o'clock. It's done exactly what I needed it to do and was much more concise codewise than I originally thought.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • You're welcome. Bear in mind that the triangular join method should only be applied when the partitions are many and small relative to the whole affected set, and you should make a note (better still a warning) of that in the code.

    Also, try David's method too. It's not quite as compact and easy to digest as the simple TJ solution but might perform better against your data.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (5/2/2014)


    ... might perform better against your data.

    Not sure about that, my query took 3 secs with 80K rows on my test server.

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply