Subtracting SQL Transaction Times Row to Row

  • I have a table called Transactions, which has 4 fields (for the sake of this example):

    [TimePunchID] int PK auto#

    [DateAdded] datetime

    [UserID] varchar(100)

    [Activity] varchar(100)

    Example output:

    IDdateaddeduserid activity DifferenceInMinutes

    12012-04-20 08:15:00 AM 200Break

    22012-04-20 08:30:00 AM 200Project A

    32012-04-20 10:30:00 AM 200Project B

    42012-04-20 08:15:00 AM 240Break

    52012-04-20 09:30:00 AM 240Project C

    62012-04-20 11:30:00 AM 240Project A

    I want to find an efficient way to programmatically subtract the date difference between the rows so that I can group and display the hours that each person spent on each activity. The first entry for each user won't have a difference as there would be nothing from which to subtract; the following entries would. The idea is to populate the DifferenceInMinutes field with the minute difference between the dates.

    Thanks so much!!

  • what is the end time you are using to figure out the difference in min. and can you post the DDL and Sample data in a more consumer friendly format?? please see my signature for help on how we like to see it here.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Thanks for the reply. Here is a quick example of some test data.

    --Create a table to hold the data.

    create table #tbl_Data (TimePunchID int identity(1,1), dateadded datetime, userid varchar(100), activity varchar(100))

    --Populate some data in the table.

    insert into #tbl_Data (dateadded, userid, activity)

    SELECT '2012-04-20 08:15:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 08:30:00 AM','12345','Project D' UNION ALL

    SELECT '2012-04-20 09:45:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 11:45:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 11:50:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 08:13:00 AM','23456','Project B' UNION ALL

    SELECT '2012-04-20 08:28:00 AM','23456','Break' UNION ALL

    SELECT '2012-04-20 09:58:00 AM','23456','Project C'

    --View the results.

    select * from #tbl_Data

    --Release the table.

    drop table #tbl_data

    Ideally, I would like to see the difference between each row (subtracting the dates in minutes). For example, if I summed the total amount of work that UserID 12345 performed on Project A, I would get X minutes.

    Please let me know if I can clarify anything else.

  • I can actually get close to where I think that I need to be. The problem with these results is that the first entry gets no time associated with it (because it's the first entry), even though technically 15 minutes were spent on it (8:15 to 8:30). Therefore, when the data is summed up by activity, the times are off by each previous amount.

    --Create a table to hold the data.

    create table #tbl_Data (TimePunchID int identity(1,1), dateadded datetime, userid varchar(100), activity varchar(100))

    --Populate some data in the table.

    insert into #tbl_Data (dateadded, userid, activity)

    SELECT '2012-04-20 08:15:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 08:30:00 AM','12345','Project D' UNION ALL

    SELECT '2012-04-20 09:45:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 11:45:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 11:50:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 08:13:00 AM','23456','Project B' UNION ALL

    SELECT '2012-04-20 08:28:00 AM','23456','Break' UNION ALL

    SELECT '2012-04-20 09:58:00 AM','23456','Project C'

    --View the results.

    select * from #tbl_Data

    Select userid, Activity,

    Cast(dateadded AS DateTime) as [fulldate],

    DiffFromPrevious = Coalesce(

    DateDiff(

    mi,

    (

    Select Top 1 Cast(dateadded AS DateTime) AS [fulldate]

    From #tbl_Data

    Where Cast(dateadded AS DateTime) < Cast(t1.dateadded AS DateTime)

    and convert(varchar(100),dateadded,101)

    = convert(datetime,'4/20/2012',101)

    and userid = '12345'

    Order By [fulldate] Desc

    ),

    Cast(dateadded AS DateTime)

    ),

    0)

    From

    #tbl_Data t1

    where convert(varchar(100),t1.dateadded,101)

    = convert(datetime,'4/20/2012',101)

    and userid = '12345'

    Order By

    [fulldate] Asc

    drop table #tbl_data

  • To concretely answer your question about the End Time - each entry "ends" the time spent on the previous entry; for example, 8:15am - 8:30am, (8:30am ends the 8:15am entry and starts the new entry, yielding 15 minutes for the activity associated with the 8:15am entry).

  • brubin 51691 (4/20/2012)


    To concretely answer your question about the End Time - each entry "ends" the time spent on the previous entry; for example, 8:15am - 8:30am, (8:30am ends the 8:15am entry and starts the new entry, yielding 15 minutes for the activity associated with the 8:15am entry).

    Given the above statement, you cannot arrive at an answer as the last task has not ended yielding infinite. Given the demo data in your code the second user has started Project C at 9:58 and is still working.

    This is a business decision you will need to make. Do you want the sum f completed tasks?

    Fitz

  • Thanks for the reply. Yes, the sum of completed tasks is needed; the last entry for each user will be open.

  • Brubin,

    I believe the following will get you what you want although I played a bit fast and loose with your problem description.

    --Create a table to hold the data.

    DECLARE @tbl_Data TABLE

    (TimePunchID int identity(1,1), dateadded datetime, userid varchar(100), activity varchar(100))

    --Populate some data in the table.

    insert into @tbl_Data (dateadded, userid, activity)

    SELECT '2012-04-20 08:15:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 08:30:00 AM','12345','Project D' UNION ALL

    SELECT '2012-04-20 09:45:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 11:45:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 11:50:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 08:13:00 AM','23456','Project B' UNION ALL

    SELECT '2012-04-20 08:28:00 AM','23456','Break' UNION ALL

    SELECT '2012-04-20 09:58:00 AM','23456','Project C'

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,ROW_NUMBER() OVER (PARTITION BY userid ORDER BY userid, dateadded) As rk

    FROM @tbl_data)

    ,ByActivity AS (

    SELECT dateadded, userid, activity

    ,DATEDIFF(minute

    ,dateadded

    ,(SELECT dateadded

    FROM MyData t2

    WHERE t1.userid = t2.userid and t1.rk + 1 = t2.rk)

    ) As DifferenceInMinute

    FROM MyData t1)

    --SELECT dateadded, userid, activity, DifferenceInMinute

    SELECT userid, activity, SUM(DifferenceInMinute) As TotalByActivity

    FROM ByActivity

    GROUP BY userid, activity

    You can see what I mean by "fast and loose" if you uncomment the SELECT, comment the SELECT with the SUM and comment the GROUP BY lines.

    Basically, I've put the DifferenceInMinute on the task start record (not the task close record). This allows the GROUPing to SUM correctly by activity.

    My results set:

    useridactivityTotalByActivity

    12345Break120

    23456Break90

    12345Project A20

    23456Project B15

    23456Project CNULL

    12345Project D75

    This is actually a knock off of the cumulative sum problem that Jeff Moden's "quirky update" would probably solve faster, but this solution should work.

    It can also be done with a JOIN of course.

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,ROW_NUMBER() OVER (PARTITION BY userid ORDER BY userid, dateadded) As rk

    FROM @tbl_data)

    SELECT t1.userid, t1.activity, SUM(DATEDIFF(minute, t1.dateadded, t2.dateadded)) As TotalByActivity

    FROM MyData t1

    LEFT JOIN MyData t2 ON t1.userid = t2.userid and t1.rk + 1 = t2.rk

    GROUP BY t1.userid, t1.activity

    ORDER BY t1.userid, t1.activity


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (4/22/2012)


    Brubin,

    I believe the following will get you what you want although I played a bit fast and loose with your problem description.

    --Create a table to hold the data.

    DECLARE @tbl_Data TABLE

    (TimePunchID int identity(1,1), dateadded datetime, userid varchar(100), activity varchar(100))

    --Populate some data in the table.

    insert into @tbl_Data (dateadded, userid, activity)

    SELECT '2012-04-20 08:15:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 08:30:00 AM','12345','Project D' UNION ALL

    SELECT '2012-04-20 09:45:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 11:45:00 AM','12345','Project A' UNION ALL

    SELECT '2012-04-20 11:50:00 AM','12345','Break' UNION ALL

    SELECT '2012-04-20 08:13:00 AM','23456','Project B' UNION ALL

    SELECT '2012-04-20 08:28:00 AM','23456','Break' UNION ALL

    SELECT '2012-04-20 09:58:00 AM','23456','Project C'

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,ROW_NUMBER() OVER (PARTITION BY userid ORDER BY userid, dateadded) As rk

    FROM @tbl_data)

    ,ByActivity AS (

    SELECT dateadded, userid, activity

    ,DATEDIFF(minute

    ,dateadded

    ,(SELECT dateadded

    FROM MyData t2

    WHERE t1.userid = t2.userid and t1.rk + 1 = t2.rk)

    ) As DifferenceInMinute

    FROM MyData t1)

    --SELECT dateadded, userid, activity, DifferenceInMinute

    SELECT userid, activity, SUM(DifferenceInMinute) As TotalByActivity

    FROM ByActivity

    GROUP BY userid, activity

    You can see what I mean by "fast and loose" if you uncomment the SELECT, comment the SELECT with the SUM and comment the GROUP BY lines.

    Basically, I've put the DifferenceInMinute on the task start record (not the task close record). This allows the GROUPing to SUM correctly by activity.

    My results set:

    useridactivityTotalByActivity

    12345Break120

    23456Break90

    12345Project A20

    23456Project B15

    23456Project CNULL

    12345Project D75

    This is actually a knock off of the cumulative sum problem that Jeff Moden's "quirky update" would probably solve faster, but this solution should work.

    It can also be done with a JOIN of course.

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,ROW_NUMBER() OVER (PARTITION BY userid ORDER BY userid, dateadded) As rk

    FROM @tbl_data)

    SELECT t1.userid, t1.activity, SUM(DATEDIFF(minute, t1.dateadded, t2.dateadded)) As TotalByActivity

    FROM MyData t1

    LEFT JOIN MyData t2 ON t1.userid = t2.userid and t1.rk + 1 = t2.rk

    GROUP BY t1.userid, t1.activity

    ORDER BY t1.userid, t1.activity

    i did not have time over the weekend to post back but went with dwain's self join solution for my approach. had the exact same code to. (well almost but same enough not to post)


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Thanks for the replies! Both of the last two posts produce solid output, and I really appreciate the effort. The last part of the equation is to take the minutes summed and display it as hh.mm (ie, 8.50 would = 8 hours 30 minutes) to 2 decimal places to ensure that the actual times being summed account for seconds.

  • Normally I try to avoid self joins because they can be slow performance-wise with many rows in the table and there's usually another approach that is faster.

    In this case though, the query plan cost for both solutions I posted came up identical, even though the plans themselves were slightly different, so I posted the self join option because some may think it's a little easier on the eyes.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Like I said, self joins aren't particularly efficient. This solution produces a lower query plan cost so is likely to run faster than the self join or the first suggestion I made.

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,(SELECT TOP 1 dateadded

    FROM @tbl_data t2

    WHERE t1.userid = t2.userid and t1.dateadded < t2.dateadded

    ORDER BY dateadded) As dateadded2

    FROM @tbl_data t1)

    SELECT userid, activity, SUM(DATEDIFF(minute, dateadded, dateadded2)) As TotalByActivity

    FROM MyData t1

    GROUP BY userid, activity

    ORDER BY userid, activity

    -- With formatting to MM:SS

    ;WITH MyData AS (

    SELECT dateadded, userid, activity

    ,(SELECT TOP 1 dateadded

    FROM @tbl_data t2

    WHERE t1.userid = t2.userid and t1.dateadded < t2.dateadded

    ORDER BY dateadded) As dateadded2

    FROM @tbl_data t1)

    SELECT userid, activity

    ,CAST(TotalByActivity/60 AS VARCHAR) + ':' +

    RIGHT('00'+CAST(TotalByActivity%60 AS VARCHAR),2) As [TotalByActivity MM:SS]

    FROM (

    SELECT userid, activity, SUM(DATEDIFF(second, dateadded, dateadded2)) As TotalByActivity

    FROM MyData t1

    GROUP BY userid, activity) x

    ORDER BY userid, activity

    The second version even formats to MM:SS as you want.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply