Horrible query - work/idle time within duty day

  • I am trying to write an SSRS report to replace an existing vendor-created report for custom software. I can see how I would figure out the report if I was creating it manually, but I'm having a heck of a time figuring out how to do it in SQL / SSRS. I'm hoping that in describing the problem I might help myself to figure it out... otherwise if anyone has any pointers or suggestions I'd be grateful!

    There are two tables, FlightDay and DutyDay. DutyDay gives a start and end time for when the aircraft began 'working' -- the pilot is sitting in the plane, ready to go. When they actually take off, there is an entry in FlightDay. However, sometimes they just sit on the ground and are considered "idle". Where it's getting confusing is there can be multiple FlightDay entries within the span of time of one DutyDay.

    And I'm being called into a meeting - I'll finish this post in a bit.

  • Okay, I'm back.

    So what I need to do is create a report that shows the individual start and stop times from the FlightDay table, and give a total number of hours per activity. That's really simple - just report on FlightDay, group by activityId, and use a subtotal line on the hours.

    What makes this into a beast is then I *also* need to calculate the total number of idle hours for that activity group, which is calculated by taking the total hours from DutyDay for that activity/time range and subtracting the total FlightDay hours. If there was one flightday record per dutyday record, this would be easy, but there can be multiple flights in one day.

    Here's the code to set it up...

    CREATE TABLE [dbo].[FlightDay](

    [flightDayId] [bigint] IDENTITY(1,1) NOT NULL,

    [resourceId] [bigint] NOT NULL,

    [activityId] [bigint] NOT NULL,

    [startDay] [datetime] NOT NULL,

    [stopDay] [datetime] NULL,

    [hours] [float] NULL,

    PRIMARY KEY CLUSTERED

    (

    [flightDayId] ASC

    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    go

    CREATE TABLE [dbo].[DutyDay](

    [dutyDayId] [bigint] IDENTITY(1,1) NOT NULL,

    [resourceId] [bigint] NOT NULL,

    [activityId] [bigint] NULL,

    [startDay] [datetime] NOT NULL,

    [stopDay] [datetime] NULL,

    [hours] [float] NULL,

    PRIMARY KEY CLUSTERED

    (

    [dutyDayId] ASC

    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    insert dutyday (resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-02 10:21', '2009-09-02 10:29', 0.1)

    insert dutyday (resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-04 09:19', '2009-09-04 15:30', 6.2)

    insert dutyday (resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-05 08:59', '2009-09-05 10:57', 2)

    insert flightday(resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-02 10:21', '2009-09-02 10:29', 0.1)

    insert flightday(resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-04 09:19', '2009-09-04 12:14', 2.9)

    insert flightday(resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-04 12:52', '2009-09-04 15:26', 2.6)

    insert flightday(resourceid, activityid, startday,stopday,hours) values (1,1,'2009-09-05 08:59', '2009-09-05 10:54', 1.9)

    go

    select * from DutyDay compute sum(hours)

    select * from FlightDay compute sum(hours)

    This is sort of close but it's not right because it returns only three line items (one per day) and really I want four line items (one per flight) but I only want to count the idle hours once. I don't actually need the individual idle hours as line items, just the total at the end.

    select dd.resourceid, dd.activityid, dd.startday, max(dd.hours) TotalHours, sum(fd.hours) WorkHours, max(dd.hours)-sum(fd.hours) as IdleHours

    from dutyday dd, flightday fd

    where fd.startday between dd.startday and dd.stopday and dd.resourceid = fd.resourceid and dd.activityid = fd.activityid

    group by dd.resourceid, dd.activityid, dd.startday

    I'm wondering if I'll have to do it as two separate queries in SSRS somehow... the main query to get the info from FlightDay and completely ignore DutyDay, and then the subtotal query that would do something like "select sum({hours from query1}) - sum(hours) from DutyDay where starttime = {min start time from query1} and stoptime = {max stop time from query1}"...

    ??????

    Pointers greatly appreciated. I don't care if I end up writing the solution as a stored proc, or mangle it within SSRS... I just don't know how to get two queries in an SSRS report at the same time.

    I can post a screen shot of the report if that's helpful.... although I'm not sure how to do that on here. Upload to photobucket or something first?

  • First of all: you did an excellent job in providing sample data!!

    Not seen too often...

    One thing I'd like you to add: what is the output you're expecting?

    Best thing would be if you'd add the result set based on your sample data.

    It would work just fine if you'd post it as plain text, comma separated, with column headers.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • What I would like to see is something like this:

    ActivityId, Hours, StartTime, EndTime

    1, 1.9, Sep 5 08:59, Sep 5 10:54

    1, 2.6, Sep 4 12:52, Sep 4 15:26

    1, 2.9, Sep 4 09:19, Sep 4 12:14

    1, 0.1, Sep 2 10:21, Sep 2 10:29

    Working Hours: 7.5 Idle Hours: 0.8

    2, 3.4, Sep 1 13:52, Sep 1 17:14

    Working Hours: 3.4 Idle Hours: 0.0

    1, 1.7 Sep 1 09:46, Sep 1 11:28

    (etc)

    If I could just ignore the idle hours requirement it would be all good 🙂 I only included sample data for the first section, but in reality they can go back and forth between several activities. The report is sorted in reverse chronological order, and they want a subtotal every time the activity changes.

  • I have the beginnings of a thought forming. Tell me if this makes sense....

    Create a view for both DutyDay and FlightDay that ignores the time portion of the startTime field, and group by activityId and resourceId to get the total hours for each, and include the min startTime

    Then do a left join from the original tables back to the view so that there is only one view line item returned per DAY from FlightDay rather than per ROW.

    Hmm. I'm going to try that and see if I can get somewhere with it....

  • Success! This works with my four rows of sample data....

    create view viewDutyDay as

    select

    resourceId

    , activityId

    , DATEADD(dd,CAST(DATEDIFF(DAY,'17530101',startDay) AS INT),'17530101') as startDayNoTime

    , sum(hours) as TotalDutyHours

    from dutyday

    group by resourceId, activityId, DATEADD(dd,CAST(DATEDIFF(DAY,'17530101',startDay) AS INT),'17530101')

    go

    create view viewFlightDay as

    select

    resourceId

    , activityId

    , DATEADD(dd,CAST(DATEDIFF(DAY,'17530101',startDay) AS INT),'17530101') as startDayNoTime

    , min(startDay) as FirstFlightDay

    , sum(hours) as TotalFlightHours

    from Flightday

    group by resourceId, activityId, DATEADD(dd,CAST(DATEDIFF(DAY,'17530101',startDay) AS INT),'17530101')

    go

    select

    fd.resourceId

    , fd.activityId

    , fd.hours as FlightHours

    , fd.startDay

    , fd.stopDay

    , vdd.TotalDutyHours

    , vfd.TotalFlightHours

    , vdd.TotalDutyHours - vfd.TotalFlightHours as TotalIdleHours

    from flightday fd left join viewflightday vfd on fd.resourceid = vfd.resourceid and fd.activityid = vfd.activityid and fd.startday = vfd.firstflightday

    left join viewdutyday vdd on vfd.resourceid = vdd.resourceid and vfd.activityid = vdd.activityid and vfd.startdaynotime = vdd.startdaynotime

    Results:

    resourceId,activityId,FlightHours,startDay,stopDay,TotalDutyHours, TotalFlightHours,TotalIdleHours

    1,1,0.1,2009-09-02 10:21:00.000,2009-09-02 10:29:00.000,0.1,0.1,0

    1,1,2.9,2009-09-04 09:19:00.000,2009-09-04 12:14:00.000,6.2,5.5,0.7

    1,1,2.6,2009-09-04 12:52:00.000,2009-09-04 15:26:00.000,NULL,NULL,NULL

    1,1,1.9,2009-09-05 08:59:00.000,2009-09-05 10:54:00.000,2,1.9,0.1

    Now... it's a *way* to do it, but is this a GOOD way?

  • Good job, again!

    It's great to see that there are still some people out there that aren't relying on a forum answer only. This makes it a lot more fun to stay around trying to help 🙂

    Here's another proposal to do it.

    It might be a little faster since there aren't that many joins and date conversions in it.

    The trick is the ROW_NUMBER function. It is used to assign the daily work hours to the first flight and set the following to zero. You can SUM over the last three columns to get the total per day, same as with your solution.

    There might be even easier ways to do it but I can't figure one right now. Maybe one of the guru's will jump in and have a look at it...

    If there's anything else we can help you with or if you have any questions regarding the solution below, let us know!

    ;WITH cte AS

    (

    SELECT

    activityid AS id,

    row_number () OVER (PARTITION BY activityid,dateadd(dd,0,datediff(dd,0,startday)) ORDER BY startday ) AS row,

    startday AS dstart ,

    stopday AS dstop,

    hours AS work

    FROM flightday

    )

    SELECT id,

    dstart,

    dstop,

    work,

    CASE WHEN row = 1 THEN d.hours ELSE 0 END AS TotalHrs ,

    CASE WHEN row = 1 THEN d.hours ELSE 0 END - work AS idle

    FROM dutyday d

    INNER JOIN cte

    ON d.activityid = cte.id

    AND cte.dstart>= d.startday

    AND cte.dstop<= d.stopday

    /* result set

    iddstartdstopworkTotalHrsidle

    12009-09-02 10:21:00.0002009-09-02 10:29:00.0000,10,10

    12009-09-04 09:19:00.0002009-09-04 12:14:00.0002,96,23,3

    12009-09-04 12:52:00.0002009-09-04 15:26:00.0002,60-2,6

    12009-09-05 08:59:00.0002009-09-05 10:54:00.0001,920,1

    */



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • That's an interesting way to do it... I don't fully understand it! I'm going to have to read it a few more times and look up PARTITION. 🙂

    However, using your method against the real data it takes 16 seconds to filter down to just the 4 rows I included in the sample, but mine takes 9 seconds.

    I'm putting your code modified to use live data below so I don't misplace it 🙂

    ;WITH cte AS

    (

    SELECT

    activityid,

    resourceId,

    row_number () OVER (

    PARTITION BY activityid,dateadd(dd,0,datediff(dd,0,startdatetime))

    ORDER BY startdatetime ) AS row,

    startdatetime AS dstart,

    stopdatetime AS dstop,

    hours AS FlightHours

    FROM prt.view_flightday

    )

    SELECT cte.activityId,

    cte.resourceId,

    dstart,

    dstop,

    FlightHours,

    CASE WHEN row = 1 THEN d.hours ELSE 0 END AS TotalHrs ,

    CASE WHEN row = 1 THEN d.hours ELSE 0 END - flighthours AS IdleHours

    FROM prt.view_dutyday d

    INNER JOIN cte

    ON d.activityid = cte.activityid and d.resourceid = cte.resourceid

    AND cte.dstart>= d.startdatetime

    AND cte.dstop<= d.stopdatetime

    where d.resourceid = 7685 and dstart between '2009-09-02' and '2009-09-06'

  • The ROW_NUMBER () OVER (PARTITION BY ... ORDER BY) will simply number the rows for each activityid and day starting from 1 ordered by startdatetime.

    By looking at your real life sample I think it should be modified to

    PARTITION BY resourceId,activityid,dateadd(dd,0,datediff(dd,0,startdatetime))

    To see hoe it works just run the select statement within the WITH claus separately and you'll see how it works.

    regarding the performance issue:

    The cte basically is a subselect, just easier to read... (at least from my point of view...).

    What probably happens here is the subselect will be performed against all data (meaning each and every resourceid) and the result will be joined against the table prt.view_dutyday. After that the filter "d.resourceid = 7685 and dstart between '2009-09-02' and '2009-09-06'"

    will be applied. But I'm not really sure about that since I'm not an expert on how the query optimizer works...

    What I'd recommend is to move the WHERE clause inside the CTE:

    ...FROM prt.view_flightday

    where resourceid = 7685 and startdatetime between '2009-09-02' and '2009-09-06'

    Another question would be how your tables are indexed:

    Do you have an index on resourceid, startdatetime, stopdatetime and activityid on both tables view_flightday and view_dutyday? If not, what are the indexes on those tables and how many rows are in each one?

    Edit:

    Based on the table names: is it possible that you're running the query against two views? If so, indexing the views might get a "little bit complicated", especially if there are aggregation functions involved or anything else that would prevent indexed views...

    Maybe it's possible to find a solution based on the source tables of both views. Would you please post both view definitions?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply