conditional aggregate functions?

  • I have a table that stores labor charges for employees:

    JobLabor table

    empid - employee id

    hours - number of hours for an operation

    opid - operation id

    date - date of operation

    I am attempting to take this data and create a bi-weekly payroll report showing the total number of regular hours, overtime hours and "special hours" (ie vacation time, holiday pay, etc..)

    unpaid hours have an opid of 1 or 2. special hours have an opid over 95000.. so for a sum total of regular hours, I could do something like this:

    select empid, sum(hours) as tothours from JobLabor where opid between 3 and 94999 group by empid

    However, I would want to push some regular hours to overtime hours if it exceeds 40 for a given week and I would like to sum up each of the special hours..

    Right now I am calculating these in a procedural loop (effecitvely I have a basic SQL select that returns all rows for a given time period and I go through each one and sum up the values) but was curious if there is an easier way to do more of this conditional aggregate calculations at the t-sql level.. thanks for any thoughts. 🙂

  • Yes, there's an easy way to do the whole thing... without a loop. The loop is just going to make this very slow...

    Recommend you post your code and let's fix the whole thing 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, how can you tell if a certain day was holiday or weekend?

    What increase in payments can you expect for certain type of overtime?

    Please post a complete set of business rules and explanations here.


    N 56°04'39.16"
    E 12°55'05.25"

  • To answer the follow-up questions --> this report only tabulates the hours for different types of work (regular hours, overtime hours, vacation hours, etc) it does not calculate actual dollar amounts. There is no special pay due to working on weekends/hoidays/etc..

    The current csv is exported from a crystal report, so I don't really have anything to post per say, but can outline the basics of the report here:

    Employees clock time to jobs and this is recorded in the JobLabor table. The pertinant fields are:

    JobLabor table

    empid - employee id

    hours - number of hours for an operation

    opid - operation id

    date - date of operation

    The current report requires a start date (the start date is always the first Sunday of the pay period) and there is a basic query to pull the data from the database:

    select * from JobLabor, Employees where empid.joblabor = id.employees and opid > 2 and date between %startdate% and %startdate% + 14 order by empid, date

    from here, I know the following:

    opid of 1 is a non-paid break

    opid of 2 is end-of-day (no pay)

    opid of 95000 is for vacation

    opid of 95100 is for birthday

    opid of 95200 is for holiday

    there are a few other operaiton ids that are in the database, but for the sake of simplicity, all function the same. Basically for a birthday or vacation, there is a record that consist of a full shift (ie 8 hours) for a given day with the holiday/vacation code attached.

    So from here, in my existing report, I look at each record and have what is basically something like this pseudo code:

    for each record in result set {

    if $date = %startdate% + 7 then break // overtime pay is over 40 hours per week, so we need to calculate it separately

    if opid < 95000 then $reghourswk1 = $reghourswk1 + hours

    if opid = 95000 then $vacationwk1 = $vacationwk1 + hours

    if opid = 95100 then $birthdaywk1 = $birthdaywk1 + hours

    if opid = 95200 then $holidaywk1 = $holidaywk1 + hours

    }

    if $reghourswk1 > 40 then $overtimewk1 = $reghourswk1 - 40 and $reghourswk1 = 40

    for each record in result set {

    if opid < 95000 then $reghourswk2 = $reghourswk2 + hours

    if opid = 95000 then $vacationwk2 = $vacationwk2 + hours

    if opid = 95100 then $birthdaywk2 = $birthdaywk2 + hours

    if opid = 95200 then $holidaywk2 = $holidaywk2 + hours

    }

    if $reghourswk2 > 40 then $overtimewk2 = $reghourswk2 - 40 and $reghourswk2 = 40

    $reghours = $reghourswk1 + $reghourswk2

    $overtime = $overtimewk1 + $overtimewk2

    $birthday = $birthdaywk1 + $birthdaywk2

    $vacation = $vacationwk1 + $vacationwk2

    $holiday = $holidaywk1 + $holidaywk2

    from here, I end up displaying the payroll line on my report:

    $empid, $reghours, $overtime, $birthday, $vacation, $holiday

    So this is the basic overview of what I am trying to accomplish .. thanks again in advance. 🙂

  • Lordy, I hate setting up test data for payroll examples... whatever...

    Here's some code that generates 100,000 entries for 200 employes for a year's worth of payroll following your rules for OPID... keep in mind that the data is random which means that the holiday time may not fall on holidays and there will be entries for birthday time that isn't necessarily an employee's birthday... it' just randomized test data... it takes just over 2 seconds to generate...

    THIS CODE IS NOT PART OF THE SOLUTION! IT JUST GENERATES DATA TO TEST THE SOLUTION!

    SELECT TOP 100000

    RowNum = IDENTITY(INT,1,1),

    EmpID = ABS(CHECKSUM(NEWID()))%200+1,

    Hours = CAST(ABS(CHECKSUM(NEWID()))%1200 /100.0 AS DECIMAL(9,1)),

    OpID = ABS(CHECKSUM(NEWID()))%23+1,

    Date = CAST(ABS(CHECKSUM(NEWID()))%365+36524 AS DATETIME)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    -- Takes about 12 seconds to execute.

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Change a couple of the OpID's to vacation, birthday, holiday

    UPDATE dbo.JBMTest

    SET OpID = CASE OpID

    WHEN 11 THEN 95000

    WHEN 12 THEN 95100

    WHEN 13 THEN 95200

    END

    WHERE OPID BETWEEN 11 AND 13

    Ok... I don't necessarily agree with your rules of NOT adding the vacation, holiday, and birthday time to the work hours, but the following code does meet your rules. Despite it's apparent size, it beats the pants off a loop or a cursor... AND, it's pretty easy to modify for rule changes...

    DECLARE @StartDate DATETIME

    SET @StartDate = '20000312'

    SELECT --===== This puts the two weeks together leaving the OT calcs per week

    EmpID,

    WorkHours = SUM(WorkHours),

    Vacation = SUM(Vacation),

    Birthday = SUM(Birthday),

    Holiday = SUM(Holiday),

    TotalHours = SUM(TotalHours),

    RegHours = SUM(RegHours),

    OTHours = SUM(OTHours),

    OtherHours = SUM(OtherHours)

    FROM (--==== Derived table "ot" calculates OT and other for each of 2 weeks starting at StartDate

    SELECT EmpID,

    WeekStart,

    WorkHours,

    Vacation,

    Birthday,

    Holiday,

    TotalHours,

    RegHours = CASE WHEN WorkHours >= 40 THEN 40 ELSE WorkHours END,

    OTHours = CASE WHEN WorkHours > 40 THEN WorkHours-40 ELSE 0 END,

    OtherHours = Vacation+Birthday+Holiday

    FROM (--==== Derived table "wt" gets the weekly total for each of 2 weeks starting at StartDate

    SELECT EmpID,

    WeekStart = DATEADD(dd,DATEDIFF(dd,-1,Date)/7*7,-1),

    WorkHours = SUM(CASE WHEN OpID < 95000 THEN Hours ELSE 0 END),

    Vacation = SUM(CASE WHEN OpID = 95000 THEN Hours ELSE 0 END),

    Birthday = SUM(CASE WHEN OpID = 95100 THEN Hours ELSE 0 END),

    Holiday = SUM(CASE WHEN OpID = 95200 THEN Hours ELSE 0 END),

    TotalHours = SUM(Hours)

    FROM dbo.JBMTest

    WHERE OpID > 2

    AND DATE >= @StartDate

    AND DATE < @StartDate + 14

    GROUP BY EmpID, DATEADD(dd,DATEDIFF(dd,-1,Date)/7*7,-1)

    )wt

    )ot

    GROUP BY EmpID

    ORDER BY EmpID

    Code takes about 1 second to resolve a full result set.

    Note that this is an "onion" SELECT with 3 layers... the comments explain each layer... the inner-most layer is always resolved first and, on these "onion" SELECTs, you can execute the inner most layer separately just to see what it produces... in fact, that's true for each layer provided that you include any layers below the current layer. In English, you can execute each layered SELECT to see what it does...

    By the way, when looking at the results you asked for, do keep in mind that a person can work over 40 hours for one week (generates overtime) and can work less than 40 hours the next week. That means a person can generate overtime hours without have 80 regular hours.

    Just an FYI... anytime you think you need a cursor or a loop to manipulate a set of data 1 row at a time, you're probably wrong 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WOW.. that is AWESOME. It works great.... just a quick follow-up. 🙂

    Ok... I don't necessarily agree with your rules of NOT adding the vacation, holiday, and birthday time to the work hours, but the following code does meet your rules. Despite it's apparent size, it beats the pants off a loop or a cursor... AND, it's pretty easy to modify for rule changes...

    I was incorrect on my statement about the rules for vacation/bday/holiday .. you are right, it is calculated when determining OT pay .. however, for reporting purposes, it is kept as a separate item. I corrected this.

    WeekStart = DATEADD(dd,DATEDIFF(dd,-1,Date)/7*7,-1),

    Can you explain this line to me? I looked up the DATEADD/DATEDIFF functions and they seem relatively straight forward, but the /7*7 really has me scratching my head (I read that as divide by 7, multiply by 7 which should be the same number!?!?) .. I toyed with this a bit but haven't quite figured out the logic behind it (though it is quite cool that it works)

    By the way, when looking at the results you asked for, do keep in mind that a person can work over 40 hours for one week (generates overtime) and can work less than 40 hours the next week. That means a person can generate overtime hours without have 80 regular hours.

    That is correct.. and in fact, quite common.

    Just an FYI... anytime you think you need a cursor or a loop to manipulate a set of data 1 row at a time, you're probably wrong 😉

    Ok.. I'll take you up on this .. I wrote a trigger with a cursor .. the interesting thing about this is taking multiple rows of data and merging it together and inserting it into one row in a different table. The kicker is the data would be inserted into different fields and not directly into a given field.. so for example, I would have a table like this:

    jobnumber,fieldname,value

    So I might have several rows of the same job number .. ie:

    job1,field1,firstvalue

    job1,field2,secondvalue

    job1,field3,thirdvalue

    and ultimately I would insert it into a table that was setup like this:

    jobnumber,field1,field2,field3

    so I wrote the cursor and some basic logic to either insert or update (if the row exists) ... so can I go cursor-less for this type of operation?

  • Ok, I'm not Jeff but I'm gonna take a shot at this anyway 🙂

    First the question

    WeekStart = DATEADD(dd,DATEDIFF(dd,-1,Date)/7*7,-1),

    The /7*7 gives you a different number because the 7's are integers. If they were decimals you would be right. But because all of the numbers in question are integers you get an affect like this: 71/7*7 = 70. Because 71/7 = 10. It returns the same data type.

    Next the challange. Again .. I'm not Jeff so no fancy test data 🙂

    So I might have several rows of the same job number .. ie:

    job1,field1,firstvalue

    job1,field2,secondvalue

    job1,field3,thirdvalue

    and ultimately I would insert it into a table that was setup like this:

    jobnumber,field1,field2,field3

    Try this :

    SELECT Job1 AS JobNumber,

    SUM(CASE WHEN Field1 IS NOT NULL THEN FirstValue ELSE NULL END) AS Field1,

    SUM(CASE WHEN Field2 IS NOT NULL THEN SecondValue ELSE NULL END) AS Field2,

    SUM(CASE WHEN Field3 IS NOT NULL THEN ThirdValue ELSE NULL END) AS Field3

    FROM RecordSet

    Now that only works if you have a set number of fields and there is probably a better way using pivots but thats a fairly simple one 🙂

    Kenneth

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • Ok, I'm not Jeff but I'm gonna take a shot at this anyway

    First the question

    WeekStart = DATEADD(dd,DATEDIFF(dd,-1,Date)/7*7,-1),

    The /7*7 gives you a different number because the 7's are integers. If they were decimals you would be right. But because all of the numbers in question are integers you get an affect like this: 71/7*7 = 70. Because 71/7 = 10. It returns the same data type.

    Spot on, Kenneth... the only thing else folks need to know is that -1 days = 12/31/1899 which was a Sunday.

    WOW.. that is AWESOME. It works great....

    Hey, Joe... thanks not only for the compliment, but also the great feedback about what you did. Stuff like that is what makes this forum great! Thanks!

    Ok.. I'll take you up on this .. I wrote a trigger with a cursor ..

    Kenneth is on the right track... but it would be handy if you posted the actual code so we can see what's going on...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply