Joining the pieces together

  • I am trying to write a query to do the following with the seperate queries and data below:

    TotalVacation + CarryOver

    I have all the pieces working separately, I just don't know how to join it all together so the query returns:

    EmployeeId (TotalVacation + CarryOver)

    1 152

    2 168

    TotalVacation =

    SELECT EV.EmployeeId, EV.TotalVacation FROM @EmployeeVacation EV

    WHERE ev.id in (SELECT MAX(ID) id FROM @EmployeeVacation WHERE EmployeeID IN ('1','2') AND EndDate IS NULL GROUP BY EmployeeId);

    CarryOver = SUM(Credits) - SUM(Debits)

    SUM(Credits) =

    SELECT EmployeeId, ISNULL(SUM(Credit),0) AS Credit FROM @AbsenceLedger WHERE AbsenceTypeId = 2 AND

    StartDate >= '01-01-2011' AND EndDate <= '12-31-2011' AND EmployeeId IN ('1','2') Group By EmployeeId

    SUM(Debits)=

    SELECT EmployeeId, ISNULL(SUM(Debit),0) AS Credit FROM @AbsenceLedger WHERE AbsenceTypeId = 2 AND

    StartDate >= '01-01-2011' AND EndDate <= '12-31-2011' AND EmployeeId IN ('1','2') Group By EmployeeId

    Test Data:

    DECLARE @EmployeeVacation TABLE(

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [EmployeeId] [int] NOT NULL,

    [StartDate] [date] NOT NULL,

    [EndDate] [date] NULL,

    [TotalVacation] [decimal](6, 2) NOT NULL);

    DECLARE @AbsenceLedger TABLE(

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [EmployeeId] [int] NOT NULL,

    [AbsenceTypeId] [int] NOT NULL,

    [StartDate] [date] NOT NULL,

    [EndDate] [date] NOT NULL,

    [Credit] [decimal](6, 2) NULL,

    [Debit] [decimal](6, 2) NULL);

    INSERT INTO @EmployeeVacation (EmployeeId, StartDate, TotalVacation) VALUES(1,'01-01-2011', 120)

    INSERT INTO @EmployeeVacation (EmployeeId, StartDate, TotalVacation) VALUES(2,'01-01-2011', 120)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Credit) VALUES(1,2,'01-01-2011', '01-01-2011', 16)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Credit) VALUES(2,2,'01-01-2011', '01-01-2011', 16)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Debit) VALUES(1,2,'05-05-2011', '05-06-2011', 16)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Debit) VALUES(1,2,'07-10-2011', '07-12-2011', 24)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Debit) VALUES(2,2,'03-13-2011', '03-17-2011', 40)

    INSERT INTO @AbsenceLedger(EmployeeId, AbsenceTypeId, StartDate, EndDate, Debit) VALUES(2,2,'09-12-2011', '09-15-2011', 32)

    Thanks!!

  • Actually I was able to get it to work using this.

    But there has got to be a more elegant way...

    ;with Total AS

    (SELECT EV.EmployeeId, EV.TotalVacation FROM @EmployeeVacation EV

    WHERE ev.id in (SELECT MAX(ID) id FROM @EmployeeVacation WHERE EmployeeID IN ('1','2') AND EndDate IS NULL GROUP BY EmployeeId)),

    Credit AS

    (SELECT EmployeeId, ISNULL(SUM(Credit),0) AS Credit FROM @AbsenceLedger WHERE AbsenceTypeId = 2 AND

    StartDate >= '01-01-2011' AND EndDate <= '12-31-2011' AND EmployeeId IN ('1','2') Group By EmployeeId),

    Debit AS

    (SELECT EmployeeId, ISNULL(SUM(Debit),0) AS Debit FROM @AbsenceLedger WHERE AbsenceTypeId = 2 AND

    StartDate >= '01-01-2011' AND EndDate <= '12-31-2011' AND EmployeeId IN ('1','2') Group By EmployeeId)

    SELECT T.EmployeeId, T.TotalVacation + (C.Credit - D.Debit) AS "Total + CarryOver"

    FROM Total T, Credit C, Debit D

    WHERE T.EmployeeId = C.EmployeeId AND T.EmployeeId = D.EmployeeId

  • This can definitely be done in a more elegant manner, with much better performance to boot. I'll post one possible solution in a moment; first though, I have some questions about the underlying business logic. The sample data you've provided is very limited, so it's easy to code something that will return the expected results for this set of data, but may have some serious bugs when run against your full set of live data. The @AbsenceLedger data seems pretty self explanatory, but it's the @EmployeeVacation data I really need further details on.

    What does each row in @EmployeeVacation represent? Is TotalVacation the amount of vacation accrued during the period between StartDate and EndDate? Or is it the previous period's TotalVacation + (Credits - Debits) for the employee? I'm guessing the latter, since you appear to be attempting to get the most recent @EmployeeVacation rows by means of the MAX(ID) subquery, but I want to make sure. If that's the case, then why are the EndDates null in your sample data? Does the EndDate get populated once a new accrual period begins? This is where a larger set of sample data including multiple rows for each EmployeeID would be very useful, so we can get a handle on what the actual data looks like. I'm also guessing that the aggregated values from @AbsenceLedger should be limited to the most recent @EmployeeVacation date range; is that correct? If so, rather than hard coding the values as you've done in your examples, it's going to be important to join @AbsenceLedger to @EmployeeVacation on the StartDate and EndDate values.

    If you could answer those questions, and provide a larger/more diverse set of sample data, I'll be better able to point you towards a solution that won't have major logic flaws when run against your real live data. Thanks!

  • Ok, taking into the account the issues I mentioned above, here's a possible solution that attempts to be as robust as possible (based on what I know/guess about your underlying data and business rules). I've added a lot of comments to try to make it clear what each part is doing; let me know if you have any additional questions about any of this. Note that I'm using the StartDate column of @EmployeeVacation to find the most recent row for each employee, rather than the identity column; I generally prefer to do it this way when possible, because you just can't always rely on identity columns for this kind of thing (there are several scenarios where your assumption that the MAX(id) is always the most recent row could break down badly). There are several other ways to find the most recent row for each employee; I used the ROW_NUMBER() method here in order to keep the code simple and also because it's a good thing to know about if you haven't encountered window functions before. There are also other ways to obtain the aggregated values from @AbsenceLedger by employee and date range, but using APPLY simplifies the code quite a bit here and, again, it's a good thing to know how to do.

    SELECT EV.EmployeeId, EV.TotalVacation, EV.StartDate, EV.EndDate,

    EV.TotalVacation + ISNULL(AL.TotalCredit,0) - ISNULL(AL.TotalDebit,0) AS TotalCarryover

    FROM (-- Create a derived table with a rownumber that resets to 1 on the most recent row for each employee.

    -- Note that this won't work if an employee can ever have two rows with the same StartDate.

    SELECT EmployeeId, TotalVacation, StartDate, EndDate,

    ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY StartDate DESC) AS OneIsCurrent

    FROM @EmployeeVacation

    ) AS EV OUTER APPLY-- Using APPLY simplifies the query a bit.

    -- Needs to be OUTER to make sure we don't exclude employees with no

    -- AbsenceLedger entries for the current period, just to be safe.

    (-- This is effectively an Inline Table Valued Function (ITVF) that returns the SUM of

    -- Debits and Credits for each employee for the date range specified by the most recent

    -- EmployeeVacation row.

    SELECT EmployeeID, SUM(ISNULL(Credit,0)) AS TotalCredit, SUM(ISNULL(Debit,0)) AS TotalDebit

    FROM @AbsenceLedger

    WHERE EmployeeID = EV.EmployeeID

    AND StartDate >= EV.StartDate

    AND (EndDate <= EV.EndDate OR EV.EndDate IS NULL)

    AND AbsenceTypeId = 2

    GROUP BY EmployeeId

    ) AS AL

    WHERE EV.OneIsCurrent = 1-- Filter out all but the most recent EmployeeVacation row for each employee

    This will perform better if the AbsenceLedger table has a usable index on EmployeeID, StartDate and EndDate. This will be especially important if there are a lot of rows in the real tables. If you don't get decent performance when you run this against your full data set, post the actual execution plan and I'll take a look to see what your options are.

    Again, this may not be exactly correct when run against your live data; test it very very carefully. If you run into anything odd, please post a larger selection of sample data, for a wider variety of dates, and I'll take another look at it. Let me know if you have further questions about how the query works, too. Hope this helps!

  • Thanks for the response JonFox.

    What does each row in @EmployeeVacation represent?

    Each row represents the total number of vacation hours an employee is entitled to per year. For example, an employee is hired in 2009 and has 100 hours of vacation. When the employee is entered into the system a row is entered into the EmployeeVacation table. Now lets say in 2010 the employee is granted another 8 hours. When the employee's data is updated the previous entry is "closed off" by adding in the end date. A new row is added with 108 hours of vacation time and a null end date. I leave the end date null on the current entry as a second way of determining it's the most current entry. Max(ID) AND EndDate IS NULL.

    The reason I am doing this is because I need to keep a record of the number of hours an employee was entitled to in the past. I am all ears if you can suggest a more efficient way of handling this data.

    Now when a new year rolls around and the system resets an employees vacation time, it just takes the most recent TotalVacation value and adds it to (Sum(Credits) - Sum(Debits)) of the previous year to account for carryover (If carryover is allowed). This value is then entered into the AbsenceLedger.

    By having the EmployeeVacation table it's easy to determine how many hours an employee is entitled to now or anytime in the past.

    I can use the AbsenceLedger table to determine how many hours an employee has remaining during any date range by simply taking the

    SUM(Credits) - Sum(Debits) for a given user selected date range.

    Does this make sense? Or is my logic flawed?

    Thanks again for the response and feedback. It's much appreciated.

  • Ok, I think I was pretty close then with my assumptions. The code I posted should work for whenever you want to find out how much vacation time an employee has left right now...it finds the most recent period and then sums the absence ledger entries that fall into that time span. It will need a little bit of tweaking to be usable to return available leave over a custom date range, but shouldn't take too much to make it work.

    I'll have to give your design some further thought, but I do wonder if you'd be better off setting the EndDate for the period when you first create the row, instead of leaving it NULL; you most likely already know what the last day of the period will be, right? In any case, I would suggest building a check constraint that won't allow StartDate and EndDate values that create an overlap with previous rows; if that's allowed to happen it will really mess up your calculations, so you want to make it physically impossible to occur.

    Now when a new year rolls around and the system resets an employees vacation time, it just takes the most recent TotalVacation value and adds it to (Sum(Credits) - Sum(Debits)) of the previous year to account for carryover (If carryover is allowed). This value is then entered into the AbsenceLedger.

    How do you determine if carryover is allowed?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply