How to show multiple individuals>

  • Hello,

    Sorry I couldn't think if a descriptive title. I found this on the internet and really it works well. It can list someone's beginning and ending dates for their leave. This is nice because I can use this in a report without having to show each and every day. I'm not going to tell you I fully understand how this works, because I don't; but I will read up on it.

    My problem is, how do I show it for all employees and not just the one in the Where? A temp table or Cross Apply could work, but I haven't figured that out?

    Below are two different versions, with some test data.

    Create Table #LeaveTable(Co int, Emp int, LeaveDate date);
    Insert Into #LeaveTable (Co, Emp, LeaveDate)
    Values
    (1, 1001, '12/22/2020')
    , (1, 1001, '12/23/2020')
    , (1, 1001, '12/24/2020')
    , (1, 1001, '12/28/2020')
    , (1, 1001, '12/29/2020')
    , (1, 1001, '12/30/2020')
    , (1, 1001, '12/31/2020')
    , (1, 1002, '11/23/2020')
    , (1, 1002, '11/24/2020')
    , (1, 1002, '11/25/2020');

    Example 1:

    WITH
    -- This table contains all the distinct date
    -- instances in the data set
    dates(date) AS (
    SELECT DISTINCT CAST(LeaveDate AS DATE)
    FROM #LeaveTable
    WHERE Emp = 1001 And LeaveDate >= GETDATE()
    ),

    groups AS (
    SELECT
    ROW_NUMBER() OVER (ORDER BY date) AS rn,
    dateadd(day, -ROW_NUMBER() OVER (ORDER BY date), date) AS grp,
    date
    FROM dates
    )
    SELECT
    COUNT(*) AS consecutiveDates,
    MIN(date) AS minDate,
    MAX(date) AS maxDate
    --Into #MyTempTable
    FROM groups
    GROUP BY grp
    ORDER BY 2 ;

    Example 2:

    WITH 
    groups(date, grp) AS (
    SELECT DISTINCT
    CAST(LeaveDate AS DATE),
    dateadd(day,
    -DENSE_RANK() OVER (ORDER BY CAST(LeaveDate AS DATE)),
    CAST(LeaveDate AS DATE)) AS grp
    FROM #LeaveTable
    WHERE Emp = 1002
    And LeaveDate >= GETDATE()
    )
    SELECT
    COUNT(*) AS consecutiveDates,
    MIN(date) AS minDate,
    MAX(date) AS maxDate
    FROM groups
    GROUP BY grp
    ORDER BY 2;

    Either will work.

    Thank you for any help,

    • This topic was modified 4 years, 1 month ago by  Ken at work.
  • ???

    You left out (1) what is "this"... the code? and (2) what are you trying to accomplish? Show everyone's days they won't be at work? (so unpacking the interval between (leaveDate, returnDate) so that you can show that on a calendar or count the number of people who will be absent on a given date from work?

  • Ken at work wrote:

    Create Table #LeaveTable(Co int, Emp int, LeaveDate date);
    Insert Into #LeaveTable (Co, Emp, LeaveDate)
    Values
    (1, 1001, '12/22/2020')
    , (1, 1001, '12/23/2020')
    , (1, 1001, '12/24/2020')
    , (1, 1001, '12/28/2020')
    , (1, 1001, '12/29/2020')
    , (1, 1001, '12/30/2020')
    , (1, 1001, '12/31/2020')
    , (1, 1002, '11/23/2020')
    , (1, 1002, '11/24/2020')
    , (1, 1002, '11/25/2020');

    This data is inherently flawed.  There is nothing in the data to state whether a date is a start or end date of a leave period and there's nothing I see that says there's a guarantee that all dates are guaranteed to be present.  Any code written against such data is guaranteed to eventually fail due to missing data.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is how the data is stored in the system. There is no start or end date in the system. Any actual day off is recorded in a row separately. I can't change that.

    The code, after the test data, is what shows the start and end dates which is great. But I can't figure out how to show this from all employees at once and identify which employee is taking time-off for which dates.

  • What would I like to accomplish? I would like a single result set that shows all employees and the Start and End dates of their time off:

    Column A: EE

    Column B: Leave Start Date (labeled minDate in queries I've included)

    Column C: Leave End Date (labeled maxDate in queries I've included)

    Edit: P.S. The code I included does this but only for one employee. If I take out the Where statement I don't know how to show each employee.

    • This reply was modified 4 years, 1 month ago by  Ken at work.
  • To show all employees:

    1. Remove the filter on EmployeeID from the WHERE clause
    2. Add EmployeeID to the grouping

    Using Example 2:

    -- Add Emp to groups so it can be used later
    WITH groups(Emp, [date], grp, [TotalDatesInPeriod]) AS (
    SELECT DISTINCT
    Emp,
    CAST(LeaveDate AS DATE),
    dateadd(day, -DENSE_RANK() OVER (ORDER BY CAST(LeaveDate AS DATE)), CAST(LeaveDate AS DATE)) AS grp,
    COUNT(*) OVER (PARTITION BY Emp) AS [TotalDatesInPeriod]
    FROM #LeaveTable
    WHERE
    -- remove this: Emp = 1002 AND
    LeaveDate >= GETDATE()
    )
    -- Add Emp to SELECT, GROUP BY, and ORDER BY
    SELECT Emp,
    COUNT(*) AS consecutiveDates,
    MIN(date) AS minDate,
    MAX(date) AS maxDate,
    MAX([TotalDatesInPeriod]) AS [TotalDatesInPeriod]
    FROM groups
    GROUP BY Emp, grp
    ORDER BY Emp, 2;

     

    Eddie Wuerch
    MCM: SQL

  • Awesome, that worked perfect

  • You don't know how to do a data model for temporal data. Please get a copy of Rick Snodgrass's book from the University of Arizona on temporal queries in SQL. It's available as a free download in PDF format. You don't seem to know the table must have a key. You don't know the only format allowed in ANSI/ISO standard SQL for a date is based on ISO 8601. It's also interesting to see that after 50 years, people still put a comma at the front of the line in a program text – we used to do that on punch cards so that we could rearrange the deck easily at the expense of making your program harder to read (yes, I did text readability studies years ago for AIRMICS). You also don't seem to know that identifiers are on a nominal scale and cannot be integers; this is usually covered in the first week of any course on basic data modeling. Table names should never include meta data. What you have what you have posted has a design flaw called a "tibble" and we make fun of you for it. What I think you meant to post would look like this:

    CREATE TABLE Employee_Leave

    (company_duns CHAR(8) NOT NULL,

    emp_id CHAR(4) NOT NULL,

    leave_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

    leave_end_date DATE,

    CHECK(leave_start_date <= leave_end_date)

    PRIMARY KEY(company_duns, emp_id, leave_start_date);

    While SQL Server does not support ANSI/ISO standard info interval data types, we can fake them with a start and end date columns in the constraint on their ordering. You might want to take some time to understand how the null works in the check constraint.

    INSERT INTO Employee_Leave

    VALUES

    (1, 1001, '2020-12-22', NULL),

    (1, 1001, '2020-12-23', NULL),

    (1, 1001, '2020-12-24', NULL),

    (1, 1001, '2020-12-28', NULL),

    (1, 1001, '2020-12-29', NULL),

    (1, 1001, '2020-12-30', NULL),

    (1, 1001, '2020-12-31', NULL),

    (1, 1002, '2020-11-23', NULL),

    (1, 1002, '2020-11-24', NULL),

    (1, 1002, '2020-11-25', NULL);

    This simply translate your incorrect design into valid SQL. What you really wanted was:

    INSERT INTO Employee_Leave

    VALUES

    (1, 1001, '2020-12-22', 2020-12-24'),

    (1, 1001, '2020-12-28', '2020-12-31'),

    (1, 1002, '2020-11-23', '2020-11-25');

    Your computations now become quite a bit easier:

    SELECT DATEDIFF (leave_start_date, leave_end_date)

    MIN(leave_start_date) AS min_date, MAX(leave_end_date) AS max_date

    FROM Employee_Leave;

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    You don't know how to do a data model for temporal data. Please get a copy of Rick Snodgrass's book from the University of Arizona on temporal queries in SQL.

    Since you know so much, post the link for the PDF itself!

    As for the rest of it, the OP didn't design this.  It's basically a log table that identifies only what's needed... the IDs for the company and the employee as well as the day taken as a vacation day.

    Ken at work wrote:

    This is how the data is stored in the system. There is no start or end date in the system. Any actual day off is recorded in a row separately. I can't change that.

    The code, after the test data, is what shows the start and end dates which is great. But I can't figure out how to show this from all employees at once and identify which employee is taking time-off for which dates.

    Ah... ok.  Got it.  My apologies.  I misread what it contained.  I made the mistake of thinking that it was an EAV and that start and end dates were implied in the data rather than it being a simple log of dates that were taken as whole vacation days.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • http://www2.cs.arizona.edu/~rts/publications.html

    This also has his other books on temporal SQL, etc.

     

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • >> ]This is how the data is stored in the system. There is no start or end date in the system. Any actual day off is recorded in a row separately. I can't change that. <<

    If you record a day off as a separate row, you're saying it's totally different from any other day (sort of sounds like the start of Hanukkah, doesn't it? Why is this day. Not like any other day…). The day off is just a day, but it has what's called an attribute property; how is this day used? There is a reason the ANSI/ISO standards have a temporal interval data type.

    I'm sorry to tell you, but based on 30+ years of doing SQL, you're screwed. You will keep writing slower and slower code, and will become more and more convoluted as you tried to construct a consistent temporal model on the fly. Perhaps someday Microsoft will add interval data types. After all, they finally did give us ANSI/ISO style DATE, TIME and DATETIME2() datatypes to replace the old Sybase/UNIX ones.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • I believe all you need to do is specify the proper PARTITION BY clause in the code (new code is below).  The code originally comes from Itzik Ben-Gan (I believe), a great SQL Server researcher and author, , as part of his "Gaps and Islands" work.  At least he led the popularization of it.

    The only thing I see wrong with the table is that you didn't specify the natural key, which is both correct for modeling and gains performance in this case because it will speed up processing of the table considerably (for the love of all that is holy, do not cluster this table by an identity column (yes, technically a column with an identity property, for the ultra-pendantic)).

    I think one could argue that the current structure is actually a cleaner design (for dates specifically, since they are so easily enumerated).  I, too, prefer the current table structure for this.  For example, if an employee had to be called in from leave for an emergency, with the current table all you have to do is delete the dates(s) they were called back in.  With a range, you'd need code to split the row into two remaining rows.  Similarly, if someone adds a date later, you have to check if that new date falls within an existing range and add it to that range rather than just simply insert the row, as in the current table.

    Also, you wouldn't put nonwork dates in the Leaves table.  Ah, so now you have to determine and code around each employee's nonwork days in order to build the proper range.  Ugh!  And such days can change over time.  Arghh!

    Create Table #LeaveTable(Co int NOT NULL, Emp int NOT NULL, LeaveDate date NOT NULL,

    PRIMARY KEY CLUSTERED ( Co, Emp, LeaveDate ) );

     

    ;WITH
    groups AS (
    SELECT
    Co, Emp, LeaveDate,
    ROW_NUMBER() OVER (PARTITION BY Co, Emp ORDER BY LeaveDate) AS rn,
    DATEADD(DAY, -ROW_NUMBER() OVER (PARTITION BY Co, Emp ORDER BY LeaveDate), LeaveDate) AS grp
    FROM #LeaveTable
    )
    SELECT
    Co, Emp,
    MIN(LeaveDate) AS LeaveStartDate,
    MAX(LeaveDate) AS LeaveEndDate,
    COUNT(*) AS ConsecutiveLeaveDays
    FROM groups
    GROUP BY Co, Emp, Grp
    ORDER BY Co, Emp, LeaveStartDate

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • jcelko212 32090 wrote:

    http://www2.cs.arizona.edu/~rts/publications.html

    This also has his other books on temporal SQL, etc.

    Excellent.  Thanks, Joe.  You've mentioned this site many times.  Consider adding the link to your signature line. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 wrote:

    CREATE TABLE Employee_Leave

    While I agree that the original table name that the OP posted left some to be desired, so does the one you created.  Wouldn't Employee_Leave_Days_Taken be more sufficiently descriptive to keep people from having to have esoteric knowledge of the database to understand that this table will have one entry for each day taken by the given employee?

    The rest of your post is a pretty good lesson on design, so thanks for that.

    Of course, I'll strongly disagree to the emp_id being relegated to a CHAR(4) and also state that your CHECK for the start date being less than or equal to the end data will fail if the end date is not known at the time the entry is made, which is sadly the frequent case for when people "design" tables and data (EDIT: Sorry... that was a mistake on my part.  It works just fine).  I think the CHECK is spot on if NULLs are never used for End Dates whether they're known at the time of entry or not (far future surrogate for NULL should be used) (EDIT: I still hate the idea of NULL end dates instead of far-future dates, though)

    By the same token, this table may be nothing more than a log of special events and so there may be no need for end dates.  Yes, I agree that can make for a hell of a reporting problem such as what the OP was after but that's better than dealing with what most people end up doing and that's using a NULL for end dates.

    Anyway... thanks again for the link you provided and the time you took to post some examples of what you were talking about.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    jcelko212 32090 wrote:

    CREATE TABLE Employee_Leave

    also state that your CHECK for the start date being less than or equal to the end data will fail if the end date is not known at the time the entry is made, which is sadly the frequent case for when people "design" tables and data.  I think the CHECK is spot on if NULLs are never used for End Dates whether they're known at the time of entry or not (far future surrogate for NULL should be used)

    I don't believe the CHECK will fail if end date is NULL.  A CHECK clause must fail, not just be NULL, to prevent the row from being INSERTed; somewhat analogous to how a WHERE condition must be true, and not NULL, for the row to be SELECTed.

    CREATE TABLE #test1 ( col1 date NULL, col2 date NULL, CHECK(col2 > col1) )

    INSERT INTO #test1 VALUES

    ( GETDATE(), GETDATE()+1 ),

    ( GETDATE(), NULL ),

    ( NULL, GETDATE()+1 )

    SELECT * FROM #test1

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply