January 17, 2019 at 1:20 pm
Hi,
I have a very simple table
create table #employee
(
EmpNumber int,
HireDate datetime,
TerminationDate datetime
)
insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '12/31/2002' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '12/31/2022' );
And on the end result of the query I would like to get the employees active per day.
Date ActiveEmployees
01/01/1986 10
02/01/1986 11
etc etc.
I am not sure how to do it, because I would also like to have every single date, even though if I didn't have any employee active.
Thanks
Astrid
January 17, 2019 at 2:11 pm
Do you have an calendar table? Otherwise, you'd have to 'generate' one. I used Jeff Moden's tally table (attached) to generate the dates for 2019 and modified your data to show people hired and terminated at different times
create table #employee
(
EmpNumber int,
HireDate datetime,
TerminationDate datetime
)
insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '01/15/2019' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '01/20/2019' );
;with cte as (
select DateAdd(day, N, '1/1/2019') CalDate
from dbo.Tally(0,365))
select c.Caldate, count(e.EmpNumber) EmpCount
from cte c
left join #employee e
on e.HireDate <= c.CalDate
and e.TerminationDate >= c.CalDate
group by C.CalDate
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 17, 2019 at 2:20 pm
I am confused, I have a date table, but I am not sure how to join it. that is my problem, now i see that cte, it is not my day.... I am checking
January 17, 2019 at 2:31 pm
Mike01 - Thursday, January 17, 2019 2:11 PMDo you have an calendar table? Otherwise, you'd have to 'generate' one. I used Jeff Moden's tally table (attached) to generate the dates for 2019 and modified your data to show people hired and terminated at different times
create table #employee
(
EmpNumber int,
HireDate datetime,
TerminationDate datetime
)insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '01/15/2019' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '01/20/2019' );;with cte as (
select DateAdd(day, N, '1/1/2019') CalDate
from dbo.Tally(0,365))select c.Caldate, count(e.EmpNumber) EmpCount
from cte c
left join #employee e
on e.HireDate <= c.CalDate
and e.TerminationDate >= c.CalDate
group by C.CalDate
I am not sure what you mean by
;with cte as (
select DateAdd(day, N, '1/1/2019') CalDate
from dbo.Tally(0,365))
I dont know what table tally is.
January 17, 2019 at 3:10 pm
I dont know what table tally is.
You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop
You can create one like so (the code here will generate a tally table with 100K numbers:
CREATE TABLE dbo.tally(N bigint NOT NULL);
ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
WITH (FILLFACTOR=100);
ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);
INSERT dbo.tally (N)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a, sys.all_columns b;
-- Itzik Ben-Gan 2001
January 17, 2019 at 6:22 pm
astrid 69000 - Thursday, January 17, 2019 2:20 PMI am confused, I have a date table, but I am not sure how to join it. that is my problem, now i see that cte, it is not my day.... I am checking
The CTE is generating a dynamic table of dates... Just think of it as a black box that returns a table of dates, if that helps. (Yeah, I know you're not supposed to use code you don't understand, but sometimes you have to worry about one piece at a time.
January 17, 2019 at 9:43 pm
Duplicated post deleted.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 17, 2019 at 9:44 pm
astrid 69000 - Thursday, January 17, 2019 1:20 PMHi,
I have a very simple table
create table #employee
(
EmpNumber int,
HireDate datetime,
TerminationDate datetime
)insert into #employee (EmpNumber, HireDate, TerminationDate) values (12, '09/02/1986', '12/31/2022' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (13, '09/01/1986', '12/31/2012' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (14, '09/02/2001', '12/31/2002' );
insert into #employee (EmpNumber, HireDate, TerminationDate) values (15, '09/02/2011', '12/31/2022' );And on the end result of the query I would like to get the employees active per day.
Date ActiveEmployees
01/01/1986 10
02/01/1986 11etc etc.
I am not sure how to do it, because I would also like to have every single date, even though if I didn't have any employee active.
Thanks
Astrid
I don't know how may employees are involved but I do know that the methods demonstrated so far will produce a Cartesian Product of the # of employees times the number of days in the range, which is 13,514 days to cover the complete whole year date range prescribed by the 4 employee's data that you provided. That means that just 4 employees you have will produce an internal CROSS JOIN of 54,056 rows to produce the desired result using the current methods. That means that for every 1,000 employee rows, the internal CROSS JOIN will have 13,514,000 internal rows generated. This can get expensive quickly.
With that extra bit of knowledge, I have xx questions for you...
1. How many total employee rows are we talking about?
2. What is the full range of dates you're expecting?
3. How often to you have to calculate this?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 18, 2019 at 10:12 am
Alan.B - Thursday, January 17, 2019 3:10 PMI dont know what table tally is.
You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop
You can create one like so (the code here will generate a tally table with 100K numbers:
CREATE TABLE dbo.tally(N bigint NOT NULL);
ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
WITH (FILLFACTOR=100);
ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);INSERT dbo.tally (N)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a, sys.all_columns b;
First, you should use a table value constructor to generate the appropriate number of rows, because reading from disk is relatively expensive.
Second, you should use an inline table-valued function instead of an instantiated table for your tally table, because, again, reading from disk is relatively expensive. Itzik Ben Gan created a blazing fast function that you should look up. If you can't create user-defined functions, you can still use the vast majority of the code as a CTE.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 18, 2019 at 2:48 pm
Your simple table is very badly done. You don't know about current data types. You don't know that a table has to have a primary key. Since a table models a set of things, its name has to be a plural or collective – unless you actually do only have one employee.
You don't know that an integer or any other numeric can never be an identifier by definition. In short, you are not writing SQL yet, but only mimic punch cards.
CREATE TABLE Personnel
(emp_nbr CHAR (3) NOT NULL PRIMARY KEY,
hire_date DATE NOT NULL,
termination_date DATE, --- null is still employes
CHECK(hire_date <= termination_date));
Please look at the collective name, the character string identifier, the ISO 11179 format on the names and constraint among the dates. Get a copy of the temporal SQL book from the University of Arizona by Rick Snodgrass; is a free PDF download and should be part of every working SQL programmers library. Among other things, you will learn that the only legal format in ANSI-ISO standard SQL is"yyyy-mm-dd" and not the ambiguous local dialect you used. I had to guess at exactly what those dates were to translate them into ANSI-ISO standards.
You also need to learn for the insertion statement that Microsoft is had for quite a few years now. You are still using the "one punch card at a time" style that Sybase originally had decades ago.
INSERT INTO Personnel
VALUES
('012', '1986-09-02', '2022-02-31' ),
('013', '1986-09-01', '2012-12-31' ),
('014', '2001-09-02', '2002-12-31' ),
('015', '2011-09-02', '2022-12-31' );
Build a calendar table with one column for the calendar data and other columns to show whatever your business needs in the way of temporal information. Do not try to calculate holidays in SQL -- Easter alone requires too much math and there are two of them (Orthodox and Catholic).
The Ordinal business day is a good trick. Number the days from whenever your calendar starts and repeat a number for a weekend or company holiday.
CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
ordinal_business_nbr INTEGER NOT NULL,
...);
INSERT INTO Calendar
VALUES ('2007-04-05', 42);
('2007-04-06', 43); -- Good Friday
('2007-04-07', 43);
('2007-04-08', 43); -- Easter Sunday
('2007-04-09', 44);
('2007-04-10', 45); -- Tuesday, back to work
To compute the business days from '2007-04-05', Thursday before Easter in 2007 to '2007-04-10', the next Tuesday:
SELECT (C2.ordinal_business_nbr - C1.ordinal_business_nbr -1) AS business_day_cnt
FROM Calendar AS C1, Calendar AS C2
WHERE C1.cal_date = '2007-04-05'
AND C2.cal_date = '2007-04-10';
This saves having to scan all of the rows within the range to get a sum. The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.
Please post DDL and follow ANSI/ISO standards when asking for help.
January 18, 2019 at 3:13 pm
drew.allen - Friday, January 18, 2019 10:12 AMAlan.B - Thursday, January 17, 2019 3:10 PMI dont know what table tally is.
You will after reading this: The "Numbers" or "Tally" Table: What it is and how it replaces a loop
You can create one like so (the code here will generate a tally table with 100K numbers:
CREATE TABLE dbo.tally(N bigint NOT NULL);
ALTER TABLE dbo.tally ADD CONSTRAINT pk_cl__dbo_tally__N PRIMARY KEY CLUSTERED(N)
WITH (FILLFACTOR=100);
ALTER TABLE dbo.tally ADD CONSTRAINT uq_nc__dbo_tally__N UNIQUE NONCLUSTERED(N);INSERT dbo.tally (N)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a, sys.all_columns b;First, you should use a table value constructor to generate the appropriate number of rows, because reading from disk is relatively expensive.
Second, you should use an inline table-valued function instead of an instantiated table for your tally table, because, again, reading from disk is relatively expensive. Itzik Ben Gan created a blazing fast function that you should look up. If you can't create user-defined functions, you can still use the vast majority of the code as a CTE.
Drew
If you read from an instantiated table, especially if you just built it, you won't be reading it from memory. A physical Tally table will usually beat Itzik's wonderful cascading CTE method for sheer performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 18, 2019 at 4:17 pm
astrid 69000 - Thursday, January 17, 2019 1:20 PM
Your simple table is very badly done. You don't know about current data types. You don't know that a table has to have a primary key. Since a table models a set of things, its name has to be a plural or collective – unless you actually do only have one employee.
You don't know that an integer or any other numeric can never be an identifier by definition. In short, you are not writing SQL yet, but only mimic punch cards.
CREATE TABLE Personnel
(emp_nbr CHAR (3) NOT NULL PRIMARY KEY,
hire_date DATE NOT NULL,
termination_date DATE, --- null is still employes
CHECK(hire_date <= termination_date));
Please look at the collective name, the character string identifier, the ISO 11179 format on the names and constraint among the dates. Get a copy of the temporal SQL book from the University of Arizona by Rick Snodgrass; is a free PDF download and should be part of every working SQL programmers library. Among other things, you will learn that the only legal format in ANSI-ISO standard SQL is"yyyy-mm-dd" and not the ambiguous local dialect you used. I had to guess at exactly what those dates were to translate them into ANSI-ISO standards.
You also need to learn for the insertion statement that Microsoft is had for quite a few years now. You are still using the "one punch card at a time" style that Sybase originally had decades ago.
INSERT INTO Personnel
VALUES
('012', '1986-09-02', '2022-02-31' ),
('013', '1986-09-01', '2012-12-31' ),
('014', '2001-09-02', '2002-12-31' ),
('015', '2011-09-02', '2022-12-31' );
Build a calendar table with one column for the calendar data and other columns to show whatever your business needs in the way of temporal information. Do not try to calculate holidays in SQL -- Easter alone requires too much math and there are two of them (Orthodox and Catholic).
The Ordinal business day is a good trick. Number the days from whenever your calendar starts and repeat a number for a weekend or company holiday.
CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
ordinal_business_nbr INTEGER NOT NULL,
...);
INSERT INTO Calendar
VALUES ('2007-04-05', 42);
('2007-04-06', 43); -- Good Friday
('2007-04-07', 43);
('2007-04-08', 43); -- Easter Sunday
('2007-04-09', 44);
('2007-04-10', 45); -- Tuesday, back to work
To compute the business days from '2007-04-05', Thursday before Easter in 2007 to '2007-04-10', the next Tuesday:
SELECT (C2.ordinal_business_nbr - C1.ordinal_business_nbr -1) AS business_day_cnt
FROM Calendar AS C1, Calendar AS C2
WHERE C1.cal_date = '2007-04-05'
AND C2.cal_date = '2007-04-10';
This saves having to scan all of the rows within the range to get a sum. The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.
If, by current data types, you mean not using DATETIME, then you'll be "pleased" to know that the newer datatypes are actually NOT ISO compliant when it comes to date math whereas the DATETIME and, SMALLDATETIME datatypes actually are.
You continued observation about not using an Integer or similar datatype as a row identifier also continues to demonstrate that you know nothing of the advantages of doing so nor are you aware of the math that goes on behind the scenes for such a column.
Last but not least, the OP isn't trying to calculate business days. He's trying to calculate the number of employees for any given date for all dates in the possible temporal range.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 18, 2019 at 4:20 pm
@astrid 69000 ,
Do you have answers to my questions? 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply