October 24, 2017 at 7:30 am
lars.hesselberg 49028 - Tuesday, October 24, 2017 3:35 AM
Hi, how about this approach:
DECLARE @BegDate DATETIME = '20000701',
@EndDate DATETIME = '20000707';
Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);
INSERT INTO @tHolidays (Holiday)
SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
) h (Holiday);
SELECT
(DATEDIFF(dd, @BegDate, @EndDate) + 1)
-(DATEDIFF(wk, @BegDate, @EndDate) * 2)
-(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
- (SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)
FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate);
Lars
Good idea but try your code with the following dates, Lars.
DECLARE @BegDate DATETIME = '20000101',
@EndDate DATETIME = '20000221'
;
It returns a NULL even though there are several weekends involved that should be counted as "holidays"/non-working days. The reason is that the code only returns a value if a holiday happens to fall within the date range.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2017 at 8:07 am
Hi Jeff,
You got me! The reason is obvious: number - NULL = NULL.
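To see the propagation in isolation:
SELECT 5 - NULL;                -- returns NULL
SELECT 5 - COALESCE(NULL, 0);   -- returns 5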
Here is the fix:
-- Comment in/out as needed for testing
--DECLARE @BegDate DATETIME = '20000701', @EndDate DATETIME = '20000707'; -- 5 weekdays, 1 holiday => 4 workdays
DECLARE @BegDate DATETIME = '20000101', @EndDate DATETIME = '20000221'; -- 36 weekdays, no holidays => 36 workdays
--DECLARE @BegDate DATETIME = '20021225', @EndDate DATETIME = '20030101'; -- 6 weekdays, 2 holidays => 4 workdays
SELECT (DATEDIFF(dd, @BegDate, @EndDate) + 1); -- raw calendar-day span of the selected test range, shown for reference
Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);
INSERT INTO @tHolidays (Holiday)
SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
) h (Holiday);
SELECT
(DATEDIFF(dd, @BegDate, @EndDate) + 1)
-(DATEDIFF(wk, @BegDate, @EndDate) * 2)
-(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
- COALESCE((SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)), 0)
FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate;
October 24, 2017 at 8:13 am
Lars - or this:
SELECT
(DATEDIFF(dd, @BegDate, @EndDate) + 1)
- (DATEDIFF(wk, @BegDate, @EndDate) * 2)
- (CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
- (CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
- ISNULL(SUM(CASE WHEN DATENAME(dw, Holiday) IN ('Saturday','Sunday') THEN 0 ELSE 1 END),0)
FROM @tHolidays
WHERE Holiday >= @BegDate
AND Holiday <= @EndDate;
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 24, 2017 at 8:30 am
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY calls for a Cartesian product, so why not change the table-valued function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
October 24, 2017 at 12:49 pm
lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY calls for a Cartesian product, so why not change the table-valued function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
For the same reason I don't set MAXDOP = 1 at the instance level.
The reasons for not using scalar functions are well known and well documented. I try not to have too many ALWAYS & NEVER rules...
but until MS makes inline scalars a reality, they will remain on my never list.
October 25, 2017 at 4:31 pm
Jeff Moden - Sunday, October 22, 2017 11:44 AM
Here's my shot at this problem. Just to make life easy, I'll keep all the code together for this post. The code is attached as .txt files, which is a bit annoying... why a forum for SQL Server won't allow someone to attach .SQL files, I'll never know. :blink:
Build the Calendar Table
First, see the attached "Calender Rev 02 Small.txt" file. It's the old code I resurrected to make a Calendar table with some "special" columns along with some of the more traditional columns. It does not have any "holidays" associated with Easter and doesn't contain any ISO Week/Year. If you need any of that, post back. The "special" columns are...
1. WorkdayNumber - An ascending count of workdays in the calendar table. If a day is a weekend or holiday, the previous workday number is "smeared" down into the weekend or holiday. This allows for some incredibly simple code (simple subtraction between two integers) to determine the number of work days between two dates. It also allows for other easy and fast calculations, such as determining lead times and what things like "in 5 business days" actually mean, all using a simple lookup and integer addition or subtraction. This column is the reason why I wanted to resurrect this skinny and quite old table. It's magic. This is the same kind of column that Joe Celko spoke of in his post, except the code I've included actually does such a thing and, using the proprietary 3-part UPDATE (also known as a "Quirky Update"), is nasty fast and even runs in SQL Server 2000 (haven't tested it in 6.5 or 7) because it doesn't use RBAR and it doesn't use Lead/Lag/Preceding, etc.
2. DTSerial - The PK of this table is the DT column, which actually stands for "DateTime" and not just "Date". DTSerial is the underlying integer value of the number of whole days that have passed since the 1st of January, 1900. It doesn't have a huge impact on performance but, as you'll see, it can make code simpler and so I recently added it for this problem.
3. DWOccurance - Although not used for the problem on this thread, this column is very useful for greatly simplifying questions/queries like "What is the date of the 3rd Monday of June" or "Create a list of the dates for every Tuesday of the month for the next two years". And, yes, "DW" stands for "Day of the Week" and is the same notation as what is offered in the SQL Server date/time functions. Please get used to that or change it to "Weekday" if you can't memorize the simple and obvious 2 character names for the common date parts. In any case, quitchurbitchin about it because I'll never use the long names for the date parts. "Keep your eye upon the donut and not upon the hole". 😉
Those columns and all the others are described in the header of the code that makes the table. I didn't take the time to add those descriptions to each column as extended properties but will in the future. This might make a good article.
To use the code, open it in SSMS, look for "TODO" to find where you need to change the range of dates to be included and make any changes you need (preset for the requirements Jason posted earlier). Then, after doing your safety review of the code, run the code.
IF you ever need to Move, Add, or Delete (MAD) a holiday, do your thing to the calendar table making sure to set the IsHoliday column in the table and adding the description. This can be done using a "bulk" bit of code to do the whole table (as in the original code for a given holiday) or just a one-off for a given year (I have to do this a lot of years at work) and then run the last two "paragraphs" of this code to renumber the WorkdayNumber column according to the changes in holidays.
The code that builds the Calendar table also includes ONLY those indexes necessary to solve this problem.
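To make the WorkdayNumber and DWOccurance ideas above concrete, here is a minimal sketch; it borrows the column names described in the quoted post, but the queries are illustrative guesses rather than the attached script.
-- Illustrative only; assumes Jeff's column names (DT, WorkdayNumber, IsWorkDay,
-- YY, MM, DW, DWOccurance) and whole-day DT values. Not the attached code.
DECLARE @BegDate DATETIME = '20000701', @EndDate DATETIME = '20000707';
-- Workdays between two dates: two keyed lookups and a subtraction.
-- If @BegDate is itself a workday, its IsWorkDay flag adds it back in.
SELECT WorkDays = e.WorkdayNumber - b.WorkdayNumber + b.IsWorkDay
FROM dbo.Calendar b
CROSS JOIN dbo.Calendar e
WHERE b.DT = @BegDate
AND e.DT = @EndDate;
-- "3rd Monday of June 2017" via DWOccurance (DW = 1 is Monday here because the
-- table was built under SET DATEFIRST 1, as discussed later in the thread).
SELECT DT
FROM dbo.Calendar
WHERE YY = 2017 AND MM = 6 AND DW = 1 AND DWOccurance = 3;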
Jeff, My apologies for the slow reply. I haven't been sandbagging on this; I'm simply trying to move at a more deliberate pace this time around.
Yesterday afternoon, and well into the evening, was spent just on the Calendar table alone...
First of all... you clearly put a huge amount of time & thought into this, going well beyond what any normal forum post could ever warrant. Seriously, if your goal was to dispel my belief that you're one of the very best SQL practitioners/educators in the world... you really missed the mark this time. I don't mean to gush, but this really is exceptional work!
Celko was the first one, on this thread, to introduce the idea of using "ordinal_business_nbr" [#1903571], not long after I had implemented the idea of dumping the table constructor in favor of the long CASE expressions. Different implementations of the same principle... simple addition and subtraction operations are orders of magnitude faster than counting/aggregation operations... In fact, that was the final nudge that prompted me to post the "X2" version early, in the first place. No need to rehash those gory details...
In short, I've actually had a "Celko" version kicking around the ol' noggin for a few days now...
So, since I didn't say it on Friday... Thank you Joe!... You're you and you take a lot of heat for that... while I'm not saying you don't deserve most of it... I will say that you don't necessarily get the recognition you deserve for the good stuff. If I'm working on anything that involves hierarchies, Nested Sets is pretty much automatic at this point... And, thanks to you, I will NEVER make the mistake of conflating "records" with rows or "fields" with columns. 😉
I'm actually taken aback by the fact that all 3 solutions were developed independently and yet they all essentially work off of the same principle. Great minds really do think alike! Either that, or this has always been common knowledge and I just showed up really late to the party...
I was definitely WAY late to the party on noticing that you'd used "SET DATEFIRST 1" to make Monday the 1st day of the week... I had altered your original code to append "_JM" to the end of your Calendar table name because I wanted it to live in the same test database as a copy of our existing Calendar table (thank you for the DROP warning, by the way)... I probably spent close to an hour and a half trying to figure out how and where I'd screwed up a simple name alteration so badly that Fridays are weekends and Sundays are workdays...
Now that I know what the cause is, I can see how it makes setting DWOccurance easier, and how DW & DWOccurance work in tandem to help calculate holidays. Is there a reason (that I haven't found yet) to keep it like that? 1) It threw me for a big loop and I'm sure it'll do the same to others, and 2) I can't make the same change to the incumbent DW column on the existing Calendar table. I can, however, create that column with a different name if it is indeed an ongoing part of the overall solution.
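As an aside (my own illustration, not something in Jeff's script): if the DATEFIRST dependency ever becomes a problem, there is a language/DATEFIRST-independent way to get a Monday-based weekday number, anchored on day 0 (1900-01-01, a Monday):
-- Not from the posted script: DATEDIFF is unaffected by SET DATEFIRST or language,
-- so this always yields 0 = Monday ... 6 = Sunday.
DECLARE @DT DATETIME = '20171025';
SELECT MondayBasedDW = DATEDIFF(dd, 0, @DT) % 7;   -- 2 (Wednesday)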
The DTInt and DTSerial columns... Also ideas that have been rolling around. Specifically, it occurred to me that 1) I wasn't making good use of the CASE expression's ability to short-circuit, and 2) I could be getting killed by implicit conversions (CHAR -> DATETIME) resulting from the hardcoded holiday dates. The DTInt version was what occurred to me first, but basic tests showed that getting the input dates was expensive compared to DTSerial and had no advantage. I'm not going to lie, I was patting myself on the back a little when I saw it on the table. The fact that you've had it completely fleshed out, packaged up and waiting on me for a few days clearly demonstrates I'm no pretender to the throne... but I did take it as confirmation that I wasn't just chasing squirrels.
The conceptual details aside... Very cool T-SQL... I wouldn't have thought about that approach, so it's cool to see it done using a different approach (for me anyway). I'll see if I can find the code I was using prior to resigning myself to using hard-coded values. It worked and was fast, just not "do it a billion times in a function" fast... and finally, QUIRKY! I do have a soft spot for Quirky Updates... even if I do have to go back to double-check the "must do / must not do" checklist...
And on that note, I figured I would share this...
--=========================================================================
-- start by making a fresh copy of Jeff's Calendar table.
--=========================================================================
IF OBJECT_ID('CodeTest.dbo.Calendar_Copy', 'U') IS NULL
BEGIN -- DROP TABLE dbo.Calendar_Copy;
SELECT
cj.DT, cj.DTNext, cj.DTInt, cj.DTSerial, cj.YY, cj.MM, cj.DD, cj.DW,
cj.DWOccurance, cj.IsWorkDay, cj.IsHoliday,
WorkDayNumber = CAST(NULL AS INT), --<<== this method relies on NULLs as opposed to 0's.
cj.HolidayName
INTO CodeTest.dbo.Calendar_Copy
FROM
dbo.Calendar_JM cj;
-- With the exception of NULLs in place of zeros,
-- bring it to a state just prior to the "Observed" code.
UPDATE cc SET
cc.IsWorkDay = 1,
cc.IsHoliday = 0,
cc.HolidayName = REPLACE(cc.HolidayName, ' (Observed)', '')
FROM
dbo.Calendar_Copy cc
WHERE
cc.HolidayName LIKE '% (Observed)';
ALTER TABLE dbo.Calendar_Copy
ADD CONSTRAINT PK_Calendar_Copy PRIMARY KEY CLUSTERED (DT) WITH FILLFACTOR = 100;
END;
--=========================================================================
-- use the LAG & LEAD windowing functions to update the "Observed" values
-- in a single reference to the base table. Using a covering index key
-- prevents a sort operation in the execution plan.
--=========================================================================
WITH
cte_Observed AS (
SELECT
cc.DW,
cc.IsHoliday,
cc.IsWorkDay,
cc.HolidayName,
oIsHoliday = CASE -- this is where I was getting beat up by DATEFIRST = 1 ... I was using 2 & 6 originally.
WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1
WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1
ELSE cc.IsHoliday
END,
oIsWorkDay = CASE
WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0
WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0
ELSE cc.IsWorkDay
END,
oHolidayName = CASE
WHEN cc.DW = 1 THEN ISNULL(NULLIF(LAG(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName)
WHEN cc.DW = 5 THEN ISNULL(NULLIF(LEAD(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName)
ELSE cc.HolidayName
END
FROM
dbo.Calendar_Copy cc
)
UPDATE o SET
o.IsHoliday = o.oIsHoliday,
o.IsWorkDay = o.oIsWorkDay,
o.HolidayName = o.oHolidayName
FROM
cte_Observed o
WHERE
o.oIsHoliday = 1
AND o.DW IN (1, 5);
--=========================================================================
-- start by filling the WorkDayNumber w/ ROW_NUMBER() where IsWorkDay = 1
-- and leaving non-workdays NULL.
--=========================================================================
WITH
cte_WorkDayNumberFill AS (
SELECT
cc.WorkDayNumber,
RN = ROW_NUMBER() OVER (ORDER BY cc.DT)
FROM
dbo.Calendar_Copy cc
WHERE
cc.IsWorkDay = 1
)
UPDATE wf SET
wf.WorkDayNumber = wf.RN
FROM
cte_WorkDayNumberFill wf;
--=========================================================================
-- Smear the values over the NULLs using MAX() in a window frame.
--=========================================================================
WITH
cte_WorkDayNumberSmear AS (
SELECT
cc.WorkDayNumber,
wdns = MAX(cc.WorkDayNumber) OVER (ORDER BY cc.DT ROWS UNBOUNDED PRECEDING)
FROM
dbo.Calendar_Copy cc
)
UPDATE ws SET
ws.WorkDayNumber = ISNULL(ws.wdns, 0)
FROM
cte_WorkDayNumberSmear ws
WHERE
ws.WorkDayNumber IS NULL;
--=========================================================================
-- The net result...
--=========================================================================
SELECT
cc.DT, cc.DTNext, cc.DTInt, cc.DTSerial, cc.YY, cc.MM, cc.DD, cc.DW,
cc.DWOccurance, cc.IsWorkDay, cc.IsHoliday, cc.WorkDayNumber, cc.HolidayName
FROM
dbo.Calendar_Copy cc;
October 25, 2017 at 4:48 pm
lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY calls for a Cartesian product, so why not change the table-valued function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
It's not "Jeff" you're responding to. It should be "Chris".
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2017 at 8:53 am
lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY calls for a Cartesian product, so why not change the table-valued function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
The Cartesian product is what costs dearly in performance. That's why I wrote the special column in the Calendar table I built. I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.
If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.
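For reference, such an iTVF might look something like this (just a sketch of the shape; the function name is made up and it assumes the holidays live in a permanent dbo.Holidays table rather than a table variable):
CREATE FUNCTION dbo.tfn_GetWorkingDays_Sketch  -- hypothetical name, sketch only
        (@BegDate DATETIME, @EndDate DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT WorkDays =
       (DATEDIFF(dd, @BegDate, @EndDate) + 1)
     - (DATEDIFF(wk, @BegDate, @EndDate) * 2)
     - (CASE WHEN DATENAME(dw, @BegDate) = 'Sunday'   THEN 1 ELSE 0 END)
     - (CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
     - ISNULL((SELECT SUM(CASE WHEN DATENAME(dw, h.Holiday) IN ('Saturday', 'Sunday')
                               THEN 0 ELSE 1 END)
                 FROM dbo.Holidays h
                WHERE h.Holiday >= @BegDate
                  AND h.Holiday <= @EndDate), 0);
-- Called per row of the test table with CROSS APPLY, e.g.:
-- SELECT td.beg_dt, td.end_dt, wd.WorkDays
-- FROM dbo.TestData td
-- CROSS APPLY dbo.tfn_GetWorkingDays_Sketch(td.beg_dt, td.end_dt) wd;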
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2017 at 8:59 am
Jeff Moden - Thursday, October 26, 2017 8:53 AM
lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY calls for a Cartesian product, so why not change the table-valued function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
The Cartesian product is what costs dearly in performance. That's why I wrote the special column in the Calendar table I built. I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.
If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.
That's what makes Joe's contribution so attractive. All of the work is performed up front when you create the calendar table, or adjust the holidays. I can't see how any other solution is going to come close to two seeks.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 26, 2017 at 3:25 pm
Something about keeping the running 'work day' count in the table just feels wrong. I know, get over it because it works and is faster. So I decided to test out Jeff's code versus mine with the same 'TestData' table. Jeff's was faster. I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or something else? If it was a nightly batch I might go with mine. Needing instant feedback, then probably Jeff's.
Here are the results of my testing.
Using Jeff's code
========== DirectCode uses Calendar Table ===============================================
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 1, logical reads 3599, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Calendar'. Scan count 2, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 671 ms, elapsed time = 7370 ms.
My code
Table 'Worktable'. Scan count 994398, logical reads 2105638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 14327 ms, elapsed time = 38613 ms.
Just looking at one record in the TestData table:
Jeff's code
========== DirectCode uses Calendar Table ===============================================
Table 'Calendar'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
My Code
Table 'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.
Dates used for the test above.
beg_dt                   end_dt                   Span
2017-10-26 02:50:09.963  2017-12-04 14:22:10.740  39
My Code:
DECLARE @beg_dt DATETIME
,@end_dt DATETIME
,@BusinessDays INT
SET STATISTICS TIME, IO ON;
SELECT @beg_dt = td.beg_dt
,@end_dt = td.end_dt
,@BusinessDays =
(SELECT SUM(WorkDay) AS NumberOfWorkingDays
FROM dbo.WorkingDayTable
WHERE DateField >= td.beg_dt
AND DateField <= td.end_dt )
FROM dbo.TestData td
--WHERE td.beg_dt = '2017-10-26 02:50:09.963'
;
SET STATISTICS TIME, IO OFF
;
I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.
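For what it's worth, stripping the time is cheap either way (an aside, not part of the posted test):
-- Classic datetime-only idiom: whole days since day 0, added back to day 0.
SELECT DATEADD(dd, DATEDIFF(dd, 0, beg_dt), 0) FROM dbo.TestData;
-- Or, on SQL Server 2008+, a double cast through DATE:
SELECT CAST(CAST(beg_dt AS DATE) AS DATETIME) FROM dbo.TestData;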
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
October 26, 2017 at 5:19 pm
below86 - Thursday, October 26, 2017 3:25 PM
I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.
It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.
Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code. Are you returning to the screen or to variables to take the display time out of the picture?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2017 at 5:22 pm
below86 - Thursday, October 26, 2017 3:25 PM
I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or something else?
That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine whether or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
I don't see how that applies to user defined functions, because I don't see a trade-off... What is the downside of having faster, more efficient functions?
But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of our primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
The 1st two tables are calendar tables...
October 27, 2017 at 7:23 am
Jeff Moden - Thursday, October 26, 2017 5:19 PM
below86 - Thursday, October 26, 2017 3:25 PM
I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.
It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.
Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code. Are you returning to the screen or to variables to take the display time out of the picture?
I ran it like yours, to variables to take the display out of it.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
October 27, 2017 at 7:44 am
Jason A. Long - Thursday, October 26, 2017 5:22 PM
below86 - Thursday, October 26, 2017 3:25 PM
I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or something else?
That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine whether or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
I don't see how that applies to user defined functions, because I don't see a trade-off... What is the downside of having faster, more efficient functions?
But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of our primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
The 1st two tables are calendar tables...
I'm all for having the code run as fast as possible. Like I said, it is just my hang-up, not liking the column with the running count. And what I was getting at was that the time saved on this function, if it was running as part of a batch process, was not significant enough (IMHO) for me to want to use the calendar with the running count. As someone who also works in the insurance industry, I understand the need for calculating the 'working days'. I guess in my case we don't have a need, at this time, to calculate this on the fly; it would be more of a nightly batch process.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
October 29, 2017 at 3:15 am
Okay... where to begin???
First... I haven't been putting this off... the exact opposite, actually. I've been sucked in... It's the posting of updates, getting a full night's sleep, and asking for help that have been put off for too long...
Jeff,
If the roles were reversed, and I was the one who stepped up and delivered a package of that caliber, to someone else... At the very least, I'd want SOME feedback and some test results. No excuses, I should have been back here with my preliminary results on Wednesday. Long story short, I saw something I've never seen before, and went down the rabbit hole after it... I can only hope that you can accept my apology, I really am sorry.
An explanation is in order...
It's 4:30 AM, so it won't be a great explanation, but here goes...
#1... The function you supplied is fast... It's really fast... When I put it against the last few that I'd been working with, it blew their doors off: anywhere from 0.9 - 1.3 secs vs 4.5 - 5.5 secs... and that's the part that's been throwing me for a loop.
I've seen my fair share of bad estimates. Sometimes it takes longer than I'd like to figure it out, but I get there and usually learn something new along the way...
What sucked me in wasn't the fact that your function was crushing mine... It's the fact that it shouldn't have.
I can't tell you how much time I've killed staring at execution plan node values... Looking for bad estimates, mismatches between estimated and actual rows... anything...
And that's what's weird... the actuals and estimates were consistently a smack-on match...
Let me start by saying that I make no claims about having mastered the art of reading execution plans... but I've never seen the optimizer make a 100% accurate prediction about row count and be so far off when it comes to duration.
So, I figured it should be easy enough to figure out a 3-node trivial plan... No clues... I wanted to write it off as yours going parallel and mine not... but nothing was adding up.
Fast forward... a lot... I've got DBCC SETCPUWEIGHT(1000); & DBCC SETIOWEIGHT(1); the damned things are going parallel and I see "something"...
All of the actual numbers add up to roughly 1/4 second, until it hits the Gather Streams and then it just stops, hangs out, drinks a beer.
So, where I'm at now... I started dumping sys.dm_os_wait_stats into a table with a time stamp and comparing the changes...
capture_dt wait_type c_waiting_tasks_count c_wait_time_sec
2017-10-29 00:04:47.4415917 CXPACKET NULL NULL
2017-10-29 00:51:42.3217744 CXPACKET 58 116.832000
2017-10-29 01:05:48.6912441 CXPACKET 55 122.180000
2017-10-29 01:21:27.8460532 CXPACKET 55 116.422000
2017-10-29 01:30:08.0868170 CXPACKET 54 117.491000
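For reference, the capture described above might look roughly like this (table and column names are illustrative, not the actual script):
-- Sketch only: snapshot CXPACKET from sys.dm_os_wait_stats, then diff consecutive
-- snapshots with LAG to get per-test-round waits.
IF OBJECT_ID('tempdb..#WaitCapture') IS NULL
    CREATE TABLE #WaitCapture
        (capture_dt DATETIME2 NOT NULL,
         wait_type NVARCHAR(60) NOT NULL,
         waiting_tasks_count BIGINT NOT NULL,
         wait_time_ms BIGINT NOT NULL);
INSERT INTO #WaitCapture (capture_dt, wait_type, waiting_tasks_count, wait_time_ms)
SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
SELECT capture_dt, wait_type,
       c_waiting_tasks_count = waiting_tasks_count
            - LAG(waiting_tasks_count) OVER (PARTITION BY wait_type ORDER BY capture_dt),
       c_wait_time_sec = (wait_time_ms
            - LAG(wait_time_ms) OVER (PARTITION BY wait_type ORDER BY capture_dt)) / 1000.0
FROM #WaitCapture
ORDER BY capture_dt;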
So... 2 mins per test round of CXPACKET waits...
The problem is, the more research I do, the more this sounds like a symptom rather than the actual problem... Meaning I'm still in the dark about the actual problem.
I'll provide more details, execution plans and test scripts tomorrow... errr... after I wake up later today...