Trying to speed up this 'GetWorkingDays' function.

Question

Trying to speed up this 'GetWorkingDays' function.

Viewing 15 posts - 61 through 75 (of 92 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 1

lars.hesselberg 49028 - Tuesday, October 24, 2017 3:35 AM
Hi,
How about this approach:
DECLARE @BegDate DATETIME = '20000701',
@EndDate DATETIME = '20000707';
Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);
INSERT INTO @tHolidays (Holiday)
SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
) h (Holiday);
SELECT
(DATEDIFF(dd, @BegDate, @EndDate) + 1)
-(DATEDIFF(wk, @BegDate, @EndDate) * 2)
-(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
- (SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)
FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate);
Lars

Good idea but try your code with the following dates, Lars.

DECLARE @BegDate DATETIME = '20000101', @EndDate DATETIME = '20000221' ;

It returns a NULL even though there are several weekends involved that should be counted as "holidays"/non-working days. The reason is because the code only returns a value if the happens to be a holiday.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

lars.hesselberg 49028 Valued Member Points: 57 More actions · Answer 2

Hi Jeff,
You got me! The reason is obvious: number - NULL = NULL.
Here is the fix:
-- Comment in/out as needed for testing
--DECLARE @BegDate DATETIME = '20000701', @EndDate DATETIME = '20000707'; -- 5 days, 1 Holiday => 4 Workdays
DECLARE @BegDate DATETIME = '20000101', @EndDate DATETIME = '20000221'; -- 36 days, No holidays => 36 workdays
--DECLARE @BegDate DATETIME = '20021225', @EndDate DATETIME = '20030101'; -- 6 days, 2 holidays => 4 workdays

select (DATEDIFF(dd, @BegDate, @EndDate) + 1)

Declare @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);

INSERT INTO @tHolidays (Holiday)
SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
) h (Holiday);

SELECT

(DATEDIFF(dd, @BegDate, @EndDate) + 1)
-(DATEDIFF(wk, @BegDate, @EndDate) * 2)
-(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)

- COALESCE((SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' Or DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)), 0)

FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate;

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Lars - or this:

SELECT

(DATEDIFF(dd, @BegDate, @EndDate) + 1)

- (DATEDIFF(wk, @BegDate, @EndDate) * 2)

- (CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)

- (CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)

- ISNULL(SUM(CASE WHEN DATENAME(dw, Holiday) IN ('Saturday','Sunday') THEN 0 ELSE 1 END),0)

FROM @tHolidays

WHERE Holiday >= @BegDate

AND Holiday <= @EndDate;

However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

lars.hesselberg 49028 Valued Member Points: 57 More actions · Answer 4

Hi Jeff,

However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.

Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 5

lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.

For the same reason I don't set MAXDOP = 1 at the instance level.
The reasons for not using scalar functions are well know and well documented. I try not to have too many ALWAYS & NEVER rules...
but until MS makes inline scalars a reality, they will remain on my never list.

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 6

Jeff Moden - Sunday, October 22, 2017 11:44 AM
Here's my shot at this problem. Just to make life easy, I'll keep all the code together for this post. The code is attached as .txt files, which is a bit annoying... why wouldn't a forum for SQL Server allow someone to save .SQL files, I'll never know. :blink:
Build the Calendar Table
First, see the attached "Calender Rev 02 Small.txt" file. It's the old code I resurrected to make a Calendar table with some "special" columns along with some of the more traditional columns. It does not have any "holidays" associated with Easter and doesn't contain any ISO Week/Year. If you need any of that, post back. The "special" columns are...
1. WorkdayNumber - An ascending count of workdays in the calendar table. If a day is a weekend or holiday, the previous workday number is "smeared" down into the weekend or holiday. This allows for some incredibly simple code (simple subtraction between two integers) to determine the number of work days between two dates. It also allows for other easy and fast calculations such as determining lead times and what things like "in 5 business days actually means all using a simple lookup and integer addition or subtraction. This column is the reason why I wanted to resurrect this skinny and quite old table. It's magic. This is the same kind of column that Joe Celko spoke of in his post except the code I've included actually does such a thing and, using the proprietary 3 part UPDATE (also known as a "Quirky Update"), is nasty fast and even runs in SQL Server 2000 (haven't tested it in 6.5 or 7) because it doesn't use RBAR and it doesn't use Lead/Lag/Preceeding, etc.
2. DTSerial - The PK of this table is the DT column, which actually stands for "DateTime" and not just "Date". DTSerial is the underlying integer value of the number of whole days that have passed since the 1st of January, 1900. It doesn't have a huge impact for performance but, as you'll see, it can make code simpler and so I recently added it for this problem.
3. DWOccurance - Although not used for the problem on this thread, this column is very useful for greatly simplifying questions/queries like "What is the date of the 3rd Monday of June" or "Create a list of the dates for every Tuesday of the month for the next two years". And, yes, "DW" stands for "Day of the Week" and is the same notation as what is offered in the SQL Server date/time functions. Please get used to that or change it to "Weekday" if you can't memorize the simple and obvious 2 character names for the common date parts. In any case, quitchurbitchin about it because I'll never use the long names for the date parts. "Keep your eye upon the donut and not upon the hole". 😉
Those columns and all the others are described in the header of the code that makes the table. I didn't take the time to add those descriptions to each column as extended properties but will in the future. This might make a good article.
To use the code, open it in SSMS, look for "TODO" to find where you need to change the range of dates to be included and make any changes you need (preset for the requirements Jason posted earlier). Then , after doing your safety review of the code, run the code.
IF you ever need to Move, Add, or Delete (MAD) a holiday, do your thing to the calendar table making sure to set the IsHoliday column in the table and adding the description. This can be done using a "bulk" bit of code to do the whole table (as in the original code for a given holiday) or just a one-off for a given year (I have to do this a lot of years at work) and then run the last two "paragraphs" of this code to renumber the WorkdayNumber column according to the changes in holidays.
The code that builds the Calendar table also includes ONLY those indexes necessary to solve this problem

Jeff, My apologies for the slow reply. I haven't been sandbagging on this I'm simply trying to move at a more deliberate pace this time around.

Yesterday afternoon and well into the evening, was spent just on the Calendar table aloneâ€¦

First of allâ€¦ you clearly put a huge amount of time & thought into this, going well beyond what any normal forum post could ever warrant. Seriously, if your goal was to dispel my belief that youâ€™re one the very best SQL practitioner/educators in the worldâ€¦ you really missed the mark this time. I donâ€™t mean to gush, but this really is exceptional work!

Celko was the first one, on this thread, to introduce the idea of using â€œordinal_business_nbrâ€ [#1903571], not long after I had implemented the idea of dumping the table constructor in favor of the long CASE expressions. Different implementations of the same principalâ€¦ simple addition and subtraction operations are orders of magnitude faster than counting/aggregation operationsâ€¦ In fact, that was final nudge that prompted me to post the â€œX2â€ version early, in the first place. No need to rehash those gory detailsâ€¦

In short, Iâ€™ve actually had a â€œCelkoâ€ version kicking around the olâ€™ noggin for a few days nowâ€¦

So, since I didnâ€™t say it on Fridayâ€¦ Thank you Joe!... Youâ€™re you and you take a lot of heat for thatâ€¦ while Iâ€™m not saying you donâ€™t deserve most of itâ€¦ I will say that you donâ€™t necessarily get the recognition you deserve for the good stuff. If Iâ€™m working on anything that involves hierarchies, Nested Sets is pretty much automatic at this pointâ€¦ And, thanks to you, I will NEVER make the mistake of conflating â€œrecordsâ€ for rows or â€œfieldsâ€ for columns. 😉

Iâ€™m actually taken aback that by the fact that all 3 solutions were developed independently and yet they all essentially work off of the same principal. Great minds really do think alike! Either that, or this has always been common knowledge and I just showed up really late to the partyâ€¦

I was definitely WAY late to the party on the noticing that youâ€™d used â€œSET DATEFIRST 1â€ to make Monday the 1^st day of the weekâ€¦ I had altered you original code to append â€œ_JMâ€ to the end of your Calendar table because I wanted to live in the same test database as a copy of our existing Calendar table (thank you for the DROP warning by the way)â€¦ I probably spent close to an hour and a half trying to figure out how and where Iâ€™d screwed up a simple name alteration so badly that Fridayâ€™s are weekends and Sundays are workdaysâ€¦

Now that know what the cause is, and I can see how it makes setting DWOccurance easier and DW & DWOccurance are working on tandem to help calculate holidays. Is there a reason (that I havenâ€™t found yet) to keep it like that? 1) It threw me for a big loop and Iâ€™m sure itâ€™ll do the same to others and 2) I canâ€™t make the same change to the incumbent DW column on the existing Calendar table. I can, however create that column with a different name if it is indeed an ongoing part of the overall solution.

The DTInt and DTSerial columnsâ€¦ Also ideas that have been rolling around, specifically, it occurred to me that 1) I wasnâ€™t making good use of the CASE expressions ability to short circuit and 2) I could be getting killed by implicit conversions (CHAR Ã DATETIME) resulting from the hardcoded holiday dates. The DTInt version was what occurred to me first but basic tests showed that getting the input dates was expensive compared to the DTSerial and had no advantage. Iâ€™m not going to lie, I was patting myself on the back a little when I saw it on the table. The fact that youâ€™ve had it completely fleshed out, packaged up and waiting on me for a few days, clearly demonstrates Iâ€™m no pretender to the thrownâ€¦ but I did take it as confirmation that I wasnâ€™t just chasing squirrels.

The conceptual details asideâ€¦ Very cool t-sqlâ€¦ I wouldnâ€™t have thought about that approach so it cool to see it done from a using a different approach (for me anyway). Iâ€™ll see if I can find the code I was using prior to resigning myself to using hard coded values. It worked and was fast, just not â€œdo it a billion times in a function fastâ€â€¦ and finally, QUIRKY! I do have a soft spot for Quiry updatesâ€¦ even if I do have to go back to double check the â€œmust do / must not doâ€ checklistâ€¦

And on that note, I figured I would share thisâ€¦

--========================================================================= -- start by making a fresh copy of Jeff's Calendar table. --========================================================================= IF OBJECT_ID('CodeTest.dbo.Calendar_Copy', 'U') IS NULL BEGIN -- DROP TABLE dbo.Calendar_Copy; SELECT cj.DT, cj.DTNext, cj.DTInt, cj.DTSerial, cj.YY, cj.MM, cj.DD, cj.DW, cj.DWOccurance, cj.IsWorkDay, cj.IsHoliday, WorkDayNumber = CAST(NULL AS INT), --<<== this method relys on NULLs as opposed to 0's. cj.HolidayName INTO CodeTest.dbo.Calendar_Copy FROM dbo.Calendar_JM cj;

-- With the exception of NULL's inplave of zeros, -- bring it to a state just prior to the "Observed" code. UPDATE cc SET cc.IsWorkDay = 1, cc.IsHoliday = 0, cc.HolidayName = REPLACE(cc.HolidayName, ' (Observed)', '') FROM dbo.Calendar_Copy cc WHERE cc.HolidayName LIKE '% (Observed)';

ALTER TABLE dbo.Calendar_Copy ADD CONSTRAINT PK_Calendar_Copy PRIMARY KEY CLUSTERED (DT) WITH FILLFACTOR = 100; END;

--========================================================================= -- use the LAG & LEAD windowing functions to update the "Observed" values -- in a single refference to the base table. Using a covering index key -- prevents a sort operation in the execution plan. --========================================================================= WITH cte_Observed AS ( SELECT cc.DW, cc.IsHoliday, cc.IsWorkDay, cc.HolidayName, oIsHoliday = CASE -- this is where I was getting beat up by the DATEFIRST = 1 ... I was using 2 & 6 origionally. WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1 WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 1 ELSE cc.IsHoliday END, oIsWorkDay = CASE WHEN cc.DW = 1 AND LAG(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0 WHEN cc.DW = 5 AND LEAD(cc.IsHoliday, 1, cc.IsHoliday) OVER (ORDER BY cc.DT) = 1 THEN 0 ELSE cc.IsWorkDay END, oHolidayName = CASE WHEN cc.DW = 1 THEN ISNULL(NULLIF(LAG(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName) WHEN cc.DW = 5 THEN ISNULL(NULLIF(LEAD(cc.HolidayName, 1, cc.HolidayName) OVER (ORDER BY cc.DT), '') + ' (Observed)', cc.HolidayName) ELSE cc.HolidayName END FROM dbo.Calendar_Copy cc ) UPDATE o SET o.IsHoliday = o.oIsHoliday, o.IsWorkDay = o.oIsWorkDay, o.HolidayName = o.oHolidayName FROM cte_Observed o WHERE o.oIsHoliday = 1 AND o.DW IN (1, 5);

--========================================================================= -- start by filling the WorkDayNumber w/ ROW_NUMBER() where IsWorkDay = 1 -- and leaving non-workdays NULL. --========================================================================= WITH cte_WorkDayNumberFill AS ( SELECT cc.WorkDayNumber, RN = ROW_NUMBER() OVER (ORDER BY cc.DT) FROM dbo.Calendar_Copy cc WHERE cc.IsWorkDay = 1 ) UPDATE wf SET wf.WorkDayNumber = wf.RN FROM cte_WorkDayNumberFill wf;

--========================================================================= -- Smear the over the NULL values using MAX() in a window frame. --========================================================================= WITH cte_WorkDayNumberSmear AS ( SELECT cc.WorkDayNumber, wdns = MAX(cc.WorkDayNumber) OVER (ORDER BY cc.DT ROWS UNBOUNDED PRECEDING) FROM dbo.Calendar_Copy cc ) UPDATE ws SET ws.WorkDayNumber = ISNULL(ws.wdns, 0) FROM cte_WorkDayNumberSmear ws WHERE ws.WorkDayNumber IS NULL;

--========================================================================= -- The net result... --========================================================================= SELECT cc.DT, cc.DTNext, cc.DTInt, cc.DTSerial, cc.YY, cc.MM, cc.DD, cc.DW, cc.DWOccurance, cc.IsWorkDay, cc.IsHoliday, cc.WorkDayNumber, cc.HolidayName FROM dbo.Calendar_Copy cc;

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 7

lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.

It's not "Jeff" you're responding to. It should be "Chris".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 8

lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.

The Cartesian product is what costs dearly in performance. That's why the I wrote the special column in the Calendar table I built. I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.

If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

Jeff Moden - Thursday, October 26, 2017 8:53 AM
lars.hesselberg 49028 - Tuesday, October 24, 2017 8:30 AM
Hi Jeff,
However, you still have the issue of a Cartesian join between the input table (not used in your sample) and the holidays table.
Also, performance may suffer a second blow due to the use of a table variable since they don't hold representative statistics.
Well, yes. CROSS APPLY yells for cartesian product, so why not change the table value function tfn_GetWorkingDays_X to a
scalar function taking td.beg_dt and td.end_dt as input parameters and returning a number?
The table variable is only for the sake of the approach, I guess in real life it would be a database table.
The Cartesian product is what costs dearly in performance. That's why the I wrote the special column in the Calendar table I built. I traded off a Cartesian product for each row for doing 2 lookups from memory for each row.
If you'd like to encapsulate your good code as an iTVF, I'd be happy to test it using the million row test table that I posted.

That's what makes Joe's contribution so attractive. All of the work is performed up front when you create the calendar table, or adjust the holidays. I can't see how any other solution is going to come close to two seeks.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

below86 SSChampion Points: 11554 More actions · Answer 10

Something about keeping the running 'work day' count in the table just feels wrong. I know, get over it because it works and is faster. So I decided to test out Jeff's code versus mine with the same 'TestData' table., Jeff's was faster. I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or some thing else? If it was a nightly batch I may go with mine. Needing instant feed back, then probably Jeff's.
Here is the results of my testing.
Using Jeff's code
========== DirectCode uses Calendar Table ===============================================
Table 'Workfile'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'.Scan count 1, logical reads 3599, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Calendar'.Scan count 2, logical reads 64, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 671 ms, elapsed time = 7370 ms.

My code
Table 'Worktable'.Scan count 994398, logical reads 2105638, physical reads 0, read-ahead reads 0,lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'.Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'.Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-aheadreads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times:
CPU time = 14327 ms, elapsed time = 38613 ms.Just looking at one record in the TestData table.
Jeff's code
========== DirectCode uses Calendar Table ===============================================
Table 'Calendar'.Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'.Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

My Code
Table'WorkingDayTable'. Scan count 1, logical reads 22, physical reads 0, read-aheadreads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'.Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, loblogical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.

Dates used for the test above.
beg_dt end_dt Span
2017-10-2602:50:09.963 2017-12-0414:22:10.740 39

My Code:
DECLARE @beg_dt       DATETIME
,@end_dt        DATETIME
,@BusinessDays INT
;SET STATISTICSTIME,IO ON;
SELECT @beg_dt        = td.beg_dt
,@end_dt        = td.end_dt
,@BusinessDays =
(SELECT SUM(WorkDay) AS NumberOfWorkingDays
FROM dbo.WorkingDayTable
WHERE DateField >= td.beg_dt
AND DateField <= td.end_dt )
FROM dbo.TestData td
--WHERE td.beg_dt = '2017-10-2602:50:09.963'
;
SET STATISTICS TIME,IO OFF
;

I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 11

below86 - Thursday, October 26, 2017 3:25 PM
I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.

It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.

Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code. Are you returning to the screen or to variables to take the display time out of the picture?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 12

below86 - Thursday, October 26, 2017 3:25 PM
I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or some thing else?

That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine weather or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
I don't see how that applies to user defined functions, because I don't see a trade-off... What is the down side of have faster, more efficient functions?

But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of out primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
The 1st two tables are calendar tables...

below86 SSChampion Points: 11554 More actions · Answer 13

Jeff Moden - Thursday, October 26, 2017 5:19 PM
below86 - Thursday, October 26, 2017 3:25 PM
I just realized I forgot to strip the time off of beg_dt and end_dt, so that may add more time to mine. Oh well, not going to retest it.
It could also provide the wrong answer because the Holidays don't have times and could fall outside the range if beg_dt is a non-workday.
Also, I'm getting 1.2 seconds on the return for my function code and 0.6 seconds on the direct code. Are you returning to the screen or to variables to take the display time out of the picture?

I ran it like yours, to variables to take the display out of it.

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.

below86 SSChampion Points: 11554 More actions · Answer 14

Jason A. Long - Thursday, October 26, 2017 5:22 PM
below86 - Thursday, October 26, 2017 3:25 PM
I guess I would need to know what the purpose for this function would be to know which one I would implement. Is this a function that is being called by 'millions of users' at one time? Or is this part of some nightly batch processing to calculate the working days? Or some thing else?
That's the kind of question that would be relevant when trade-offs are being made... For example, trying to determine weather or not the increased read performance of a new index is worth the cost of slower writes and larger backups.
I don't see how that applies to user defined functions, because I don't see a trade-off... What is the down side of have faster, more efficient functions?
But, to answer your question... Everything. This is only one of many functions and they are used almost everywhere. As a business we are the go-between for insurance companies and service providers. One of out primary "value-adds" is that our contracts with various vendors mandate various time tolerances for various services. As a result, EVERYTHING gets measured on a temporal scale.
If you jump back to the 2nd page, I posted a screen shot of index usage stats for the main application database ordered by total user reads in descending order... https://www.sqlservercentral.com/Forums/FindPost1903411.aspx
The 1st two tables are calendar tables...

I'm all for having the code run as fast as possible. Like I said it is just my hang up, not liking the column with the running count. And what I was getting at was that the time saved on this function if it was running as part of a batch processing was not significant enough(IMHO) for me to want to use the calendar with the running count. As someone who also works in the insurance industry I understand the need for calculating the 'working days'. I guess in my case we don't have a need, at this time, to calculate this on the fly, it would be more of a nightly batch process.

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 15

Okay... where to begin???

First... I haven't been putting this off... the exact opposite actually. I've been sucked in... It the posting updates, getting a full nights sleep and asking for help, that have been putt off for too long...

Jeff,
If the roles were reversed, and I was the one who stepped up and delivered a package of that caliber, to someone else... At the very least, I'd want SOME feedback and some test results. No excuses, I should have been back here with my preliminary results on Wednesday. Long story short, I saw something I've never seen before, and went down the rabbit hole after it... I can only hope that you can accept my apology, I really am sorry.

An explanation is in order...
It's 4:30 AM so it won't a great explanation, but here goes...
#1... The function you supplied is fast... It's really fast... When I put it against the last few that I'd been working with, it blew their doors off. anywhere from 0.9 - 1.3 secs vs 4.5 -5.5 secs... and that's the part that's been throwing me for a loop.
I've seen my fair share of bad estimates. Sometimes it's takes longer than I'd like to figure it out, but I get there and usually learn something new along the way...
What sucked me in, wasn't the fact that, your function was crushing mine... It's the fact that it shouldn't have.
I can't tell you how much time I've killed staring at execution plan node values... Looking for bad estimates, mismatches between estimated and actual rows... any thing...
And that's what's weird... the actuals and estimates were consistently a smack on match...
Let me start by saying that, I make no claims about having mastered the art of reading execution plans... but I've never seen the optimized make a 100% accurate prediction about row count and an be so off when it comes to duration.
So, I figured it should be easy enough figure out a 3 node trivial plan... No clues... I wanted to write it off as yous going parallel and mine not... but nothing was adding up.
Fast forward... alot... I've got DBCC SETCPUWEIGHT(1000); & DBCC SETIOWEIGHT(1); the damned thing are going parallel and I see "something"...
All of the actual numbers add up to roughly 1/4 second, until it hits the gather streams and then just stops hangs out, drinks a beer.
So, where I'm at now... I started dumping sys.dm_os_wait_stats into a table with a time stamp and comparing the changes...
capture_dt wait_type c_waiting_tasks_count c_wait_time_sec 2017-10-29 00:04:47.4415917 CXPACKET NULL NULL 2017-10-29 00:51:42.3217744 CXPACKET 58 116.832000 2017-10-29 01:05:48.6912441 CXPACKET 55 122.180000 2017-10-29 01:21:27.8460532 CXPACKET 55 116.422000 2017-10-29 01:30:08.0868170 CXPACKET 54 117.491000
So... 2 mins per test round of CXPACKET waits...
The problem is, the more research I do, the more this sounds like a symptom, than the actual problem... Meaning I'm still in the dark about the actual problem.
I'll provide more details, execution plans and test scripts tomorrow... errr... after I wake up later today...